Digital Library Research Laboratory: PetaPlex Research

Since 1999, the VT DLRL and Knowledge Systems, Inc. have been pursuing a series of collaborations on the use of the PetaPlex in Digital Library / Information Retrieval applications.

KSI's home page on the collaborations

This page documents the collaborations from the VT side.

The VT Digital Library Research Laboratory is researching several applications of the PetaPlex line of massive distributed storage devices developed by Knowledge Systems Inc. (KSI) [Akscyn, 1998]. The platform currently installed at Virginia Tech is the VT-PetaPlex-1, a new system with 2.5 terabytes of capacity and 100 nodes (each with a 233 MHz Pentium processor running Linux and a 25 gigabyte disk). The PetaPlex can be used to store documents and other digital information objects in project archives. It can also be used to store the inverted files used by MARIAN searchers, as well as by many other search engines. Current research is studying the problem of efficient storage and manipulation of very large inverted files in a parallel storage environment. Problems include distribution of data across the parallel storage units, support for the initial inversion process, and support for incremental updates to inverted files. Each part will be evaluated using very large (20 gigabytes to 1 terabyte) collections of documents and queries, both live and synthesized.
—  From the CONACyT grant proposal, Jan. 2000
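
The distribution and incremental-update problems named in the proposal can be illustrated with a toy sketch. It is not the PetaPlex API or MARIAN code: the names node_for_term and PartitionedIndex, the CRC32 hash placement, and the in-memory dictionaries standing in for node disks are all assumptions made for illustration. Only the node count (100) comes from the description above.

```python
from __future__ import annotations

import zlib
from collections import defaultdict

NUM_NODES = 100  # node count stated for the VT-PetaPlex-1


def node_for_term(term: str, num_nodes: int = NUM_NODES) -> int:
    """Pick the node that owns a term, using a stable hash (assumed scheme)."""
    return zlib.crc32(term.encode("utf-8")) % num_nodes


class PartitionedIndex:
    """Toy term-partitioned inverted file.

    Each dict in self.nodes stands in for the disk of one storage node.
    """

    def __init__(self, num_nodes: int = NUM_NODES) -> None:
        self.num_nodes = num_nodes
        self.nodes = [defaultdict(list) for _ in range(num_nodes)]

    def add_document(self, doc_id: int, text: str) -> None:
        """Incremental update: append doc_id to the postings of each distinct term."""
        for term in set(text.lower().split()):
            owner = node_for_term(term, self.num_nodes)
            self.nodes[owner][term].append(doc_id)

    def lookup(self, term: str) -> list[int]:
        """Return the postings list for a term; only the owning node is consulted."""
        term = term.lower()
        owner = node_for_term(term, self.num_nodes)
        return list(self.nodes[owner].get(term, []))


if __name__ == "__main__":
    index = PartitionedIndex()
    index.add_document(1, "digital library research")
    index.add_document(2, "digital storage")
    print(index.lookup("digital"))
```

Partitioning by term keeps each postings list on a single node, so a one-term lookup touches one disk. The trade-off is that frequent terms can overload their owning node, which is one reason data distribution is listed as a research problem rather than a solved one.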

An initial version of the documentation for the PetaPlex API can be found at

PetaPlex v. 1, with Rob Akscyn, company president and information retrieval guru, in the Virginia Tech Computing Center.
