Digital Library Research Laboratory TREC-8 Research

In 1999, the DLRL is participating in TREC-8, concentrating on the Ad Hoc track.

Analysis and Retrieval

We are using the MARIAN search engine and analysis tools.

Analysis discussion and schemata (PDF Version).

All four collections are made up of long documents (in the case of FBIS and FR, very long documents). To maximize retrieval specificity, text fields in the documents are being broken into passages corresponding roughly to paragraph-level semantic units. This holds for the <TEXT> tag in all collections, for summaries and supplemental material in FBIS and FR, and for some categories of header material. In addition, all tables are being analyzed and their individual cells treated as separate text entities.
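As a rough illustration of the passage-splitting idea (not the MARIAN analysis code itself), the sketch below pulls the body of each <TEXT> region out of a document and splits it into paragraph-level passages. The function name split_passages and the choice of blank lines or <P> markers as paragraph boundaries are assumptions made for this example; table cells and header fields are not handled here.

```python
import re
from typing import List


def split_passages(doc_sgml: str, tag: str = "TEXT") -> List[str]:
    """Return paragraph-level passages from every <tag>...</tag> region.

    Assumption: paragraphs are separated by blank lines or <P>/</P> markers.
    """
    region = re.compile(
        rf"<{re.escape(tag)}>(.*?)</{re.escape(tag)}>",
        re.DOTALL | re.IGNORECASE,
    )
    boundary = re.compile(r"(?:\s*</?P>\s*)+|\n\s*\n", re.IGNORECASE)

    passages = []
    for body in region.findall(doc_sgml):
        for chunk in boundary.split(body):
            # Collapse internal line breaks and runs of whitespace.
            text = " ".join(chunk.split())
            if text:
                passages.append(text)
    return passages


if __name__ == "__main__":
    sample = (
        "<DOC><TEXT>\nFirst paragraph, line one\nline two.\n\n"
        "Second paragraph.\n<P>Third paragraph.</P>\n</TEXT></DOC>"
    )
    for number, passage in enumerate(split_passages(sample), start=1):
        print(number, passage)
```

Each returned passage could then be indexed as its own text entity while keeping a pointer back to the parent document.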

The four collections FBIS, FT, FR, and LA will each be loaded as a MARIAN collection. The union of the four will be accomplished by a specially modified TREC Combiner.

This Combiner will access all four collections, combining similar fields with weighted MAX searchers. For instance, a search with coverTitle coverage will evoke a maximal union of

Each component of this maximal union will be scaled by an appropriate weight. These weights will be determined during the training phase.
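The weighted maximal union can be sketched as follows. This is only an illustration under stated assumptions, not the Combiner's actual code: each component searcher is assumed to return a mapping from document identifier to score, the component names and weights shown are hypothetical placeholders, and the function name weighted_max_union is invented for this example.

```python
def weighted_max_union(results, weights):
    """Combine component result sets by weighted MAX.

    results: {component_name: {doc_id: score}}
    weights: {component_name: weight}
    Returns a list of (doc_id, combined_score), best first.
    """
    combined = {}
    for component, scored_docs in results.items():
        weight = weights[component]
        for doc_id, score in scored_docs.items():
            scaled = weight * score
            # Keep the largest scaled score seen for this document.
            if doc_id not in combined or scaled > combined[doc_id]:
                combined[doc_id] = scaled
    return sorted(combined.items(), key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical component names, scores, and weights, for illustration only.
    results = {
        "component_a": {"doc1": 0.8, "doc2": 0.4},
        "component_b": {"doc1": 0.5, "doc3": 0.9},
    }
    weights = {"component_a": 1.0, "component_b": 0.5}
    for doc_id, score in weighted_max_union(results, weights):
        print(doc_id, round(score, 3))
```

With MAX, a document matched by several components is ranked by its single best scaled score rather than by a sum, so the weights control how much each component can contribute on its own.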

Ad Hoc Track Runs

TREC topics will be mapped to MARIAN queries by the simple process of pasting the topic title into one search field and the topic description into another. The coverage(s) for these two fields have yet to be determined. We will use the training examples to select suitable coverages.
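A minimal sketch of this mapping is shown below. It assumes the usual TREC topic layout (<num>, <title>, <desc>, and <narr> tags inside <top> ... </top>) and uses a plain dictionary as a stand-in for a MARIAN query, since the real query format and coverages are not specified here. The field names title_search and description_search, the function names, and the sample topic text are all hypothetical.

```python
import re

# One topic field runs from its tag to the next tag, </top>, or end of text.
_FIELD = re.compile(
    r"<(num|title|desc|narr)>(.*?)(?=<(?:num|title|desc|narr)>|</top>|\Z)",
    re.DOTALL,
)
# Leading labels such as "Number:" or "Description:" are dropped.
_LABEL = re.compile(r"^\s*(?:Number|Topic|Description|Narrative):\s*", re.IGNORECASE)


def parse_topic(topic_text):
    """Return {tag_name: cleaned_text} for one <top> ... </top> block."""
    fields = {}
    for name, body in _FIELD.findall(topic_text):
        fields[name] = " ".join(_LABEL.sub("", body).split())
    return fields


def topic_to_query(topic_text, title_field="title_search",
                   desc_field="description_search"):
    """Paste the topic title and description into two search fields."""
    fields = parse_topic(topic_text)
    return {
        title_field: fields.get("title", ""),
        desc_field: fields.get("desc", ""),
    }


if __name__ == "__main__":
    # Made-up topic text, for illustration only.
    sample = """<top>
<num> Number: 999
<title> sample topic title
<desc> Description:
What is a sample description?
<narr> Narrative:
Not used in this mapping.
</top>"""
    print(topic_to_query(sample))
```

The narrative field is parsed but deliberately left out of the query, matching the title-plus-description mapping described above.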

