Digital Library Research Laboratory TREC-8 Research
In 1999, the DLRL is participating in TREC-8, concentrating on the Ad Hoc track.
We are using the MARIAN search engine and analysis tools.
Analysis discussion and schemata (PDF Version).
All four collections are made up of long documents (in the case of FBIS and FR, very long documents). To maximize retrieval specificity, text fields in the documents are being broken into passages corresponding roughly to paragraph-level semantic units. This holds for the <TEXT> tag in all collections, for summaries and supplemental material in FBIS and FR, and for some categories of header material. In addition, all tables are being analyzed, and their individual cells are treated as separate text entities.
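As a rough illustration of paragraph-level passage splitting, the Python sketch below breaks a text field into passages. It is not the MARIAN analysis code: the function name `split_passages`, the use of blank lines as paragraph boundaries, and the `max_words` cap on passage length are all assumptions made for this example.

```python
import re


def split_passages(text, max_words=150):
    """Split a text field into roughly paragraph-sized passages.

    Assumption: blank lines mark paragraph boundaries. Paragraphs longer
    than max_words are cut into consecutive windows of max_words words.
    """
    # Split on one or more blank lines and drop empty fragments.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    passages = []
    for para in paragraphs:
        words = para.split()
        # Short paragraphs yield one passage; long ones yield several.
        for start in range(0, len(words), max_words):
            passages.append(" ".join(words[start:start + max_words]))
    return passages
```

Each returned passage could then be indexed as its own text entity, while keeping a pointer back to the document it came from so that passage matches can be reported at the document level.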
The four collections (FBIS, FT, FR, and LA) will each be loaded as a MARIAN collection. The union of the four will be accomplished by a specially modified TREC Combiner.
This Combiner will access all four collections, combining similar fields with weighted MAX searchers. For instance, a search with coverTitle coverage will evoke a maximal union of
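One way to picture a weighted MAX combination is the Python sketch below. It is an illustration only and not the MARIAN Combiner: the function name `weighted_max_union`, the dictionary-based result sets, and the source names and weights in the usage example are hypothetical. Each source (for example, one field of one collection) returns scores for matching documents; every score is multiplied by the weight of its source, and a document that matches through several sources keeps its largest weighted score.

```python
def weighted_max_union(result_sets, weights):
    """Merge scored result sets, keeping each document's best weighted score.

    result_sets: dict mapping a source name to {doc_id: score}
    weights:     dict mapping a source name to a multiplier (default 1.0)
    Returns a list of (doc_id, score) pairs, best score first.
    """
    combined = {}
    for source, results in result_sets.items():
        weight = weights.get(source, 1.0)
        for doc_id, score in results.items():
            weighted = weight * score
            # MAX semantics: keep only the highest weighted score seen so far.
            if weighted > combined.get(doc_id, float("-inf")):
                combined[doc_id] = weighted
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Hypothetical sources and weights, for illustration only.
ranked = weighted_max_union(
    {"title_field": {"d1": 0.5, "d2": 0.8}, "summary_field": {"d1": 0.9}},
    {"title_field": 1.0, "summary_field": 0.5},
)
```

Taking the maximum rather than the sum means a document is not rewarded merely for matching through many fields; its rank reflects its single strongest weighted match.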
TREC topics will be mapped to MARIAN queries by the simple process of pasting the topic title into one search field and the topic description into another. The coverage(s) for these two fields have yet to be determined. We will use the training examples to choose suitable coverages.
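The topic-to-query mapping described above can be sketched in Python as follows. The sketch assumes topics in the SGML-like TREC layout, with `<title>`, `<desc>`, and `<narr>` tags and no closing tags for them. The function name `topic_to_query` and the field names `title_query` and `description_query` are placeholders, since the actual coverages are still undecided, and the sample topic is invented for the example.

```python
import re


def topic_to_query(topic_text, title_field="title_query",
                   desc_field="description_query"):
    """Paste a topic's title and description into two query fields.

    The field names are placeholders, not MARIAN coverages.
    """
    # The title runs from <title> up to the <desc> tag.
    title = re.search(r"<title>\s*(.*?)\s*(?=<desc>)", topic_text, re.S)
    # The description runs from <desc> up to <narr> (or the end of the topic).
    desc = re.search(
        r"<desc>\s*(?:Description:)?\s*(.*?)\s*(?=<narr>|</top>)",
        topic_text, re.S)

    def clean(match):
        # Collapse line breaks and repeated spaces; empty string if missing.
        return " ".join(match.group(1).split()) if match else ""

    return {title_field: clean(title), desc_field: clean(desc)}


# Invented sample topic, for illustration only.
sample_topic = """<top>
<num> Number: 999
<title> library passage retrieval

<desc> Description:
Find documents about
passage retrieval.

<narr> Narrative:
Not used in this sketch.
</top>"""

query = topic_to_query(sample_topic)
```

The narrative section is deliberately ignored here, matching the plan to use only the title and description.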