Hybrid XML retrieval revisited

Pehcevski, J, Thom, J, Tahaghoghi, S and Vercoustre, A 2005, 'Hybrid XML retrieval revisited', in G Goos, J Hartmanis and J van Leeuwen (eds), Advances in XML Information Retrieval, Dagstuhl Castle, Germany, 6-8 December 2004, Springer, Berlin, pp. 153-167.


Document type: Conference Paper
Collection: Conference Papers

Title Hybrid XML retrieval revisited
Author(s) Pehcevski, J
Thom, J
Tahaghoghi, S
Vercoustre, A
Year 2005
Conference name International Workshop of the Initiative for the Evaluation of XML Retrieval
Conference location Dagstuhl Castle, Germany
Conference dates 6-8 December 2004
Proceedings title Advances in XML Information Retrieval
Editor(s) G Goos
J Hartmanis
J van Leeuwen
Publisher Springer
Place of publication Berlin
Start page 153
End page 167
Abstract The widespread adoption of XML necessitates structure-aware systems that can effectively retrieve information from XML document collections. This paper reports on the participation of the RMIT group in the INEX 2004 ad hoc track, where we investigate different aspects of the XML retrieval task. Our preliminary analysis of the relevance assessments for Content-Only (CO) and Vague Content-And-Structure (VCAS) topics identifies three XML retrieval scenarios: Original, General and Specific. Further analysis of the relevance assessments under the General retrieval scenario reveals two categories of CO and VCAS topics: Broad and Narrow. We design runs that follow a hybrid XML approach and implement two retrieval heuristics with different levels of overlap among the answer elements. For the Original retrieval scenario, we show that the overlap CO runs outperform the non-overlap CO runs, and that the VCAS run that uses queries with structural constraints and no explicitly specified target element performs best. In both the CO and VCAS cases, runs that implement the retrieval heuristic favouring less specific over more specific answer elements produce the most effective retrieval. Importantly, we present results showing that, for the General retrieval scenario, where users prefer less specific and non-overlapping answers to their queries, using a plain full-text search engine is a very effective choice for XML retrieval.
Subjects Business Information Management (incl. Records, Knowledge and Information Management, and Intelligence)
DOI - identifier 10.1007/11424550_13
Copyright notice © Springer-Verlag Berlin Heidelberg 2005