A comparative study of probabilistic and language models for information retrieval

Bennett, G, Scholer, F and Uitdenbogerd, A 2008, 'A comparative study of probabilistic and language models for information retrieval', in Database Technologies 2008: Proceedings of the Nineteenth Australasian Database Conference (ADC 2008), Wollongong, Australia, 3-4 December 2007, pp. 65-74.


Document type: Conference Paper
Collection: Conference Papers

Title: A comparative study of probabilistic and language models for information retrieval
Author(s): Bennett, G; Scholer, F; Uitdenbogerd, A
Year: 2008
Conference name: Nineteenth Australasian Database Conference (ADC 2008)
Conference location: Wollongong, Australia
Conference dates: 3-4 December 2007
Proceedings title: Database Technologies 2008: Proceedings of the Nineteenth Australasian Database Conference (ADC 2008)
Publisher: CRPIT
Place of publication: Australia
Start page: 65
End page: 74
Abstract: Language models for information retrieval have received much attention in recent years, with many claims being made about their performance. However, previous studies evaluating the language modelling approach for information retrieval used different query sets and heterogeneous collections, which makes reported results difficult to compare. This research is a broad-based study that evaluates language models against a variety of search tasks: topic finding, named-page finding and topic distillation. The standard Text REtrieval Conference (TREC) methodology is used to compare language models to the probabilistic Okapi BM25 system. Using consistent parameter choices, we compare results of different language models on three different search tasks, multiple query sets and three different text collections. For ad hoc retrieval, the Dirichlet smoothing method was found to be significantly better than Okapi BM25, but for named-page finding Okapi BM25 was more effective than the language modelling methods. Optimal smoothing parameters for each method were found to be dependent on the collection and the query set. For longer queries, the language modelling approaches required more aggressive smoothing, but they were found to be more effective than with shorter queries. The choice of smoothing method was also found to have a significant effect on the performance of language models for information retrieval.
Subjects: Information Systems Organisation
Keyword(s): Information retrieval; models; retrieval models
Copyright notice: © 2008 Australian Computer Society, Inc. Reproduction for academic, not-for-profit purposes permitted provided this text is included.
ISSN: 1445-1336