A cluster-based resampling method for pseudo-relevance feedback

Lee, K, Croft, B and Allan, J 2008, 'A cluster-based resampling method for pseudo-relevance feedback', in Sung-Hyon Myaeng, Douglas W. Oard, Fabrizio Sebastiani, Tat-Seng Chua and Mun-Kew Leong (eds.) Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2008), Singapore, 20-24 July 2008, pp. 235-242.


Document type: Conference Paper
Collection: Conference Papers

Title A cluster-based resampling method for pseudo-relevance feedback
Author(s) Lee, K
Croft, B
Allan, J
Year 2008
Conference name 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2008)
Conference location Singapore
Conference dates 20-24 July 2008
Proceedings title Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2008)
Editor(s) Sung-Hyon Myaeng, Douglas W. Oard, Fabrizio Sebastiani, Tat-Seng Chua and Mun-Kew Leong
Publisher ACM
Place of publication New York, USA
Start page 235
End page 242
Total pages 8
Abstract Typical pseudo-relevance feedback methods assume the top-retrieved documents are relevant and use these pseudo-relevant documents to expand terms. The initial retrieval set can, however, contain a great deal of noise. In this paper, we present a cluster-based resampling method to select better pseudo-relevant documents based on the relevance model. The main idea is to use document clusters to find dominant documents for the initial retrieval set, and to repeatedly feed the documents to emphasize the core topics of a query. Experimental results on large-scale web TREC collections show significant improvements over the relevance model. For justification of the resampling approach, we examine relevance density of feedback documents. A higher relevance density will result in greater retrieval accuracy, ultimately approaching true relevance feedback. The resampling approach shows higher relevance density than the baseline relevance model on all collections, resulting in better retrieval accuracy in pseudo-relevance feedback. This result indicates that the proposed method is effective for pseudo-relevance feedback.
Subjects Information Systems not elsewhere classified
Keyword(s) Cluster-based resampling
Dominant documents
Information retrieval
Pseudo-relevance feedback
Query expansion
DOI - identifier 10.1145/1390334.1390376
Copyright notice © 2008 ACM.
ISBN 9781605581644
Citation counts: Cited 109 times in Scopus
Access Statistics: 377 Abstract Views
Created: Tue, 12 Mar 2013, 10:32:00 EST by Catalyst Administrator