Sample sizes for query probing in uncooperative distributed information retrieval

Shokouhi, M, Scholer, F and Zobel, J 2006, 'Sample sizes for query probing in uncooperative distributed information retrieval', in X. Zhou, J. Li, K. T. Shen, M. Kitsuregawa and Y. Zhang (eds.) Proceedings of the 8th Asia-Pacific Web Conference (APWeb 2006), Harbin, China, 17 December 2005, pp. 63-75.


Document type: Conference Paper
Collection: Conference Papers

Title: Sample sizes for query probing in uncooperative distributed information retrieval
Author(s): Shokouhi, M; Scholer, F; Zobel, J
Year: 2006
Conference name: Asia-Pacific Web Conference
Conference location: Harbin, China
Conference dates: 17 December 2005
Proceedings title: Proceedings of the 8th Asia-Pacific Web Conference (APWeb 2006)
Editor(s): X. Zhou; J. Li; K. T. Shen; M. Kitsuregawa; Y. Zhang
Publisher: Springer
Place of publication: Germany
Start page: 63
End page: 75
Total pages: 13
Abstract: The goal of distributed information retrieval is to support effective searching over multiple document collections. For efficiency, queries should be routed to only those collections that are likely to contain relevant documents, so it is necessary to first obtain information about the content of the target collections. In an uncooperative environment, query probing - where randomly-chosen queries are used to retrieve a sample of the documents and thus of the lexicon - has been proposed as a technique for estimating statistical term distributions. In this paper we rebut the claim that a sample of 300 documents is sufficient to provide good coverage of collection terms. We propose a novel sampling strategy and experimentally demonstrate that sample size needs to vary from collection to collection, that our methods achieve good coverage based on variable-sized samples, and that we can use the results of a probe to determine when to stop sampling.
Subjects: Information and Computing Sciences not elsewhere classified
Keyword(s): information retrieval; query probing
DOI: 10.1007/11610113_7
Copyright notice: © Springer-Verlag Berlin Heidelberg 2006
ISBN: 978-3-540-31142-3
Created: Wed, 08 Apr 2009, 09:42:32 EST by Catalyst Administrator
© 2014 RMIT Research Repository