Information retrieval system evaluation: Effort, sensitivity, and reliability

Sanderson, M and Zobel, J 2005, 'Information retrieval system evaluation: Effort, sensitivity, and reliability', in G. Marchionini et al. (eds), Proceedings of the ACM SIGIR International Conference on Research & Development in Information Retrieval, Salvador, Brazil, pp. 162-169.


Document type: Conference Paper
Collection: Conference Papers

Title: Information retrieval system evaluation: Effort, sensitivity, and reliability
Author(s): Sanderson, M; Zobel, J
Year: 2005
Conference name: International Conference on Research & Development in Information Retrieval
Conference location: Salvador, Brazil
Conference dates: 2005
Proceedings title: Proceedings of the ACM SIGIR International Conference on Research & Development in Information Retrieval
Editor(s): G. Marchionini et al.
Publisher: ACM Press
Place of publication: New York, USA
Start page: 162
End page: 169
Total pages: 8
Abstract: The effectiveness of information retrieval systems is measured by comparing performance on a common set of queries and documents. Significance tests are often used to evaluate the reliability of such comparisons. Previous work has examined such tests, but produced results with limited application. Other work established an alternative benchmark for significance, but the resulting test was too stringent. In this paper, we revisit the question of how such tests should be used. We find that the t-test is highly reliable (more so than the sign or Wilcoxon test), and is far more reliable than simply showing a large percentage difference in effectiveness measures between IR systems. Our results show that past empirical work on significance tests over-estimated the error of such tests. We also re-consider comparisons between the reliability of precision at rank 10 and mean average precision, arguing that past comparisons did not consider the assessor effort required to compute such measures. This investigation shows that assessor effort would be better spent building test collections with more topics, each assessed in less detail.
Subjects: Business Information Management (incl. Records, Knowledge and Information Management, and Intelligence)
Keyword(s): experimental design; search engines; significance testing; system evaluation
Copyright notice: © 2005 ACM
ISBN: 1-59593-034-5
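
As a rough illustration of the kind of comparison the abstract describes, the three paired significance tests it names can be applied to per-topic effectiveness scores of two systems evaluated on the same topics. The sketch below is not code from the paper: the average precision values are invented example data, and the tests come from scipy.

```python
# Hypothetical illustration (not from the paper): comparing two IR systems'
# per-topic average precision with the paired tests named in the abstract.
from scipy import stats

# Made-up per-topic average precision for two hypothetical systems A and B,
# one value per topic in a shared test collection.
ap_a = [0.31, 0.42, 0.18, 0.55, 0.27, 0.63, 0.12, 0.48, 0.39, 0.22]
ap_b = [0.28, 0.47, 0.21, 0.51, 0.35, 0.60, 0.19, 0.52, 0.41, 0.30]

# Paired t-test: tests whether the mean per-topic difference is zero,
# using the full magnitudes of the differences.
t_stat, t_p = stats.ttest_rel(ap_a, ap_b)

# Wilcoxon signed-rank test: non-parametric, uses ranks of the differences.
w_stat, w_p = stats.wilcoxon(ap_a, ap_b)

# Sign test: counts topics on which each system wins, ignoring margins;
# implemented here as a two-sided binomial test on the win count.
wins_a = sum(a > b for a, b in zip(ap_a, ap_b))
ties = sum(a == b for a, b in zip(ap_a, ap_b))
s_p = stats.binomtest(wins_a, n=len(ap_a) - ties, p=0.5).pvalue

print(f"paired t-test  p = {t_p:.3f}")
print(f"Wilcoxon       p = {w_p:.3f}")
print(f"sign test      p = {s_p:.3f}")
```

The sign test keeps only win/loss counts and discards the size of each per-topic difference, whereas the t-test uses the magnitudes in full; of the three, the abstract reports the t-test as the most reliable.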