An Empirical Analysis of Pruning Techniques: Performance, Retrievability and Bias

Chen, R, Azzopardi, L and Scholer, F 2017, 'An Empirical Analysis of Pruning Techniques: Performance, Retrievability and Bias', in CIKM '17 Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, Singapore, 6-10 November 2017, pp. 2023-2026.


Document type: Conference Paper
Collection: Conference Papers

Title: An Empirical Analysis of Pruning Techniques: Performance, Retrievability and Bias
Author(s): Chen, R; Azzopardi, L; Scholer, F
Year: 2017
Conference name: ACM Conference on Information and Knowledge Management
Conference location: Singapore
Conference dates: 6-10 November 2017
Proceedings title: CIKM '17 Proceedings of the 2017 ACM on Conference on Information and Knowledge Management
Publisher: ACM
Place of publication: USA
Start page: 2023
End page: 2026
Total pages: 4
Abstract: Prior work on using retrievability measures in the evaluation of information retrieval (IR) systems has laid out the foundations for investigating the relation between retrieval performance and retrieval bias. While various factors influencing retrievability have been examined, showing how the retrieval model may influence bias, no prior work has examined the impact of the index (and how it is optimized) on retrieval bias. Intuitively, how the documents are represented, and what terms they contain, will influence whether they are retrievable or not. In this paper, we investigate how the retrieval bias of a system changes as the inverted index is optimized for efficiency through static index pruning. In our analysis, we consider four pruning methods and examine how they affect performance and bias on the TREC GOV2 Collection. Our results show that the relationship between these factors is varied and complex - and very much dependent on the pruning algorithm. We find that more pruning results in relatively little change or a slight decrease in bias up to a point, and then a dramatic increase. The increase in bias corresponds to a sharp decrease in early precision such as NDCG@10 and is also indicative of a large decrease in MAP. The findings suggest that the impact of pruning algorithms can be quite varied - but retrieval bias could be used to guide the pruning process. Further work is required to determine precisely which documents are most affected and how this impacts upon performance.
Subjects: Information Retrieval and Web Search
Copyright notice: © 2017 Copyright held by the owner/author(s).
ISBN: 9781450349185
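
The abstract does not state how retrieval bias is quantified. Retrievability studies commonly count how often each document appears in the top c results over a large query set, then summarize the inequality of those counts with a Gini coefficient. The sketch below assumes that setup; the function names, the toy rankings, and the cutoff are illustrative and are not taken from the paper.

```python
from collections import Counter


def retrievability(rankings, num_docs, cutoff):
    """Cumulative retrievability (assumed measure): r(d) is the number of
    queries that rank document d within the top `cutoff` results."""
    counts = Counter()
    for ranked_docs in rankings:
        counts.update(ranked_docs[:cutoff])
    return [counts.get(doc_id, 0) for doc_id in range(num_docs)]


def gini(scores):
    """Gini coefficient of non-negative scores.
    0 means every document is equally retrievable; values near 1 mean
    retrievability is concentrated in a few documents (high bias)."""
    values = sorted(scores)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum((2 * i - n - 1) * v for i, v in enumerate(values, start=1))
    return weighted / (n * total)


# Hypothetical toy data: one ranked list of document ids per query.
rankings = [[0, 1, 2], [0, 2, 3], [0, 1, 3]]
r = retrievability(rankings, num_docs=4, cutoff=2)
print("r(d):", r, "Gini:", round(gini(r), 3))
```

Under these assumptions, the effect of pruning on bias could be tracked by recomputing the rankings against each pruned index and comparing the resulting Gini values.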
Created: Mon, 04 Dec 2017, 13:07:00 EST by Catalyst Administrator