Meta-evaluation of Dynamic Search: How Do Metrics Capture Topical Relevance, Diversity and User Effort?

Albahem, A, Spina, D, Scholer, F and Cavedon, L 2019, 'Meta-evaluation of Dynamic Search: How Do Metrics Capture Topical Relevance, Diversity and User Effort?', in Leif Azzopardi, Benno Stein, Norbert Fuhr, Philip Mayr, Claudia Hauff and Djoerd Hiemstra (eds), Advances in Information Retrieval - 41st European Conference on IR Research, ECIR 2019, Cologne, Germany, April 14-18, 2019, Proceedings, Part I, Cologne, Germany, April 14-18 2019, pp. 607-620.


Document type: Conference Paper
Collection: Conference Papers

Title Meta-evaluation of Dynamic Search: How Do Metrics Capture Topical Relevance, Diversity and User Effort?
Author(s) Albahem, A
Spina, D
Scholer, F
Cavedon, L
Year 2019
Conference name 41st European Conference on Information Retrieval (ECIR)
Conference location Cologne, Germany
Conference dates April 14-18 2019
Proceedings title Advances in Information Retrieval - 41st European Conference on IR Research, ECIR 2019, Cologne, Germany, April 14-18, 2019, Proceedings, Part I
Editor(s) Leif Azzopardi, Benno Stein, Norbert Fuhr, Philip Mayr, Claudia Hauff, Djoerd Hiemstra
Publisher Springer
Place of publication Cham, Switzerland
Start page 607
End page 620
Total pages 14
Abstract Complex dynamic search tasks typically involve multi-aspect information needs and repeated interactions with an information retrieval system. Various metrics have been proposed to evaluate dynamic search systems, including the Cube Test, Expected Utility, and Session Discounted Cumulative Gain. While these complex metrics attempt to measure overall system "goodness" based on a combination of dimensions, such as topical relevance, novelty, or user effort, it remains an open question how well each of the competing evaluation dimensions is reflected in the final score. To investigate this, we adapt two meta-analysis frameworks: the Intuitiveness Test and Metric Unanimity. This study is the first to apply these frameworks to the analysis of dynamic search metrics and also to study how well these two approaches agree with each other. Our analysis shows that the complex metrics differ markedly in the extent to which they reflect these dimensions, and also demonstrates that the behaviors of the metrics change as a session progresses. Finally, our investigation of the two meta-analysis frameworks demonstrates a high level of agreement between the two approaches. Our findings can help to inform the choice and design of appropriate metrics for the evaluation of dynamic search systems.
Subjects Information Retrieval and Web Search
Keyword(s) evaluation
dynamic search
intuitiveness test
metric unanimity
Copyright notice © Springer Nature Switzerland 2019
ISBN 978-3-030-15712-8