Revisiting Spam Filtering in Web Search

Gallagher, L, Mackenzie, J and Culpepper, J 2018, 'Revisiting Spam Filtering in Web Search', in Proceedings of the 23rd Annual Australasian Document Computing Symposium, Dunedin, New Zealand, 11-12 December 2018, pp. 1-4.


Document type: Conference Paper
Collection: Conference Papers

Title: Revisiting Spam Filtering in Web Search
Author(s): Gallagher, L; Mackenzie, J; Culpepper, J
Year: 2018
Conference name: ADCS 2018
Conference location: Dunedin, New Zealand
Conference dates: 11-12 December 2018
Proceedings title: Proceedings of the 23rd Annual Australasian Document Computing Symposium
Publisher: ACM
Place of publication: New York, United States
Start page: 1
End page: 4
Total pages: 4
Abstract: The Waterloo spam scores are now a commonly used static document feature in web collections such as ClueWeb. This feature can be used as a post-retrieval filter, as a document prior, or as one of many features in a Learning-to-Rank system. In this work, we highlight the risks associated with using spam scores as a post-retrieval filter, which is now common practice in experiments with the ClueWeb test collection. While filtering increases the average evaluation score and improves performance on some topics, it can significantly harm performance on others. Through a detailed failure analysis, we show that simple spam filtering is a high-risk practice that should be avoided in future work, particularly when working with the ClueWeb12 test collection.
Subjects: Information Retrieval and Web Search; Data Structures
DOI - identifier: 10.1145/3291992.3291999
Copyright notice: © 2018 Copyright held by the owner/author(s).
ISBN: 9781450365499
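
The post-retrieval filtering that the abstract warns about can be illustrated with a minimal sketch. It is not the authors' code. It assumes each document has a percentile spam score in which lower values indicate spammier pages, that documents scoring below a cutoff are dropped from the ranked list, and that documents with no score are kept. The function names, the cutoff of 70, and the toy data are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set, Tuple

# A ranked list of (doc_id, retrieval_score) pairs, best first.
Ranking = List[Tuple[str, float]]


def filter_spam(ranking: Ranking,
                spam_percentile: Dict[str, int],
                threshold: int = 70) -> Ranking:
    """Drop documents whose spam percentile is below `threshold`.

    Assumption: lower percentile means spammier. Documents without a
    score are kept (a hypothetical choice made for this sketch).
    """
    return [(doc, score) for doc, score in ranking
            if spam_percentile.get(doc, 100) >= threshold]


def precision_at_k(ranking: Ranking, relevant: Set[str], k: int) -> float:
    """Fraction of the top-k positions occupied by relevant documents."""
    top_k = ranking[:k]
    return sum(1 for doc, _ in top_k if doc in relevant) / k


# Toy data for a single topic (entirely made up).
ranking = [("d1", 3.0), ("d2", 2.5), ("d3", 2.0), ("d4", 1.0)]
spam_percentile = {"d1": 85, "d2": 40, "d3": 92, "d4": 10}
relevant = {"d2"}  # the only relevant document has a low spam percentile

before = precision_at_k(ranking, relevant, k=2)
after = precision_at_k(filter_spam(ranking, spam_percentile), relevant, k=2)
print(f"P@2 before filtering: {before:.2f}, after filtering: {after:.2f}")
```

In this toy topic the only relevant document falls below the cutoff, so filtering removes it and P@2 drops to zero. Comparing such per-topic differences, rather than only the average across topics, is one way to surface the kind of harm the abstract describes.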
Created: Tue, 26 Mar 2019, 09:36:00 EST by Catalyst Administrator