Compression of inverted indexes for fast query evaluation

Scholer, F, Williams, H, Zobel, J and Yiannis, J 2002, 'Compression of inverted indexes for fast query evaluation', in Proceedings of the 25th Annual International ACM-SIGIR Conference, Tampere, Finland, 11-15 August 2002.

Document type: Conference Paper
Collection: Conference Papers

Title Compression of inverted indexes for fast query evaluation
Author(s) Scholer, F
Williams, H
Zobel, J
Yiannis, J
Year 2002
Conference name International ACM-SIGIR Conference
Conference location Tampere, Finland
Conference dates 11-15 August 2002
Proceedings title Proceedings of the 25th Annual International ACM-SIGIR Conference
Publisher Association for Computing Machinery
Place of publication New York, USA
Abstract Compression reduces both the size of indexes and the time needed to evaluate queries. In this paper, we revisit the compression of inverted lists of document postings that store the position and frequency of indexed terms, considering two approaches to improving retrieval efficiency: better implementation and better choice of integer compression schemes. First, we propose several simple optimisations to well-known integer compression schemes, and show experimentally that these lead to significant reductions in time. Second, we explore the impact of choice of compression scheme on retrieval efficiency.In experiments on large collections of data, we show two surprising results: use of simple byte-aligned codes halves the query evaluation time compared to the most compact Golomb-Rice bitwise compression schemes; and, even when an index fits entirely in memory, byte-aligned codes result in faster query evaluation than does an uncompressed index, emphasising that the cost of transferring data from memory to the CPU cache is less for an appropriately compressed index than for an uncompressed index. Moreover, byte-aligned schemes have only a modest space overhead: the most compact schemes result in indexes that are around 10% of the size of the collection, while a byte-aligned scheme is around 13%. We conclude that fast byte-aligned codes should be used to store integers in inverted lists.
Subjects Information Retrieval and Web Search
Copyright notice © 2009 ACM, Inc.
Version Filter Type
Access Statistics: 220 Abstract Views  -  Detailed Statistics
Created: Mon, 19 Oct 2009, 07:39:41 EST by Catalyst Administrator
© 2014 RMIT Research Repository • Powered by Fez SoftwareContact us