Fast on-line index construction by geometric partitioning

Lester, N, Moffat, A and Zobel, J 2005, 'Fast on-line index construction by geometric partitioning', in A. Chowdhury et al. (eds.) Proceedings of the ACM-CIKM International Conference on Information & Knowledge Management, Bremen, Germany, 2005, pp. 776-783.


Document type: Conference Paper
Collection: Conference Papers

Title Fast on-line index construction by geometric partitioning
Author(s) Lester, N
Moffat, A
Zobel, J
Year 2005
Conference name International Conference on Information & Knowledge Management
Conference location Bremen, Germany
Conference dates 2005
Proceedings title Proceedings of the ACM-CIKM International Conference on Information & Knowledge Management
Editor(s) A. Chowdhury et al.
Publisher ACM Press
Place of publication New York, USA
Start page 776
End page 783
Total pages 8
Abstract Inverted index structures are the mainstay of modern text retrieval systems. They can be constructed quickly using off-line merge-based methods, and provide efficient support for a variety of querying modes. In this paper we examine the task of on-line index construction -- that is, how to build an inverted index when the underlying data must be continuously queryable, and the documents must be indexed and available for search as soon as they are inserted. When straightforward approaches are used, document insertions become increasingly expensive as the size of the database grows. This paper describes a mechanism based on controlled partitioning that can be adapted to suit different balances of insertion and querying operations, and is faster and scales better than previous methods. Using experiments on 100GB of web data we demonstrate the efficiency of our methods in practice, showing that they dramatically reduce the cost of on-line index construction.
Subjects Business Information Management (incl. Records, Knowledge and Information Management, and Intelligence)
Keyword(s) indexing
information retrieval
search engines
Copyright notice © 2005 ACM
ISBN 1-59593-140-6
Created: Wed, 08 Apr 2009, 09:42:32 EST by Catalyst Administrator