Scalable succinct indexing for large text collections

Petri, M 2013, Scalable succinct indexing for large text collections, Doctor of Philosophy (PhD), Computer Science and Information Technology, RMIT University.

Document type: Thesis
Collection: Theses

Attached Files
Name Description MIMEType Size
Petri.pdf Thesis application/pdf 1.38MB
Title Scalable succinct indexing for large text collections
Author(s) Petri, M
Year 2013
Abstract Self-indexes save space by emulating operations of traditional data structures using basic operations on bitvectors. Succinct text indexes provide full-text search functionality which is traditionally provided by suffix trees and suffix arrays for a given text, while using space equivalent to the compressed representation of the text. Succinct text indexes can therefore provide full-text search functionality over inputs much larger than what is viable using traditional uncompressed suffix-based data structures.

Fields such as Information Retrieval involve the processing of massive text collections. However, the in-memory space requirements of succinct text indexes during construction have hampered their adoption for large text collections. One promising approach to support larger data sets is to avoid constructing the full suffix array by using alternative indexing representations. This thesis focuses on several aspects related to the scalability of text indexes to larger data sets. We identify practical improvements in the core building blocks of all succinct text indexing algorithms, and subsequently improve the index performance on large data sets. We evaluate our findings using several standard text collections and demonstrate: (1) the practical applications of our improved indexing techniques; and (2) that succinct text indexes are a practical alternative to inverted indexes for a variety of top-k ranked document retrieval problems.
Degree Doctor of Philosophy (PhD)
Institution RMIT University
School, Department or Centre Computer Science and Information Technology
Keyword(s) Text indexing
Context-bound text
Version Filter Type
Access Statistics: 220 Abstract Views, 562 File Downloads  -  Detailed Statistics
Created: Fri, 29 Nov 2013, 13:11:08 EST by Denise Paciocco
© 2014 RMIT Research Repository • Powered by Fez SoftwareContact us