Inverted files versus suffix arrays for locating patterns in primary memory

Puglisi, S, Smyth, W and Turpin, A 2006, 'Inverted files versus suffix arrays for locating patterns in primary memory', in F. Crestani, P. Ferragina and M. Sanderson (ed.) Proceedings of the 13th International Conference on String Processing and Information Retrieval, SPIRE 2006, Glasgow, UK, 29 September 2006.


Document type: Conference Paper
Collection: Conference Papers

Title Inverted files versus suffix arrays for locating patterns in primary memory
Author(s) Puglisi, S
Smyth, W
Turpin, A
Year 2006
Conference name International Conference on String Processing and Information Retrieval
Conference location Glasgow, UK
Conference dates 29 September 2006
Proceedings title Proceedings of the 13th International Conference on String Processing and Information Retrieval, SPIRE 2006
Editor(s) F. Crestani
P. Ferragina
M. Sanderson
Publisher Springer
Place of publication USA
Abstract Recent advances in the asymptotic resource costs of pattern matching with compressed suffix arrays are attractive, but a key rival structure, the compressed inverted file, has been dismissed or ignored in papers presenting the new structures. In this paper we examine the resource requirements of compressed suffix array algorithms against compressed inverted file data structures for general pattern matching in genomic and English texts. In both cases, the inverted file indexes q-grams, thus allowing full pattern matching capabilities, rather than simple word based search, making their functionality equivalent to the compressed suffix array structures. When using equivalent memory for the two structures, inverted files are faster at reporting the location of patterns when the number of occurrences of the patterns is high.
Subjects Information and Computing Sciences not elsewhere classified
Keyword(s) pattern matching
suffix arrays
algorithms
DOI - identifier 10.1007/11880561
Copyright notice © Springer-Verlag Berlin Heidelberg 2006
Versions
Version Filter Type
Altmetric details:
Access Statistics: 204 Abstract Views  -  Detailed Statistics
Created: Wed, 08 Apr 2009, 09:42:32 EST by Catalyst Administrator
© 2014 RMIT Research Repository • Powered by Fez SoftwareContact us