Using random sampling to build approximate tries for efficient string sorting

Sinha, R and Zobel, J 2004, 'Using random sampling to build approximate tries for efficient string sorting', in C. C. Ribeiro and S. L. Martins (ed.) Experimental and Efficient Algorithms: Third International Workshop, WEA 2004, Angra dos Reis, Brazil, 20 April 2004.


Document type: Conference Paper
Collection: Conference Papers

Title Using random sampling to build approximate tries for efficient string sorting
Author(s) Sinha, R
Zobel, J
Year 2004
Conference name International Workshop on Experimental and Efficient Algorithms
Conference location Angra dos Reis, Brazil
Conference dates 20 April 2004
Proceedings title Experimental and Efficient Algorithms: Third International Workshop, WEA 2004
Editor(s) C. C. Ribeiro
S. L. Martins
Publisher Springer
Place of publication Berlin
Abstract Algorithms for sorting large datasets can be made more efficient with careful use of memory hierarchies and reduction in the number of costly memory accesses. In earlier work, we introduced burstsort, a new string sorting algorithm that on large sets of strings is almost twice as fast as previous algorithms, primarily because it is more cache-efficient. The approach in burstsort is to dynamically build a small trie that is used to rapidly allocate each string to a bucket. In this paper, we introduce new variants of our algorithm: SR-burstsort, DR-burstsort, and DRL-burstsort. These algorithms use a random sample of the strings to construct an approximation to the trie prior to sorting. Our experimental results with sets of over 30 million strings show that the new variants reduce cache misses further than did the original burstsort, by up to 37%, while simultaneously reducing instruction counts by up to 24%. In pathological cases, even further savings can be obtained.
Subjects Information Systems Management
Keyword(s) string sorting
efficiency
burstsort
DOI - identifier 10.1007/b97914
Copyright notice © Springer-Verlag Berlin Heidelberg 2003
Versions
Version Filter Type
Altmetric details:
Access Statistics: 181 Abstract Views  -  Detailed Statistics
Created: Wed, 08 Apr 2009, 09:42:32 EST by Catalyst Administrator
© 2014 RMIT Research Repository • Powered by Fez SoftwareContact us