Using random sampling to build approximate tries for efficient string sorting

Sinha, R and Zobel, J 2006, 'Using random sampling to build approximate tries for efficient string sorting', Journal of Experimental Algorithmics, vol. 10, no. 2.10, pp. 1-18.


Document type: Journal Article
Collection: Journal Articles

Title Using random sampling to build approximate tries for efficient string sorting
Author(s) Sinha, R
Zobel, J
Year 2006
Journal name Journal of Experimental Algorithmics
Volume number 10
Issue number 2.10
Start page 1
End page 18
Total pages 17
Publisher Association for Computing Machinery
Abstract Algorithms for sorting large datasets can be made more efficient with careful use of memory hierarchies and reduction in the number of costly memory accesses. In earlier work, we introduced burstsort, a new string sorting algorithm that on large sets of strings is almost twice as fast as previous algorithms, primarily because it is more cache-efficient. The approach in burstsort is to dynamically build a small trie that is used to rapidly allocate each string to a bucket. In this paper, we introduce new variants of our algorithm: SR-burstsort, DR-burstsort, and DRL-burstsort. These algorithms use a random sample of the strings to construct an approximation to the trie prior to sorting. Our experimental results with sets of over 30 million strings show that the new variants reduce cache misses further than did the original burstsort, by up to 37%, while simultaneously reducing instruction counts by up to 24%. In pathological cases, even further savings can be obtained.
Subject Business Information Management (incl. Records, Knowledge and Information Management, and Intelligence)
DOI - identifier 10.1007/b97914
Copyright notice © Springer-Verlag Berlin Heidelberg 2004
ISSN 1084-6654
Versions
Version Filter Type
Altmetric details:
Access Statistics: 175 Abstract Views  -  Detailed Statistics
Created: Wed, 18 Feb 2009, 09:53:18 EST by Catalyst Administrator
© 2014 RMIT Research Repository • Powered by Fez SoftwareContact us