Comparing compressed sequences for faster nucleotide BLAST searches

Cameron, M and Williams, H 2007, 'Comparing compressed sequences for faster nucleotide BLAST searches', IEEE - ACM Transactions on Computational Biology and Bioinformatics, vol. 4, pp. 349-364.


Document type: Journal Article
Collection: Journal Articles

Title: Comparing compressed sequences for faster nucleotide BLAST searches
Author(s): Cameron, M; Williams, H
Year: 2007
Journal name: IEEE - ACM Transactions on Computational Biology and Bioinformatics
Volume number: 4
Start page: 349
End page: 364
Total pages: 16
Publisher: IEEE
Abstract: Molecular biologists, geneticists, and other life scientists use the BLAST homology search package as their first step for discovery of information about unknown or poorly annotated genomic sequences. There are two main variants of BLAST: BLASTP for searching protein collections and BLASTN for nucleotide collections. Surprisingly, BLASTN has had very little attention; for example, the algorithms it uses do not follow those described in the 1997 BLAST paper [1] and no exact description has been published. It is important that BLASTN is state-of-the-art: Nucleotide collections such as GenBank dwarf the protein collections in size, they double in size almost yearly, and they take many minutes to search on modern general purpose workstations. This paper proposes significant improvements to the BLASTN algorithms. Each of our schemes is based on compressed bytepacked formats that allow queries and collection sequences to be compared four bases at a time, permitting very fast query evaluation using lookup tables and numeric comparisons. Our most significant innovations are two new, fast gapped alignment schemes that allow accurate sequence alignment without decompression of the collection sequences. Overall, our innovations more than double the speed of BLASTN with no effect on accuracy and have been integrated into our new version of BLAST that is freely available for download from http://www.fsa-blast.org/.
Keyword(s): Protein Data-Banks; Similarity Searches; Algorithm; Database; Alignment; Retrieval
Copyright notice: © 2007 IEEE
ISSN: 1545-5963
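
The abstract's idea of comparing bytepacked sequences four bases at a time with a lookup table can be illustrated with a minimal sketch. This is not the paper's or FSA-BLAST's implementation: the 2-bit codes, the function names `pack` and `count_matches`, and the choice to count matches via XOR are all assumptions made for illustration, and the sketch handles only ungapped comparison of equal-length sequences whose length is a multiple of four.

```python
# Hypothetical 2-bit codes; the abstract does not specify the actual encoding.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def pack(seq):
    """Pack a nucleotide string into bytes, four bases per byte.

    The first base of each group of four occupies the two highest bits.
    """
    if len(seq) % 4:
        raise ValueError("this sketch assumes a length that is a multiple of 4")
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for base in seq[i:i + 4]:
            byte = (byte << 2) | CODE[base]
        out.append(byte)
    return bytes(out)


def _matches_in_xor(x):
    """Count the 2-bit fields of x that are zero (i.e. positions that matched)."""
    return sum(1 for shift in (0, 2, 4, 6) if (x >> shift) & 3 == 0)


# Precomputed once: for every possible XOR of two packed bytes,
# how many of the four base positions are identical.
MATCH_TABLE = [_matches_in_xor(x) for x in range(256)]


def count_matches(packed_a, packed_b):
    """Count identical bases between two packed sequences, one byte (four bases) per step."""
    if len(packed_a) != len(packed_b):
        raise ValueError("this sketch compares equal-length sequences only")
    return sum(MATCH_TABLE[a ^ b] for a, b in zip(packed_a, packed_b))


query = pack("ACGTACGT")
subject = pack("ACGTACGA")
print(count_matches(query, subject))
```

Each loop step handles four bases with one XOR and one table lookup, so neither sequence is unpacked during the comparison.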
Citation counts: Cited 5 times in Thomson Reuters Web of Science; cited 6 times in Scopus
Access Statistics: 163 abstract views
Created: Fri, 07 Jan 2011, 09:11:00 EST by Catalyst Administrator