Fast searching in biological sequences using multiple hash functions
FARO, SIMONE
2012-01-01
Abstract
With the availability of large amounts of DNA data, exact matching of nucleotide sequences has become an important application in modern computational biology and metagenomics. In this paper we present an efficient method, based on multiple hash functions, that improves the performance of existing string matching algorithms when searching DNA sequences. Our experimental results show that the proposed technique yields algorithms up to 8 times faster than the best known algorithm for matching multiple patterns. The performance gain also increases with the size of the pattern set. Since the number of reads produced by next-generation sequencing equipment keeps growing, the new technique serves as a good basis for massive multiple long-pattern search applications.