Approximate matching in strings is a fundamental and challenging problem in computer science and in computational biology, and increasingly fast algorithms are highly demanded in many applications including text processing and dna sequence analysis. Recently efficient solutions to specific approximate matching problems on genomic sequences have been designed using a filtering technique, based on the general abelian matching problem, which firstly locates the set of all candidate matching positions and then perform an additional verification test on the collected positions. The abelian pattern matching problem consists in finding all substrings of a text which are permutations of a given pattern. In this paper we present a new class of algorithms based on a new efficient fingerprint computation approach, called Heap-Counting, which turns out to be fast, flexible and easy to be implemented. We prove that, when applied for searching short patterns on a dna sequence, our solutions have a linear worst case time complexity. In addition we present an experimental evaluation which shows that our newly presented algorithms are among the most efficient and flexible solutions in practice for the abelian matching problem in dna sequences.
|Titolo:||Flexible and Efficient Algorithms for Abelian Matching in Genome Sequence|
|Data di pubblicazione:||2019|
|Appare nelle tipologie:||4.1 Contributo in Atti di convegno|