In this paper we present a method to simulate, using the bit-parallelism technique, the nondeterministic Aho-Corasick automaton and the nondeterministic suffix automaton induced, respectively, by the trie and by the Directed Acyclic Word Graph for a set of patterns. When the prefix redundancy is non-negligible, this method yields, compared with the original bit-parallel encoding with no prefix factorization, a representation that requires smaller bit-vectors and, correspondingly, fewer words. In particular, if we restrict ourselves to single-word bit-vectors, more patterns can be packed into a word. We also present two simple algorithms, based on this technique, for searching for a set P of patterns in a text T of length n over an alphabet Σ of size σ. Our algorithms, named Log-And and Backward-Log-And, require O((m+σ)⌈m/w⌉) space and work in O(n⌈m/w⌉) and O(n⌈m/w⌉ lmin) worst-case searching time, respectively, where w is the number of bits in a computer word, m is the number of states of the automaton, and lmin is the length of the shortest pattern in P.
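To make the comparison in the abstract concrete, the following is a minimal sketch of the baseline it refers to: the classic multi-pattern Shift-And encoding with no prefix factorization, where every pattern character gets its own bit. It is not the Log-And or Backward-Log-And algorithm of the paper. The function names (`build_masks`, `shift_and_multi`, `count_trie_prefixes`) are hypothetical, and counting distinct non-empty prefixes is only an assumed stand-in for the number of trie states (the paper's m may, for instance, also count the root).

```python
def build_masks(patterns):
    """Lay the patterns side by side in one bit-vector (no prefix sharing).

    Returns (B, init, final, total_bits), where B maps a character to the
    bit positions at which it occurs, init marks the first bit of each
    pattern, and final marks the last bit of each pattern.
    """
    B, init, final, pos = {}, 0, 0, 0
    for p in patterns:
        init |= 1 << pos
        for ch in p:
            B[ch] = B.get(ch, 0) | (1 << pos)
            pos += 1
        final |= 1 << (pos - 1)
    return B, init, final, pos


def shift_and_multi(patterns, text):
    """Return the text positions at which at least one pattern ends."""
    B, init, final, _ = build_masks(patterns)
    D = 0  # bit i set <=> the prefix ending at bit i matches a suffix of the text read so far
    hits = []
    for i, ch in enumerate(text):
        # Advance every active state by one and restart every pattern,
        # then keep only the states whose character equals ch.
        D = ((D << 1) | init) & B.get(ch, 0)
        if D & final:
            hits.append(i)
    return hits


def count_trie_prefixes(patterns):
    """Number of distinct non-empty prefixes (assumed proxy for trie states)."""
    return len({p[:i] for p in patterns for i in range(1, len(p) + 1)})


patterns = ["abc", "abd", "bd"]
print(shift_and_multi(patterns, "xabdabc"))  # end positions of matches
print(build_masks(patterns)[3])              # bits used by the unfactored encoding
print(count_trie_prefixes(patterns))         # distinct prefixes shared in a trie
```

For these three patterns the unfactored encoding uses one bit per character, while the trie merges the shared prefix "ab". That gap is the prefix redundancy that a factorized representation can exploit to fit more patterns into a single word.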
|Title:||On the bit-parallel simulation of the nondeterministic Aho-Corasick and suffix automata for a set of patterns|
|Internal authors:||CANTONE, Domenico|
|Publication date:||2012|
|Journal:||JOURNAL OF DISCRETE ALGORITHMS|
|Appears in categories:||1.1 Journal article|