In this paper we present a method to simulate, using the bit-parallelism technique, the nondeterministic Aho-Corasick automaton and the nondeterministic suffix automaton induced by the trie and by the Directed Acyclic Word Graph for a set of patterns, respectively. When the prefix redundancy is nonnegligible, this method yields-if compared to the original bit-parallel encoding with no prefix factorization-a representation that requires smaller bit-vectors and, correspondingly, less words. In particular, if we restrict to single-word bit-vectors, more patterns can be packed into a word. We also present two simple algorithms, based on such a technique, for searching a set P of patterns in a text T of length n over an alphabet Σ of size σ. Our algorithms, named Log-And and Backward-Log-And, require O((m+σ)⌈m/w⌉)- space, and work in O(n⌈m/w⌉) and O(n⌈m/w⌉ lmin) worst-case searching time, respectively, where w is the number of bits in a computer word, m is the number of states of the automaton, and lmin is the length of the shortest pattern in P.
On the bit-parallel simulation of the nondeterministic Aho-Corasick and suffix automata for a set of patterns
CANTONE, Domenico;FARO, SIMONE;
2012-01-01
Abstract
In this paper we present a method to simulate, using the bit-parallelism technique, the nondeterministic Aho-Corasick automaton and the nondeterministic suffix automaton induced by the trie and by the Directed Acyclic Word Graph for a set of patterns, respectively. When the prefix redundancy is nonnegligible, this method yields-if compared to the original bit-parallel encoding with no prefix factorization-a representation that requires smaller bit-vectors and, correspondingly, less words. In particular, if we restrict to single-word bit-vectors, more patterns can be packed into a word. We also present two simple algorithms, based on such a technique, for searching a set P of patterns in a text T of length n over an alphabet Σ of size σ. Our algorithms, named Log-And and Backward-Log-And, require O((m+σ)⌈m/w⌉)- space, and work in O(n⌈m/w⌉) and O(n⌈m/w⌉ lmin) worst-case searching time, respectively, where w is the number of bits in a computer word, m is the number of states of the automaton, and lmin is the length of the shortest pattern in P.File | Dimensione | Formato | |
---|---|---|---|
2012 CantoneFaroGiaquinta JDA.pdf
accesso aperto
Tipologia:
Versione Editoriale (PDF)
Dimensione
240.98 kB
Formato
Adobe PDF
|
240.98 kB | Adobe PDF | Visualizza/Apri |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.