In this paper we present a method to simulate, using the bit-parallelism technique, the nondeterministic Aho-Corasick automaton and the nondeterministic suffix automaton induced by the trie and by the Directed Acyclic Word Graph for a set of patterns, respectively. When the prefix redundancy is nonnegligible, this method yields-if compared to the original bit-parallel encoding with no prefix factorization-a representation that requires smaller bit-vectors and, correspondingly, less words. In particular, if we restrict to single-word bit-vectors, more patterns can be packed into a word. We also present two simple algorithms, based on such a technique, for searching a set P of patterns in a text T of length n over an alphabet Σ of size σ. Our algorithms, named Log-And and Backward-Log-And, require O((m+σ)⌈m/w⌉)- space, and work in O(n⌈m/w⌉) and O(n⌈m/w⌉ lmin) worst-case searching time, respectively, where w is the number of bits in a computer word, m is the number of states of the automaton, and lmin is the length of the shortest pattern in P.

On the bit-parallel simulation of the nondeterministic Aho-Corasick and suffix automata for a set of patterns

CANTONE, Domenico;FARO, SIMONE;
2012-01-01

Abstract

In this paper we present a method to simulate, using the bit-parallelism technique, the nondeterministic Aho-Corasick automaton and the nondeterministic suffix automaton induced by the trie and by the Directed Acyclic Word Graph for a set of patterns, respectively. When the prefix redundancy is nonnegligible, this method yields-if compared to the original bit-parallel encoding with no prefix factorization-a representation that requires smaller bit-vectors and, correspondingly, less words. In particular, if we restrict to single-word bit-vectors, more patterns can be packed into a word. We also present two simple algorithms, based on such a technique, for searching a set P of patterns in a text T of length n over an alphabet Σ of size σ. Our algorithms, named Log-And and Backward-Log-And, require O((m+σ)⌈m/w⌉)- space, and work in O(n⌈m/w⌉) and O(n⌈m/w⌉ lmin) worst-case searching time, respectively, where w is the number of bits in a computer word, m is the number of states of the automaton, and lmin is the length of the shortest pattern in P.
2012
Automata; Bit-parallelism; Multiple pattern matching; Text processing
File in questo prodotto:
File Dimensione Formato  
2012 CantoneFaroGiaquinta JDA.pdf

accesso aperto

Tipologia: Versione Editoriale (PDF)
Dimensione 240.98 kB
Formato Adobe PDF
240.98 kB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.11769/12813
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 7
  • ???jsp.display-item.citation.isi??? ND
social impact