String matching is one of the most extensively studied problems in computer science, mainly due to its direct applications to such diverse areas as text, image and signal processing, speech analysis and recognition, information retrieval, data compression, computational biology and chemistry. In the last few decades a myriad of alternative solutions have been proposed, based on very different techniques. However, automata have always played a very important role in the design of efficient string matching algorithms. In this paper we present the Range Automaton, a weak yet efficient variant of the non-deterministic suffix automaton of a string whose configuration can be encoded in a very simple form and which is particularly suitable to be used for solving a multitude of text-searching problems. We will firstly develop our approach in the case of exact string matching and present an efficient algorithm, named Backward Range Automaton Matcher, which turns out to be very fast in many practical cases. Later, we will show how the Range Automaton can be adapted in an effective way also to non-standard string matching problems such as swap matching and multiple string matching. Experimental results suggest that our approach is flexible and effective for all three search problems addressed, especially in the case of long patterns.

Compact suffix automata representations for searching long patterns

Faro S.;Scafiti S.
2023-01-01

Abstract

String matching is one of the most extensively studied problems in computer science, mainly due to its direct applications to such diverse areas as text, image and signal processing, speech analysis and recognition, information retrieval, data compression, computational biology and chemistry. In the last few decades a myriad of alternative solutions have been proposed, based on very different techniques. However, automata have always played a very important role in the design of efficient string matching algorithms. In this paper we present the Range Automaton, a weak yet efficient variant of the non-deterministic suffix automaton of a string whose configuration can be encoded in a very simple form and which is particularly suitable to be used for solving a multitude of text-searching problems. We will firstly develop our approach in the case of exact string matching and present an efficient algorithm, named Backward Range Automaton Matcher, which turns out to be very fast in many practical cases. Later, we will show how the Range Automaton can be adapted in an effective way also to non-standard string matching problems such as swap matching and multiple string matching. Experimental results suggest that our approach is flexible and effective for all three search problems addressed, especially in the case of long patterns.
File in questo prodotto:
File Dimensione Formato  
Compact suffix automata representations for searching long patterns.pdf

solo gestori archivio

Tipologia: Versione Editoriale (PDF)
Licenza: NON PUBBLICO - Accesso privato/ristretto
Dimensione 436.63 kB
Formato Adobe PDF
436.63 kB Adobe PDF   Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.11769/551720
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 1
  • ???jsp.display-item.citation.isi??? 1
social impact