String matching is one of the most extensively studied problems in computer science, mainly due to its direct applications to such diverse areas as text, image and signal processing, speech analysis and recognition, information retrieval, data compression, computational biology and chemistry. In the last few decades a myriad of alternative solutions have been proposed, based on very different techniques. However, automata have always played a very important role in the design of efficient string matching algorithms. In this paper we introduce the Range Automaton, a weak yet efficient variant of the non-deterministic suffix automaton of a string whose configuration can be encoded in a very simple form and which is particularly suitable to be used for solving text-searching problems. As a first example of its effectiveness we present an efficient string matching algorithm based on the Range Automaton, named Backward Range Automaton Matcher, which turns out to be very fast in many practical cases. Despite our algorithm has a quadratic worst-case time complexity, experimental results show that it obtains in most cases the best running times when compared against the most effective automata based algorithms. In the case of long patterns, the speed-up reaches 250 %. This makes our proposed solution one of the most flexible algorithms in practical cases.

The Range Automaton: An Efficient Approach to Text-Searching

Faro S.;Scafiti S.
2021-01-01

Abstract

String matching is one of the most extensively studied problems in computer science, mainly due to its direct applications to such diverse areas as text, image and signal processing, speech analysis and recognition, information retrieval, data compression, computational biology and chemistry. In the last few decades a myriad of alternative solutions have been proposed, based on very different techniques. However, automata have always played a very important role in the design of efficient string matching algorithms. In this paper we introduce the Range Automaton, a weak yet efficient variant of the non-deterministic suffix automaton of a string whose configuration can be encoded in a very simple form and which is particularly suitable to be used for solving text-searching problems. As a first example of its effectiveness we present an efficient string matching algorithm based on the Range Automaton, named Backward Range Automaton Matcher, which turns out to be very fast in many practical cases. Despite our algorithm has a quadratic worst-case time complexity, experimental results show that it obtains in most cases the best running times when compared against the most effective automata based algorithms. In the case of long patterns, the speed-up reaches 250 %. This makes our proposed solution one of the most flexible algorithms in practical cases.
2021
Automata
Design and analysis of algorithms
Experimental algorithms
String matching
Text processing
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.11769/519178
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 2
  • ???jsp.display-item.citation.isi??? ND
social impact