In this note we present three efficient variations of the occurrence heuristic, adopted by many exact string matching algorithms and first introduced in the well-known Boyer-Moore algorithm. Our first heuristic, called improved-occurrence heuristic, is a simple improvement of the rule introduced by Sunday in his Quick-Search algorithm. Our second heuristic, called worst-occurrence heuristic, achieves its speed-up by selecting the relative position which yields the largest average advancement. Finally, our third heuristic, called jumping-occurrence heuristic, uses two characters for computing the next shift. Setting the distance between these two characters optimally allows one to maximize the average advancement. The worst-occurrence and jumping-occurrence heuristics tune their parameters according to the distribution of the characters in the text. Experimental results show that the newly proposed heuristics achieve very good results on average, especially in the case of small alphabets.

Improved and Self-Tuned Occurrence Heuristics

CANTONE, Domenico;FARO, SIMONE
2014-01-01

Abstract

In this note we present three efficient variations of the occurrence heuristic, adopted by many exact string matching algorithms and first introduced in the well-known Boyer-Moore algorithm. Our first heuristic, called improved-occurrence heuristic, is a simple improvement of the rule introduced by Sunday in his Quick-Search algorithm. Our second heuristic, called worst-occurrence heuristic, achieves its speed-up by selecting the relative position which yields the largest average advancement. Finally, our third heuristic, called jumping-occurrence heuristic, uses two characters for computing the next shift. Setting the distance between these two characters optimally allows one to maximize the average advancement. The worst-occurrence and jumping-occurrence heuristics tune their parameters according to the distribution of the characters in the text. Experimental results show that the newly proposed heuristics achieve very good results on average, especially in the case of small alphabets.
2014
Experimental algorithms; Frequency of characters; Occurrence heuristics; String matching; Text-processing, Tuned-search approach
File in questo prodotto:
File Dimensione Formato  
Improved-and-self-tuned-occurrence-heuristics_2014_Journal-of-Discrete-Algorithms.pdf

solo gestori archivio

Tipologia: Versione Editoriale (PDF)
Dimensione 438.67 kB
Formato Adobe PDF
438.67 kB Adobe PDF   Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.11769/16603
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 8
  • ???jsp.display-item.citation.isi??? 4
social impact