The exact online string matching problem, pivotal in fields ranging from computational biology to data compression, involves identifying all instances of a speci- fied pattern within a text. Despite extensive examination over the decades, this problem has remained computationally challenging due to the time and space limitations inher- ent in traditional online and offline methods, respectively. Introduced in 1991, sampled string matching has now emerged as a groundbreaking approach, ingeniously combin- ing classical online string matching techniques with efficient text sampling methods. This approach not only addresses the spatial constraints of indexed string matching but also significantly reduces the search duration in online environments, achieving speed increases of up to hundreds of times while requiring less than 4% of the text size for its partial index. In this paper, we explore the adaptability of various online string matching algorithms within the framework of sampled string matching, which has tra- ditionally relied on the Horspool algorithm. Our investigation reveals that integrating alternative string matching algorithms as subroutines markedly enhances overall per- formance. These findings highlight the potential for reevaluating established method- ologies in light of newer, more dynamic solutions and set the stage for transformative impacts across multiple domains.
Beyond Horspool: A Comparative Analysis in Sampled Matching
Simone Faro;Francesco Pio Marino;
2024-01-01
Abstract
The exact online string matching problem, pivotal in fields ranging from computational biology to data compression, involves identifying all instances of a speci- fied pattern within a text. Despite extensive examination over the decades, this problem has remained computationally challenging due to the time and space limitations inher- ent in traditional online and offline methods, respectively. Introduced in 1991, sampled string matching has now emerged as a groundbreaking approach, ingeniously combin- ing classical online string matching techniques with efficient text sampling methods. This approach not only addresses the spatial constraints of indexed string matching but also significantly reduces the search duration in online environments, achieving speed increases of up to hundreds of times while requiring less than 4% of the text size for its partial index. In this paper, we explore the adaptability of various online string matching algorithms within the framework of sampled string matching, which has tra- ditionally relied on the Horspool algorithm. Our investigation reveals that integrating alternative string matching algorithms as subroutines markedly enhances overall per- formance. These findings highlight the potential for reevaluating established method- ologies in light of newer, more dynamic solutions and set the stage for transformative impacts across multiple domains.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.