Text searching allowing for inversions and translocations of factors

Domenico Cantone; Simone Faro
2014-01-01

Abstract

The approximate string matching problem consists of finding all locations at which a pattern p of length m matches a substring of a text t of length n, after a finite number of given edit operations. In this paper, we investigate this problem when the edit operations are translocations of adjacent factors of equal length and inversions of factors. In particular, we first present an O(n m max(α, β))-time and O(m²)-space algorithm, where α and β are, respectively, the maximum lengths of the factors that can be involved in any translocation and in any inversion. We show that, under the assumptions of equiprobability and independence of characters, our algorithm has an O(n log_σ m) average-time complexity for an alphabet of size σ. We also present a very fast variant of a recently proposed algorithm for the same problem, based on an efficient filtering method, which has an O(n) time complexity in the average case, though in the worst case it retains the same O(n m max(α, β)) time complexity.
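To make the problem statement concrete, the following is a minimal brute-force sketch in Python. It is not the algorithm presented in the paper and not the filtering variant. It assumes, beyond what the abstract states, that the edit operations act on disjoint factors of the pattern, so that each character takes part in at most one operation. The function names `matches_window` and `find_occurrences` are hypothetical.

```python
def matches_window(p, s, alpha, beta):
    """Return True if p can be turned into s (same length) using
    inversions of factors of length <= beta and translocations of
    adjacent equal-length factors of length <= alpha.
    Assumption (not from the abstract): operations do not overlap."""
    m = len(p)
    if len(s) != m:
        return False
    # reachable[i] is True when p[:i] can be transformed into s[:i].
    reachable = [False] * (m + 1)
    reachable[0] = True
    for i in range(1, m + 1):
        # Case 1: the last character matches with no edit.
        if reachable[i - 1] and p[i - 1] == s[i - 1]:
            reachable[i] = True
            continue
        # Case 2: an inversion of a factor of length k ends at i.
        for k in range(2, min(beta, i) + 1):
            if reachable[i - k] and p[i - k:i][::-1] == s[i - k:i]:
                reachable[i] = True
                break
        if reachable[i]:
            continue
        # Case 3: two adjacent factors of length k are swapped, ending at i.
        for k in range(1, min(alpha, i // 2) + 1):
            if (reachable[i - 2 * k]
                    and p[i - 2 * k:i - k] == s[i - k:i]
                    and p[i - k:i] == s[i - 2 * k:i - k]):
                reachable[i] = True
                break
    return reachable[m]


def find_occurrences(p, t, alpha, beta):
    """Naively test every length-m window of t against p."""
    m = len(p)
    return [j for j in range(len(t) - m + 1)
            if matches_window(p, t[j:j + m], alpha, beta)]
```

For instance, `find_occurrences("abcd", "xxbacdxx", 2, 2)` reports the window starting at index 2, where "ab" appears inverted as "ba". This sketch checks every alignment independently and compares substrings directly, so it only illustrates what counts as a match and makes no attempt to reach the bounds quoted above.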
2014
Keywords: Analysis of algorithms; Approximate string matching; Computational biology; Inversions and translocations; Text processing

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11769/14611