Text searching allowing for inversions and translocations of factors
CANTONE, Domenico; FARO, Simone
2014-01-01
Abstract
The approximate string matching problem consists in finding all locations at which a pattern p of length m matches a substring of a text t of length n, after a finite number of given edit operations. In this paper, we investigate this problem when the edit operations are translocations of adjacent factors of equal length and inversions of factors. In particular, we first present an O(n m max(α, β))-time and O(m²)-space algorithm, where α and β are, respectively, the maximum lengths of the factors that can be involved in any translocation and inversion, and show that, under the assumptions of equiprobability and independence of characters, our algorithm has an O(n log_σ m) average time complexity for an alphabet of size σ. We also present a very fast variant of a recently proposed algorithm for the same problem, based on an efficient filtering method, which has an O(n) time complexity in the average case, though in the worst case it retains the same O(n m max(α, β)) time complexity.
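To make the matching notion concrete, here is a minimal Python sketch of a naive checker. It is not the algorithm presented in the paper and does not achieve the bounds stated above. It assumes the usual reading of the problem: the edit operations do not overlap, a translocation swaps two adjacent factors of the same length (at most alpha each), and an inversion reverses a factor of length at most beta. The function names `matches_with_edits` and `search` are hypothetical, chosen only for this illustration.

```python
def matches_with_edits(window, pattern, alpha, beta):
    """Return True if `window` equals `pattern` up to non-overlapping
    translocations of adjacent equal-length factors (each of length <= alpha)
    and inversions of factors (of length <= beta).

    Assumption: each character takes part in at most one edit operation.
    """
    m = len(pattern)
    if len(window) != m:
        return False

    # ok[i] is True when window[:i] matches pattern[:i] under the edits.
    ok = [False] * (m + 1)
    ok[0] = True

    for i in range(1, m + 1):
        # Case 1: the last character is left unchanged.
        if ok[i - 1] and window[i - 1] == pattern[i - 1]:
            ok[i] = True
            continue

        # Case 2: the last k characters are an inverted factor of the pattern.
        for k in range(2, min(beta, i) + 1):
            if ok[i - k] and window[i - k:i] == pattern[i - k:i][::-1]:
                ok[i] = True
                break
        if ok[i]:
            continue

        # Case 3: the last 2k characters are two swapped adjacent factors.
        for k in range(1, min(alpha, i // 2) + 1):
            if (ok[i - 2 * k]
                    and window[i - 2 * k:i - k] == pattern[i - k:i]
                    and window[i - k:i] == pattern[i - 2 * k:i - k]):
                ok[i] = True
                break

    return ok[m]


def search(text, pattern, alpha, beta):
    """Naive scan: test every window of len(pattern) characters in `text`."""
    m = len(pattern)
    return [s for s in range(len(text) - m + 1)
            if matches_with_edits(text[s:s + m], pattern, alpha, beta)]
```

For example, `search("xbadcx", "abcd", 1, 1)` reports position 1, because the window "badc" is obtained from "abcd" by two translocations of factors of length 1. This scan compares substrings explicitly at every window, so it is meant only to illustrate the problem definition.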