TY - JOUR

T1 - On-line approximate string matching with bounded errors

AU - Kiwi, Marcos

AU - Navarro, Gonzalo

AU - Telha, Claudio

N1 - Funding Information:
The first author gratefully acknowledges the support of CONICYT via FONDAP-Basal in Applied Mathematics, Anillo en Redes ACT08, and FONDECYT 1090227. The second author was funded in part by Fondecyt Grant 1-080019, Chile. The third author gratefully acknowledges the support of CONICYT via Anillo en Redes ACT08.

PY - 2011/10/21

Y1 - 2011/10/21

N2 - We introduce a new dimension to the widely studied on-line approximate string matching problem, by introducing an error threshold parameter ∈ so that the algorithm is allowed to miss occurrences with probability ∈. This is particularly appropriate for this problem, as approximate searching is used to model many cases where exact answers are not mandatory. We show that the relaxed version of the problem allows us breaking the average-case optimal lower bound of the classical problem, achieving average case O(n log σ m/m) time with any ∈ = poly(k/m), where n is the text size, m the pattern length, k the number of differences for edit distance, and σ the alphabet size. Our experimental results show the practicality of this novel and promising research direction. Finally, we extend the proposed approach to the multiple approximate string matching setting, where the approximate occurrence of r patterns are simultaneously sought. Again, we can break the average-case optimal lower bound of the classical problem, achieving average case O(n logσ (rm)/m) time with any ∈ = poly(k/m).

AB - We introduce a new dimension to the widely studied on-line approximate string matching problem, by introducing an error threshold parameter ∈ so that the algorithm is allowed to miss occurrences with probability ∈. This is particularly appropriate for this problem, as approximate searching is used to model many cases where exact answers are not mandatory. We show that the relaxed version of the problem allows us breaking the average-case optimal lower bound of the classical problem, achieving average case O(n log σ m/m) time with any ∈ = poly(k/m), where n is the text size, m the pattern length, k the number of differences for edit distance, and σ the alphabet size. Our experimental results show the practicality of this novel and promising research direction. Finally, we extend the proposed approach to the multiple approximate string matching setting, where the approximate occurrence of r patterns are simultaneously sought. Again, we can break the average-case optimal lower bound of the classical problem, achieving average case O(n logσ (rm)/m) time with any ∈ = poly(k/m).

KW - Algorithms

KW - String-matching problem

UR - http://www.scopus.com/inward/record.url?scp=84865750171&partnerID=8YFLogxK

U2 - 10.1016/j.tcs.2011.08.005

DO - 10.1016/j.tcs.2011.08.005

M3 - Article

AN - SCOPUS:84865750171

VL - 412

SP - 6359

EP - 6370

JO - Theoretical Computer Science

JF - Theoretical Computer Science

SN - 0304-3975

IS - 45

ER -