- All Known Implementing Classes:
- ModifiedBerghelRoachEditDistance, MyersBitParallelEditDistance
public interface GeneralEditDistance
An engine definition for computing string edit distances, and
implementation generators.
Generally, edit distance is the minimum number of edit operations required
to transform one string into another. Being a minimum transform, it
forms a metric space (one that is non-negative, symmetric, and obeys
triangle inequality).
The most common form is Levenshtein distance, where the set of operations
consists of:
deleting one character;
inserting one character; or,
substituting one character for another,
each having a "cost" of 1.
Another common form is Damerau-Levenshtein distance, which adds
transpose two adjacent characters
to the set of operations. Note that many implementations of
Damerau distance do not account for insertion after transposition;
this "restricted" Damerau distance does not obey triangle inequality,
but is substantially easier to compute than the unrestricted metric.
Another variation is to apply differential costs for the various
operations. For example, substitutions could be weighted by
character type (e.g., vowel<=>vowel being a cheaper edit than
vowel<=>consonant).
This class defines an engine for computing edit distance from
a fixed ("pattern") string to other ("target") strings. The pattern
is fixed at instantiation time; this allows an implementation to
perform precomputations to make individual target computations faster.
To allow further performance optimization, the GeneralEditDistance
contract does not require thread-safety. A duplicate method is
defined to allow additional equivalent instances to be created
for separate threads; it is essentially a type-knowledgeable clone.