The Levenstein Distance is a measure of how close two strings are to each other. The Levenstein Algoritm is used to calculate the Levenstein Distance.
The Levenstein Algorithm takes two strings and determines what it would take to transform one string into the other, using deletions, additions, and changing characters. The more that must be done, the less alike the two strings are.
In QHJ #1 the Ratcliff/Obershelp algorithm for inexact pattern matching was discussed. This algorithm determines how close two strings are by recursively finding the largest equal substrings in the two strings. The larger substrings found, the closer the two strings are.
This program comes from the May 1991 issue of the C Users Journal. The text accompanying the article explains the program better than I do. If you have questions, you should refer back to the original article.
/*
ldistance() Determine to what extent two charcter
strings are (un)equal using the
'Levenstein distance' algorithm.
*/
#include <stdlib_h>
#include <string_h>
#include <stdio_h>
int addition = 1;
int change = 3;
int deletion = 5;
#define COMP_LEN 20
#define ARR_SIZE COMP_LEN + 1
#define SMALLEST_OF(x,y,z) ( (x<y) ? min(x,z) : min(y,z) )
#define ZERO_IF_EQUAL(x,y) (requested[x-1] == found[y-1] ? 0 : change)
min( a, b)
int a, b;
{
if ( a < b )
return(a);
else
return(b);
}
int distance[ARR_SIZE][ARR_SIZE];
ldistance( requested, found)
char *requested, *found;
{
register int i,j;
int r_len, f_len;
r_len = (strlen(requested)>COMP_LEN ? COMP_LEN : strlen(requested));
f_len = (strlen(found)>COMP_LEN ? COMP_LEN : strlen(found));
distance[0][0] = 0;
for (j = 1; j <= ARR_SIZE; j++)
distance[0][j] = distance[0][j-1] + addition;
for (j = 1; j <= ARR_SIZE; j++)
distance[j][0] = distance[j-1][0] + deletion;
for (i = 1; i <= r_len; i++)
for (j = 1; j <= f_len; j++)
distance[i][j] = SMALLEST_OF(
(distance[i-1][j-1] + ZERO_IF_EQUAL(i,j)),
(distance[i][j-1] + addition),
(distance[i-1][j] + deletion) );
return( distance[r_len][f_len] );
}
int main()
{
int result;
printf("Comparing '%s' and '%s' : \n","pennsylvania",
"pencilvaneya");
result = ldistance("pennsylvania","pencilvaneya");
printf(" Result = %d\n",result);
}