In the July 1988 Issue of “Dr. Dobb’s Journal”, there is an article on Ratcliff/Obershelp pattern matching. This algorithm is designed to compare two strings and return a percentage of how close they are to each other. A result of 67 means that the two strings are 67% alike.
This algorithm can be used to compare a misspelled word with a number of possible correct words. The word with the highest percentage alike is taken to be the correct spelling of the word.
The program is a good example of how to use recursion in SuperBasic. The program takes the two words and finds the greatest common string (GCS) between them. For each string, it takes the part of the word to the left and right of the GCS and does it again with them.
Let’s take a look at the two words “pennsylvania” and “pencilvaneya”. The GCS is “lvan.” The program then takes the left most strings, “pennsy” and “penci”, and processes them. The GCS is now “pen”. Since there is nothing to the left of the GCS, the program then tries the right hand side, “nsy” and “ci.” With these two strings, there is no GCS. The program then goes up the tree and tries the other right hand strings, “ia” and “eya”. This goes on until the whole string is analysed.
Since the program is a function, it return a value to a variable. It is to be used like this:
LET x = percent_alike(a$,b$)
It can take a literal string or a string variable as oan arguement. This is also valid:
LET x = percent_alike("sofware","softwear")
One little caveat about the program. It is case sensitive, so you should convert the two strings to either all upper case, or all lower case.
DEFine FuNction percent_alike(a$,b$)
LOCal total
total = num_alike(a$,b$)
RETURN int( total/(LEN(a$)+LEN(b$))*100)
END DEFine percent_alike
DEFine FuNction num_alike (a$, b$)
LOCal total, temp$, a1, a2, b1, b2, large$
total = 0
IF a$=b$ THEN RETURN LEN(a$)*2
IF LEN(a$)=1 AND LEN(b$)=1 THEN RETURN 0
** make a$ the shortest string
IF LEN(a$) > LEN(b$) THEN
temp$ = a$
a$ = b$
b$ = temp$
ENDIF
** If a$ = one char and b$ > one char
IF LEN(a$)=1 THEN RETURN (a$ INSTR b$)
page 7
large$ = find_gt_com$ (a$, b$)
IF large$ = "" THEN RETURN 0
length = LEN(large$)
total = length*2
a1 = large$ INSTR a$
a2 = a1 + length
b1 = large$ INSTR b$
b2 = b1 + length
IF (a1>1) OR (b1>1) THEN
total = total+num_alike (a$(1 TO (a1-1)), b$(1 TO (b1-1)))
ENDIF
IF (a2<>LEN(a$)+1) OR (b2<>LEN(b$)+1) THEN
total = total+num_alike (a$(a2 TO), b$(b2 TO))
ENDIF
RETURN total
END DEFine percent_alike
DEFine FuNction find_gt_com$ (a$, b$)
LOCAL temp$, i, j, temp, large$
IF LEN(a$) > LEN(b$) THEN
temp$ = a$
a$ = b$
b$ = temp$
ENDIF
LET large$=""
FOR i = 1 TO LEN(a$)
FOR j = i TO LEN(a$)
temp = a$(i TO j) INSTR b$
IF (temp<>0) AND (LEN(a$(i TO j))>LEN(large$)) THEN large$=a$(i
to j)
END FOR j
END FOR i
RETURN large$
END DEFINE find_gt_com