UtterAccess.com
X   Site Message
(Message will auto close in 2 seconds)

Welcome to UtterAccess! Please ( Login   or   Register )

Custom Search
 
   Reply to this topicStart new topic
> Levenshtein Distance Demo, Access 2000    
 
   
doctor9
post Dec 1 2014, 11:00 AM
Post#1


UtterAccess Editor
Posts: 17,924
Joined: 29-March 05
From: Wisconsin


This code was originally posted as a reply to a thread that has since gone missing, so I thought I'd make a more permanent thread that I can refer to.

The basic concept is simple: If you want to compare two strings to see if they're very similar, but not identical, you want to know how different they are. For example, the difference between the names "Gerald" and "Gerry". If the last name is the same, there's probably a good chance that they refer to the same person. This code uses the Levenshtein Distance method to determine the number of changes needed to make to one string turn into another one. In this case, you'd change the "a" in Gerald to an "r", change the "l" to an "r", and the "d" to a "y", so the Levenshtein Distance would be 3. I've attached an Access 2000 demo that shows a simple example of how to find names that are similar to one another using this method.

CODE
'   Levenshtein Distance function algorithm and original VB code found at:
'   http://www.merriampark.com/ld.htm#VB
'   Historic mirror:
'   http://web.archive.org/web/20120113064206/http://www.merriampark.com/ld.htm
'   Written by MICHAEL GILLELAND
'   megilleland -at- gmail.com
'   Minor name changes and comments by Dennis Kuhn

'Example
'This section shows how the Levenshtein distance is computed when the
'source string is "GUMBO" and the target string is "GAMBOL".

'Steps 1 And 2
'    G U M B O
'  0 1 2 3 4 5
'G 1
'A 2
'M 3
'B 4
'O 5
'L 6

'Steps 3 to 6 When i = 1
'    G U M B O
'  0 1 2 3 4 5
'G 1 0
'A 2 1
'M 3 2
'B 4 3
'O 5 4
'L 6 5

'Steps 3 to 6 When i = 2
'    G U M B O
'  0 1 2 3 4 5
'G 1 0 1
'A 2 1 1
'M 3 2 2
'B 4 3 3
'O 5 4 4
'L 6 5 5

'Steps 3 to 6 When i = 3
'    G U M B O
'  0 1 2 3 4 5
'G 1 0 1 2
'A 2 1 1 2
'M 3 2 2 1
'B 4 3 3 2
'O 5 4 4 3
'L 6 5 5 4

'Steps 3 to 6 When i = 4
'    G U M B O
'  0 1 2 3 4 5
'G 1 0 1 2 3
'A 2 1 1 2 3
'M 3 2 2 1 2
'B 4 3 3 2 1
'O 5 4 4 3 2
'L 6 5 5 4 3

'Steps 3 to 6 When i = 5
'    G U M B O
'  0 1 2 3 4 5
'G 1 0 1 2 3 4
'A 2 1 1 2 3 4
'M 3 2 2 1 2 3
'B 4 3 3 2 1 2
'O 5 4 4 3 2 1
'L 6 5 5 4 3 2

'Step 7
'The distance is in the lower right hand corner of the matrix, i.e. 2.
'This corresponds to our intuitive realization that "GUMBO" can be
'transformed into "GAMBOL" by substituting "A" for "U" and inserting an "L"
'(one substitution and 1 insertion = 2 changes).


'********************************
'*** Compute Levenshtein Distance
'********************************
    
Public Function LD(ByVal strText1 As String, ByVal strText2 As String) As Integer

    Dim LM() As Integer ' matrix
    Dim i As Integer ' iterates through strText1
    Dim j As Integer ' iterates through strText2
    Dim intCost As Integer ' cost
  
'   Step 1
'   If one of the inputs is empty, return the length of the other input
    If Len(strText1) = 0 Then
        LD = Len(strText2)
        Exit Function
    End If
    If Len(strText2) = 0 Then
        LD = Len(strText1)
        Exit Function
    End If
'   Set up the matrix of letters
    ReDim LM(0 To Len(strText1), 0 To Len(strText2)) As Integer
  
'   Step 2
'   Populate the matrix of letters
    For i = 0 To Len(strText1)
        LM(i, 0) = i
    Next i
    For j = 0 To Len(strText2)
        LM(0, j) = j
    Next j

'   Step 3
    For i = 1 To Len(strText1)
    
'       Step 4
        For j = 1 To Len(strText2)
      
'           Step 5
            If Mid$(strText1, i, 1) = Mid$(strText2, j, 1) Then
                intCost = 0
            Else
                intCost = 1
            End If
      
'           Step 6
            LM(i, j) = _
                Minimum(LM(i - 1, j) + 1, LM(i, j - 1) + 1, LM(i - 1, j - 1) + intCost)
        Next j
    Next i
  
'   Step 7
    LD = LM(Len(strText1), Len(strText2))
    Erase LM

End Function

'*******************************
'*** Get minimum of three values
'*******************************

Private Function Minimum(ByVal a As Integer, _
                         ByVal b As Integer, _
                         ByVal c As Integer) As Integer
    Minimum = a
    If b < Minimum Then
        Minimum = b
    End If

    If c < Minimum Then
        Minimum = c
    End If
  
End Function


Hope this helps,

Dennis
Attached File(s)
Attached File  LevenshteinDemo.zip ( 24.66K )Number of downloads: 128
 
Go to the top of the page
 


Custom Search
RSSSearch   Top   Lo-Fi    15th December 2017 - 02:43 PM