UtterAccess HomeUtterAccess Wiki

Welcome Guest ( Log In | Register )

Custom Search
Edit Discussion
> Imported Data    
Imported Data

Imported Data

Imported Data, particularly from legacy systems is not always pure. For example, there may be trailing or leading spaces, or the data may contain non-printable characters or other special characters.

This leads to issues when trying to use lookups and matches. Two strings that look the same don’t match up.

The first test is to see if the strings are truly different. Copy one into Cell A1 on a blank sheet and another in Cell B1. Then in Cell C1 use the formula =A1=B1. If the strings are the same, then the result will be TRUE and the problem exists somewhere else.

If the strings are different then the result is FALSE and we can proceed with troubleshooting.

In Cell A2 use the formula =LEN(A1) and in B2 use =LEN(B1). If the cells have a different number of characters, this is usually an indication of a space attached to one of the strings. If this is the case, try =TRIM(A1) = TRIM(B1). Trim takes off leading and trailing spaces and any multiple spaces within the string are reduced to one space.

If the strings are different lengths and TRIM doesn’t fix it, then there is a chance of a hidden, non-printable character in the string. Try =CLEAN(TRIM(A1)) = CLEAN(TRIM(B1)). CLEAN supposedly removes non-printable characters.

When all else fails find out what is really happening by using the utility at the end of this wiki. It converts each letter in the string into its ASCII equivalent. For example, uppercase A is 65. Space is 32, but ASCII Code 160 looks like and prints exactly like a space, however a string containing Code 160 will not match a string with Code 32 in the same position.

Once you figure out that you have bad data there are several ways of dealing with it. If it is a “one off,” then cleaning the data in a helper column, copying it and pasting it back in as values is probably the best thing to do. Use CLEAN and TRIM to remove non-printable characters and extra spaces.

To replace rogue characters like Code 160 use the following, =SUBSTITUTE(A1, CHAR(160), “ “).

Otherwise, you can roll these cleansing agents into the formulas. It’s fairly easy if the offending strings are in the column you use to do the lookup for example =VLOOKUP(TRIM(A2),Sheet2!$A$1:$C$100,3)

If the offending strings are in the range looked at, then it’s a similar formula: =VLOOKUP(A2,TRIM(Sheet2!$A$1:$C$100),3) except this has to be entered as an array formula.

I call the following UDF, ASCString and it’s usage is =ASCString(CellRef). The output looks something like the following. Cell A1 has the string “Hello World”. ASCString (A1) yields: 72-101-108-108-111-32-87-111-114-108-100.

CODE

Option Base 0
Option Explicit

Function ASCString(Instring As String) As String
Dim MyString() As Byte
Dim TempString As String, TempLen As Long
Dim k As Integer, MyCol As Integer

TempString = ""
TempLen = Len(Instring)
ReDim MyString(TempLen)

MyString = StrConv(Instring, vbFromUnicode)

For k = 0 To UBound(MyString)
   TempString = TempString & (MyString(k)) & "-"
Next

ASCString = Left(TempString, Len(TempString) - 1)
End Function
Edit Discussion
Custom Search


Thank you for your support!
This page has been accessed 2,563 times.  This page was last modified 20:29, 22 October 2014 by dflak.   Disclaimers