Printable Version of Topic

Click here to view this topic in its original format

UtterAccess Forums _ Access Searching + Data Mining _ Standardizing Street Addresses

Posted by: argeedblu Jul 24 2014, 11:07 AM

In my current project, we want to use street address field(s) to group people for follow up contact. The assumption is that people who share a common street address may be members of the same household. Initial information comes from a 3rd party and is not in any particular order. Two people who are potentially related by street address my by many entries apart in the source data. For this to work, it is essential to test potential matches at data entry time. I'm looking for thoughts on possible strategies for testing for potential relationships to assist with maintaining a many-many relationship between people and street addresses.
otential source data might look like this:

Any and all thoughts most welcome.

Posted by: pere_de_chipstick Jul 24 2014, 12:09 PM

Hi Glenn
checked if specific address spaces matched, ((Address1 OR Area) AND City AND State) and a pop up form could be opened showing all addresses that were similar if there were any matches. The user could then select an existing address, or save the new address. All addresses were held in an address table and linked to individuals in a join table.
This would catch occurrences of say "1234 Any Street" and "1234 Any St." because of the match on the remaining fields.

Posted by: RAZMaddaz Jul 24 2014, 12:53 PM

Take a look at the Query I created and see if this might be what you are looking to do.
RAZMaddaz ( 19.7K ): 33

Posted by: cheekybuddha Jul 24 2014, 01:00 PM

One option is to have a list of abbreviations/full versions that you replace with an update query - but how do you it? You may think to update all instances of 'St' to 'Street'. But what happens with 'St James' St'?
Perhaps a better option would be to use Soundex for initial matching?

Posted by: nvogel Jul 24 2014, 03:40 PM

Hi Glenn, you may be aware that there are some software packages you can buy specifically designed for address matching and householding. There are also services provided by companies who will do it for you. Maybe these could be alternative options for you.

Posted by: orange999 Jul 24 2014, 04:29 PM

This a lot of info they consider standard. You may get some ideas for reconciling various "user terms- street str. st, blvd etc) to a standard for your incoming addresses. As to whether people with same address are members of same household may be true, but not necessarily -- How would you know for sure?
HAs I recall (and I often recall wrongly) you were discussing a fuzzy matching project (soundex/levenshtein distance/etc) a while back. Is this part of the same thing, or does it apply here?
Good luck.

Posted by: xordevoreaux Jul 25 2014, 06:49 AM

I have the same issue - attempting to identify who lives where. I am able to separate unit from the apartment as well, which really helps, and I'm happy with the results when done, but it took me several queries to do it, and run the code. I have the original text fields for the addresses and a place to store the results, rather than ever changing the originals.

            .ExecuteCN "Qry_Import_ActiveEmployees_1000"        'Updates ConvertedStreet1 and ConvertedStreet2 with Street1, Street2
            .ExecuteCN "Qry_Import_ActiveEmployees_1010"        'Place, Circle, Drive, Street, Avenue, Road
            .ExecuteCN "Qry_Import_ActiveEmployees_1011"        'Terrace, Lane, Highway
            .ExecuteCN "Qry_Import_ActiveEmployees_1012"        'Court, Square, Parkway
            .ExecuteCN "Qry_Import_ActiveEmployees_1100"        'North, South, East, West
            .ExecuteCN "Qry_Import_ActiveEmployees_1110"        'Northeast, Southeast, Northwest, Southwest
            .ExecuteCN "Qry_Import_ActiveEmployees_1120"        'Removes . from ConvertedStreet1
            .ExecuteCN "Qry_Import_ActiveEmployees_1210"        'Moves anything with "Apt" from ConvertedStreet1 to ConvertedStreet2
            .ExecuteCN "Qry_Import_ActiveEmployees_1300"        'Removes #, -, ., Apt from ConvertedStreet2
            .ExecuteCN "Qry_Import_ActiveEmployees_1310"        'Removes Lot, Bldg, Building, Unit, Trlr from ConvertedStreet2
            .ExecuteCN "Qry_Import_ActiveEmployees_1320"        'Removes spaces from ConvertedStreet2

This is the SQL for _1010, which covers what _1011 through 1120, where ConvertedStreet1 is the first of two address fields for the street (the database has the first street address for the building/house, the second for the unit/apt number).
UPDATE tblEmployee SET tblEmployee.ConvertedStreet1 = IIf(IsNull([ConvertedStreet1]),"",Replace(Replace(Replace(Replace(Replace(Replace([ConvertedStreet1],"Place","Pl"),"Circle","Cir"),"Drive","Dr"),"Street","St"),"Avenue","Ave"),"Road","Rd"));
Sometimes, information didn't come to us with the house/building unit in the second street address field, but as part of the first.
In _1200, _1210, I'm looking for "Apt" and wherever I find it, taking the back half of the street address and moving it to street address 2.
In _1300 to 1320, I then remove stray [censored] from the unit to make them consistent: B instead of #B or # B:
UPDATE tblEmployee SET tblEmployee.ConvertedStreet2 = Replace(Replace(Replace(Replace(Replace(Replace([ConvertedStreet2],"-",""),"# ",""),"#",""),"Apt ",""),"Apt",""),".","");
UPDATE tblEmployee SET tblEmployee.ConvertedStreet2 = Replace(Replace(Replace(Replace(Replace([ConvertedStreet2],"Lot",""),"Bldg ",""),"Building",""),"Unit",""),"Trlr","");
UPDATE tblEmployee SET tblEmployee.ConvertedStreet2 = Replace([ConvertedStreet2]," ","");
Works well the majority of the time.

Posted by: argeedblu Jul 26 2014, 08:14 AM

My apologies if this seems to be somewhat of an ominbus response that is also a bit late.
Bernie Thanks for you popup suggestion. For identifying similar address1's, I wonder if it is safe to assume that the characters that preceed the first space will always be what I would call the civic address. That is, are there any regions that might put the street name first followed by a number of alphanumeric value that identifies the specific building or house. I may be overthinking the problem and maybe should just allow such exceptions, if they exist, to fall through the trap.
@Raz Your query definitely looks to be along the right lines. I tested it against an address like "1234 St. Charles St." which it correctly calculated as "1234 St. Charles Street" - I'm going to be doing this testing at input time but I can implement your calculations in my algorithm. Thanks.
@David Thanks for your comments. Soundex could definitely be part of the solution but, so far, I have found it to be not quite up to the job.
@nvogel I will have to investigage third party packages. We really want something that is integrated into the application so that the checking/correction can occur when the data is entered. Time is of the essence in the final application. Lists of names have to be quickly entered and passed along to the next step in the workflow as quickly as possible.
@Orange Thanks for the Canada Post link. That will be invaluable background information to guide my delvelopment. It's true that people who live at the same address may not be part of the same household. Some, of course, would be distinguished by the value of addr2 but, of course, not all. I think the solution will be to let the user make the final determination. So the application would present the user with a list of potential household member and indicate which actually belong to the household. You are correct about my earlier fuzzy matching discussion. That was in the context of distinguishing/matching personal names. This discussion is in the context of a different project/application where we have to identify people in a list that we know to represent unique individuals who might belong to the same household. My initial exploration at fuzzy matching addresses doesn't seem to adequately handle the problem.
@xordevoreaux Your series of queries (apologies for the unintentional rhyme) looks promising. I will be implementing my final solution with a VBA function so that input addresses can be tested against existing addresses when the information is intially key. My client advises that the initial lists they have to work with are seldom electronic. Incidentally I like your naming convention for sequential queries. I have never been able to devise one for myself so I will be "borrowing" yours. I am also interested in the times that it has not worked for you. What kinds of problems have you run into?
Thanks everyone who has contributed to this thread. You have given me lots of food for thought and a definite push in the right direction.

Posted by: xordevoreaux Jul 28 2014, 06:48 AM

1. Feel free to borrow the naming convention, I'm glad that it could be of help.
. The times that the routine has not worked for me is if there is a unit type that the routine wasn't expecting.
For example, 123 Main Street Apt. B, it could handle, but then we got into a part of the country where we had trailer numbers in a trailer park, so I had to re-write the routine to accommodate a unit for trailer, trlr, etc.
I'd have to re-write it again if suddenly we saw Condo #C, or something like that.

Posted by: Jeff B. Jul 28 2014, 06:54 AM

OPO (just one person's opinion) ... I wonder if this is one of those situations that calls for USB technology (using someone's brain)?
I'd probably locate possible matches and have a user decide/determine "most likely" matches.
Best of luck!

Posted by: argeedblu Jul 28 2014, 07:51 AM

Thanks xordev.

Posted by: orange999 Jul 28 2014, 08:03 AM

I agree with Jeff B here. This is not likely to be done from a single design. From experience with things like standardizing Job Positions, addresses, company names, contact areas of responsibility..... You get your best approach results, then review and refine the process/validation and repeat. Rarely will the first attempt be 100% of what you need. I don't know how you will implement, but I don't think letting user decide is necessarily best choice. You may want to flag such entries such that you could review them specifically and edit as necessary.

Posted by: argeedblu Jul 28 2014, 08:08 AM

Thanks Orange,
I am composing a reply to Jeff at the moment in which I am describing the strategy I have come up with. Some of the information in the Canada Post addressing guide to which you referred me has been very helpful in the process.

Posted by: argeedblu Jul 28 2014, 08:21 AM

Thanks Jeff, it certainly seems that way.
long those lines, I have come up with an alternative strategy that I hope will work out. Because most of the addresses in this application will represent a single city, I am going to fully normalize the street address (address line 1). Locally there are about 2000 street names. When I was researching street names, I came across a partial list and soon realized that in everyday life, people are somewhat less than precise when it comes to street names. "I live on xxxx or at 235 xxx." without qualifying whether the street name includes "Street," "Avenue," "Crescent," or whatever. The list I found on-line included several variations for some streets where in fact the particular street name did not have any of the variations. Being able to accurately match people who share a common address is essential for the application. Hence my quest for accuracy.
So my strategy will be to present the user with a combo that lists known name-street type-street direction combinations. We will build the list as we go using the combo's not in list event to populate foreign keys in the address record for name, street type, and street direction. In some cases street type and street direction might be nulls. The not in list event handler will present the user with a form asking if they want to add the stree (eg, 5th Street North) to the street list. If they do, the will manually decompose the address using three combos, one for name (listing all names currently on file), one for type (listing type that haven't previously been paired with that name, and one for direction (listing directions that haven't previously been paired with that name-type combination. The street type table is populated with the English form of street types and corresponding abbreviations listed in the Canada Post addressing guide. The directions table is similarly populated with the English form street directions and abbreviations from the same guide. While it would be preferable to offer variations in both official languages, I don't want to tackle the extra work to do so in order not to delay other project work.

Posted by: dmhzx Jul 28 2014, 08:28 AM

This is all done with a hint of tongue in cheek!
Over here in the UK we have a system called 'Post Codes'
Combining that with the house number (or name) is effectively a standard address, and all the other bits and pieces that may be typed in manually and differently can be 'harmonised' using a look up.
Most of our sat nav is done by post codes. - And we can use Google maps with post codes
Odon't know if your Canadian post code process is as well 'ingrained' or anywhere near as granular. -- But wondered if you'd considered the possibility?
Some years ago when we had a need to de-duplicate many addresses, I built a system that combined some automation, and the previously mentioned 'USB' approach in competition with
someone else very much into a system only approach
Yis all done with a hint of tongue in cheek!
Over here in the UK we have a system called 'Post Codes'
Combining that with the house number (or name) is effectively a standard address, and all the other bits and pieces that may be typed in manually and differently can be 'harmonised' using a look up.
Most of our sat nav is done by post codes. - And we can use Google maps with post codes
Odon't know if your Canadian post code process is as well 'ingrained' or anywhere near as granular. -- But wondered if you'd considered the possibility?
Some years ago when we had a need to de-duplicate many addresses, I built a system that combined some automation, and the previously mentioned 'USB' approach in competition with
someone else very much into a system only approach
YOur address book was cleaned and finished before the 'opposition' had finished system testing.

Posted by: argeedblu Jul 28 2014, 08:36 AM

Thanks for the thought. However Canadian postal codes while fully consistent in format, nation-wide, are not quite as granular as you suggest. The last three characters represent a 'walk,' and can thus include a number of residences that are in close proximity to each other. They are presumably served by the same letter carrier. Hmmm, that brings to mind a recent announcement from Canada Post that they will be phasing out residential delivery over the next few years in favour of centralized community mail drops. Most likely that means that one mail drop will cover any number of the former 'walks.'

Posted by: dmhzx Jul 28 2014, 09:03 AM

For goodness sake Glenn
Don't ever post something like that again.
If our newly privatised Royal Mail gets wind of that, we'll all have to go personally to the nearest sorting office to where the mail was posted to collect our mail, with five photographic
proofs of ID, two utility bills and a
blood sample, and the Data protection act will 'prevent' them from giving us the mail for the rest of the family.--- Or of telling us over the phone which sorting office to go to.
In our case the post code could be up to about twenty or so houses, but will always be in the same street.
Best of luck with your project: I think you may need it!

Posted by: ZapDude Aug 4 2014, 12:46 PM

I have an automated street address parsing system that is successful about 98% of the time. Specifically, it breaks up a full street name (or, more typically, an "Address1" and an "Address2" field) into the following: Street Number, Street Direction, Street Name, Street Designation, Unit Designation, Unit Number, City, State, and Zip Code. My address schema only applies to US-based addresses, but such a process can probably be created for other countries.
For instance, the address block as follows:
123 N. Main St.
Suite 45A
Centerville, CA 92345-6789
...would break down to this:
Street Number: 123
Street Direction: N
Street Name: Main
Street Designation: Str
Unit Designation: Ste
Unit Number: 45A
City: Centerville
State: CA
Zip: 923456789
I use a VBA process that parses all the information based upon rules I've set up. For instance, I use the USPS official designations for the database. The USPS database also includes typical misspellings of street types. The only thing I can't readily check is actual spelling of street and city names, as well as occasional street directions from street names (for instance, "North Dakota St." would actually be a street name of "North Dakota", and not a street name of "Dakota" with "North" and "South" designations, ie "123 N. Dakota St" as opposed to "123 North Dakota St").
In my case, since I can automate most of the process, it's a relatively simple matter to adjust the ~2% out of compliance with the protocols.
I use text designations for everything but zip code for the US. (I actually normalize further; for instance I have a table for all US cities and states, then use a many/many join table. For instance, I only have one "Orange" in the city table, but use the many /many to assign it to the cities of Orange in Texas, California, New Jersey, etc._ and use foreign key combinations to make a unique designation).
While this may sound a bit of an overkill, I use this schema in all my database development projects, along with the name parsing function I created in an earlier tutorial. If you wish, I can create a tutorial for the process of parsing addresses, though it may take awhile.

Posted by: argeedblu Aug 4 2014, 02:39 PM

Thank you very much for your insight. Since posting my original question, I have arrived at a schema similar to what you are suggesting. I am using a combination soundex/levenstein distance variation to detect and alert the user to possible matches against existing names.
In Canada Post standards, direction follows the street designation. I think that will make it a bit easier to deal with the North Dakota type of situation. To deal with non-Canadian addresses, I have a 1-1 table for 'Other' postal codes.
Your parsing tutorial would be a useful an excellent article in the Access Wiki. I would certainly be interested in reading it.

Posted by: orange999 Aug 4 2014, 02:53 PM

Just to agree 100% with Glenn, your tutorials and code on name parsing and address parsing would seem ideal candidates for the Access Wiki or code archive.

Posted by: ZapDude May 15 2018, 12:22 PM

I apologize for not following up with this project. I had actually had to do a major work project, and forgot about creating this function as I have not really had a need for it as much as other protocols. In fact, I'm in the middle of something that would preclude from starting now, but will get to it by the middle of June 2018.