UtterAccess HomeUtterAccess Wiki

Welcome Guest ( Log In | Register )

Custom Search
Edit Discussion
> Normalizing Existing Data    
Normalizing Existing Data

(A demonstration of how to efficiently Normalize existing Non-Normalized data)

Once-off Queries can be a great tool to help you with Normalizing your data. This demonstration will show you how to use them to speed up your Normalization process when you are working with a table that already has data in it.

For this example, we have a table called "tblTestData", which was imported into Access from Excel, and it includes 2,433 records. After a little manual inspection, we can identify a field with repeating data that is ripe for Normalization. The field is called "strSourceName", and it contains around 80 unique values, with a LOT of repeats for almost every unique value.

This sort of data storage works, but it doesn't work well. It is far too easy to have spelling variations for the same source name. By separating the source names into a separate table, we can make data entry more consistent (by using a combobox instead of a textbox), and make data storage more efficient (by storing only a long integer value in each record, rather than a long string of text).

So, we need to make a "Sources" table. Rather than create it from scratch, and scroll through the tblTestData table, looking manually for each unique source name, then copy/pasting each value, we can use a quick-and-dirty Select query that lists all of the unique sources alphabetically:

SQL
SELECT DISTINCT strSourceName
FROM tblTestData ORDER BY strSourceName;

A quick glance at the query results verifies that we have a list of unique sources, listed alphabetically - just like we want them to appear in a table. (This is a great opportunity to visually look over the list; any slight spelling errors will be spotted easily, like "Acme" followed immediately by "Acne". We should fix these errors in the original table before proceeding.) So, next we make a small, but powerful change to the query:

SQL
SELECT DISTINCT strSourceName INTO tblSources
FROM tblTestData ORDER BY strSourceName;

This is now a Make Table query. When we run this, Access will create a new table named "tblSources", and it's contents will look just like the original SELECT query's results.

So, we run the query. If any errors occur, they will appear in a special table that will be created to store the errors. Once the new table is created, it's just a single field, listing each unique source. This is good, but to improve it, we go to the "tblSources" table's Design View, and insert another field called "SourceID", and make it the Primary Key, setting the datatype to Autonumber. After closing and saving the table, we now have a list of Sources that not only is ready to use, but is very easy to update and maintain.

Next, we need to change the original table so it can use this table, rather than manually-entered values. To do this, we add a new field to the "tblTestData" table called "lngSourceID" (lng is Hungarian Notation for "Long Integer"). This new field will store the Long Integer values that refer to the Autonumber field in the "tblSources" table.

Now we have a new field, but it's got Null values for the whole table. To populate it, we'll create an Update query that will insert the proper SourceID value from the Sources table into the lngSourceID field in the tblTestData table:

SQL
UPDATE tblTestData SET tblTestData.lngSourceID = DLookUp("[SourceID]","tblSources","[strSourceName]=""" & [strSourceName] & """");

We run the query, and as before, if any unexpected problems come up while running the Update query, we can check out the newly-created table to see each error that happened, to see if we can rectify things.

Next, we would like to verify that there is a value for EVERY record in the "tblTestData" table, so we open the table in Datasheet view, select a value in the lngSourceID field, and click the A->Z sort button on the ribbon. This will sort any empty fields to the top of the list. Depending on how many errors we may have had (if any), we can either manually type in the missing values (which can sometimes be the simplest solution), or re-write the query and just try again. Perhaps, for example, some of the sources include a quotation mark, which causes the DLookUp function in the query to fail. It's possible to write a more complex DLookUp() statement that eliminates this issue, but again, it may just be simpler to manually enter the value for the few sources with the quotation mark.

Once the lngSourceID field is filled in for every record, we've entered the final stretch of Normalization. We can now go into the Relationships window for the database, and create a simple one-to-many relationship between the "lngSourceID" field in the tblTestData table and the Primary Key "SourceID" field in the tblSources table. Once that's done, we can finally delete the original strSourceName field from the tblTestData table, since it's no longer needed.

That's it! Now, instead of a lot of text, there are Long Integer values instead. Why is this good? Because now our data entry form can be based on a combobox that gets a list of Source choices from the table, thus eliminating many typographical errors. It also saves on storage space for data, because a Long Integer field will take up very little memory/hard drive space compared to a long string of text. Another advantage is that if one of the Sources has its name changed (due to a change in ownership, for example), we would only need to change it once - in the Sources table, and that change would be reflected in forms and reports immediately afterward.

Edit Discussion
Custom Search
Thank you for your support!
This page has been accessed 6,710 times.  This page was last modified 19:30, 8 August 2017 by doctor9. Contributions by Glenn Lloyd and Jack Leach  Disclaimers