UtterAccess HomeUtterAccess Wiki

Welcome Guest ( Log In | Register )

Custom Search
Edit Discussion
> Normalization    
Normalization

Image:NotifCleanup.gif This page requires general cleanup in formatting or text to better fit the UA Wiki Guidelines
This page is under consideration for merging with: Normalization Terms and Concepts, Normalizing A Database

Normalization is a systematic approach to creating tables and defining relationships in a database.

Normalization helps you accomplish two goals:

  • Eliminate insertion, update and delete Anomalies in order to protect the integrity of your data.
  • Eliminate redundancy and ambiguity in your data.

Another way to state the first goal is that your tables and relationships must be defined so it is not possible to add, change or delete data if doing so would result in ambiguous, inaccurate, or unrelated values.

A common saying that addresses the second goal is to state that “each piece of data must be stored one time, in one place.”

When you are planning your database, instead of focusing on how you want to get the data out, think about what you have to put in and structure it well. Reports should not drive how you structure your tables, nor should the format of data that you import.

Your table structure is driven by how things actually relate in the real world. Looking at what you have from another perspective to see that can often be difficult. The best solution is simple... but it is often the hardest to find.

Drawing an analogy to a building:

  • Data structure is the foundation.
  • Forms are walls.
  • Reports are windows and skylights, since they allow you to view your data.

Organize your data so the structures are strong. You want what you build on that foundation to be stable and flexible.

Structuring data is an iterative process. Set up tables, create relationships between tables and print relationship diagram, change tables, print relationship diagram, put data in, change tables, print relationship diagram, …

It takes thought and foresight to design your structures well. And the simplest solution is rarely easy to see. Get the structure right before building anything else. If you have already built forms, queries, or other objects, fix the structure before building any more!

Just as you wouldn't build a house on sand without a strong foundation, your data structure is the foundation you will build on for Access. A good design is everything when it comes to Access. Structuring data is the most important thing to do well -- as you learn and your knowledge grows, it is important to translate that to data structure changes -- just as you would repair a crack in the foundation of a building the moment it is discovered. It takes time to plan your database. Here are some ideas for you:

Think of the different "nouns" that you will track:

  • Customers
  • Addresses
  • Phones
  • Email Addresses
  • Notes
  • Products
  • Purchases
  • Payments
  • Campaigns

For each "noun", describe it with "adjectives", or fields.

Each table should have a primary key. It is common to use an AutoNumber field and "ID" in the name of a primary key, such as CustomerID, ProductID, etc. It is a good idea to qualify the ID fieldname as opposed to using "ID" as a fieldname all by itself as that is ambiguous and not descriptive.

If you use an autonumber field for the primary key in your table, then use a Long Integer field as the foreign key in related tables.

Contents

Don't repeat data structures

If you see, for instance, that you have Department information in more than one table, create a Departments table with an AutoNumber primary key called something like DeptID and then make a Long Integer DeptID in other tables to relate the data.

Key Fields

Key fields are used to link related tables together. The key field names in the different tables can be the same since they represent the same piece of information. Otherwise, it is a good idea for each field in a database to have a unique name.

Data Type for Key Fields

As a general rule, Autonumbers and Long Integers are most efficient for key fields. Double precision and single- precision numbers are not accurate for keys because of the inherent problems with exact comparison. If you do use a text field for a key, keep in mind that it should be short to be efficient. It takes about 1 byte to store each character whereas long integers only require 4 bytes to store.

Starting with JET 4 (Access 2000), each text character takes 2 bytes to store in order to support Unicode. This means that a 50-character text field takes 100 bytes to store – 25 times the space needed for a Long Integer.

Default Value

Change default value of all numeric foreign key fields to Null -- the default for all numbers is 0 (versions below Access 2007), which will never match with a sequential AutoNumber field -- not changing it will prevent you from being able to enforce referential integrity if it is not specified. If referential integrity is enforced, as it should be in most relationships (unless you have a good reason not to), a default value of 0 in a numeric foreign key field can cause other problems, such as append queries that fail.

To hammer that in...Access sets the DefaultValue of numeric fields to 0 -- this is not good for a foreign key because there won't be a key field to match it to : It is okay if it is not filled out, but it is NOT okay if it is filled out with a value that doesn't match.

Indexes

You can also create indexes on fields in tables. For instance, you may have a combination of fields that should be unique, such as TestID and QuestionID, where you would not want the same question to appear on a test more than once. You can create a unique index on the combination. Think of which fields you will be doing lookups on and build indexes, but do this judiciously as indexes are updated when records are changed, so they take extra time to maintain, and there is a limit to how many you can create. If you use dLookup, it will work faster on an indexed field.

Here is a link you may wish to read as well: Use Microsoft Jet's ShowPlan to write more efficient queries, by Susan Sales Harkins and Mike Gunderloy http://articles.techrepublic.com.com/5100-22-5064388.html

Naming

Naming is a very personal thing and it is rare you will find people agreeing on the best way to do it. There are guidelines you can follow, or just develop your own logic.

Don't use anything but letters, numbers, and underscores in names (fieldnames, tablenames, control names, etc). If you follow this, then you will not have to surround your names with brackets when you type code.

Don't use special characters in names (%, &, /, etc). Most programmers do not even use spaces. Start all names with a letter not a number. Using numbers usually indicates that the data is not normalized anyway.

If you start a fieldname with a number, you WILL have problems at some point and you will probably pull your hair out trying to figure out what is wrong, so don't ever do it.

Think about how long text fields will be and set the Field size to something other than the default of 50 characters. For instance, for Americans, 30 is usually long enough for cities, 10 long enough for zips, 14 for phone numbers, 20 or 30 for last or first name. These are just guidelines, your data may need more space.

Keep names concise yet descriptive.

Name your fields and tables well. When you do create forms and reports, name the controls to something logical before building any code. Names like Text34 and Combo68 make it frustrating to code and even more so if you need help and others have to decipher what those names really mean. Some like to make the Name property of the control the same as the ControlSource property whenever possible.

When you have command buttons, name them logically like cmd_Close, cmd_OpenReportMenu, etc.

"Date" is a bad name for a field since that is a reserved word. Qualify fieldnames, like CourseDate and SaleDate. One reason to keep fieldnames short is consideration of the length of the field that the column heading will be over since column headings can't wrap when you open a table. Another is less to type when you are coding and … last but certainly not least – you can fit more on a relationship diagram.

Reserved Words

In addition to naming your fields (tables, queries, etc) well, you also need to avoid using special words that Access might interpret to mean something different. A list of reserved words, and a free utility to scan your tables, can be found on Allen Browne's website:

Problem names and reserved words in Access, by Allen Browne http://www.allenbrowne.com/AppIssueBadWord.html

Allen has a lot of great information on his site and it is well worth your time to explore it.

Numbers that aren’t numbers

If a "number" you plan to store can not be used in a mathematical expression, you may choose to store it as text, especially if it has symbols. For example, you would not do math with a telephone number, so you may wish to store it as text so that you can store the symbols and read the value easier (123) 555-1212, especially when you export your data. It is more efficient, however, to store numbers as numbers than text because the number of bytes of storage is less. An advantage to storing numbers as text is that you can store placeholder symbols such as parentheses and dashes.

In the case of a phone number, you could not even store it as a long integer because it has too many digits. You, therefore, would need to store it as a double-precision number and, because double-precision numbers are stored in floating point format, they are not accurate for exact comparisons. On the other hand, a ranking, mass, or level, etc. can be used in math equations (even if you don't plan to use it that way!), so store that kind of data with a numeric data type.

Input Mask

When you use the InputMask property on a text field, some like to store symbols in the field so when you do an export, they will be there. For instance, without storing symbols, a phone number would be 1234567890 when exported as opposed to 123-456-7890 or (123) 456-7890. This can, however, make searching for phone numbers more difficult. If you have a combobox for searching and are storing mask symbols, uou can use a function to translate the phone number to just digits without mask symbols and make that column very thin so it is greater than 0 but doesn't really show.

Allow Zero Length

You can either allow ZLS (zero length string) or not. Some developers do not ever allow empty strings to be stored.

Problem Properties, by Allen Browne http://allenbrowne.com/bug-09.html

If you are importing data and are not sure if it contains ZLS data, then it is a good idea to allow it. You need to be careful here as a ZLS (zero length string), or even a field with a space or two, looks like a Null value even though it is not.

You need to decide for yourself what you will choose to do.

Descriptions

Fill out your field Descriptions! This is what the status bar text will be set to when you slide that field onto a form. The StatusBarText shows up in the lower left corner of the screen on the Status Bar when you are in that control.

Order of Fields

The Order you list fields in the table design does not matter. Often, key fields are at the top of the structure and tracking fields, like DateAdded and DateEdited below are at the bottom.

Tracking date record was created and modified

It is a good idea to add these two tracking fields to all your tables (except tables that are used to provide values for lists and won't be altered, such as a table of states in the United States). Let them be the last 2 fields.

DateAdd, date/time, DefaultValue = Now(), Description → date/time record was added

DateEdit, date/time, DefaultValue = Now(), Description → date/time record last edited

change this on the form BeforeUpdate event
Me.DateEdit = Now()

More information on Normalization

While you are designing your database, the more you read on this subject, the better ideas you will get for your own needs. Here are links you may wish to study:

Access Junkie's (Jeff Conrad) Resource List, Database Design 101
http://www.accessmvp.com/JConrad/accessjunkie/resources.html#DatabaseDesign101

A Little Treatise on Normalization, by Jerry Dennison
http://www.utteraccess.com/forum/t197282.html

Data Modeling, by George Hepworth (GroverParkGeorge)
http://www.gpcdata.com/downloads/normalizationgpgonaccess.pdf

Glossary of Terms and Definitions, by Jerry Dennison
http://www.utteraccess.com/forum/t213518.html

General Primary Keys - Natural vs Surrogate, by Glenn Lloyd (argeedblu)
http://www.utteraccess.com/forum/t637301.html

Microsoft: Object Hungarian Notation Naming Conventions for VB
http://support.microsoft.com/kb/q173738

Microsoft: Description of the database normalization basics
http://support.microsoft.com/kb/283878

Wikipedia: First Normal Form
http://en.wikipedia.org/wiki/First_normal_form

Wikipedia: Second Normal Form
http://en.wikipedia.org/wiki/Second_normal_form

Wikipedia: Third Normal Form
http://en.wikipedia.org/wiki/Third_normal_form

Wikipedia: Fourth Normal Form
http://en.wikipedia.org/wiki/Fourth_normal_form

Wikipedia: Fifth Normal Form
http://en.wikipedia.org/wiki/Fifth_normal_form

There are a lot of things to read! And it may be hard to justify the time on design when your boss wants a database he can use … but as the familiar saying goes: "Measure twice cut once". With Access, it is more like "Measure 50 times, cut once". The time you spend up-front creating a solid structure with logic will save you endless headaches down the road.

Edit Discussion
Custom Search
Thank you for your support!
This page has been accessed 37,313 times.  This page was last modified 04:46, 4 February 2012 by Jack Leach. Contributions by JeffBoyceIF, Glenn Lloyd, TheDBguy and BananaRepublic and others  Disclaimers