Normalizing A Database
|
|  |
This page requires general cleanup in formatting or text to better fit the UA Wiki Guidelines |
Some 'givens' to remember with all db's:
- 1) All Primary Key's (PK's) should be Autonumbers. PK's, as the inimitable Jerry Dennison (the true expert around UtterAccess on normalization,) puts it: "PK's are NOT for human consumption." they're meant to uniquely identify the records in a table only. The 'address' of the record in the database, if you'd like. Autonumbers
- 2) Do NOT use spaces or non-alphameric characters (#,/,?, etc.) in field or object names. Many, many, many headaches lie down that path, so steer clear.
- 3) Do NOT use 'Reserved' words as the name of fields or objects. Reserved_Words_In_Access , Reserved_Words
The first thing to always ask yourself when designing the structure is "What end product/s do I need?"
Then step back and ask "What data do I need to get to the end product/s?" Looking at this one another way, you could say, "What fields do I need?"
You then group the related fields together into tables, evaluating each field to see if you are at 3NF. If not, break them down, adding fields/tables until you get there.
Until you get comfortable with the process, do it on paper first, before working in an actual database.
Let’s look at a means to track customer orders as your end product.
Data needed would include Customer info (name, address, phone number), Inventory (Parts) info (Part #, name, price) and Order info (order date, order #, customer, parts ordered, total price) so you can reach the desired end product.
Looking at the Customer table: Naming_Conventions (tbl in front of each table name, frm for form, rpt for report, subfrm for subform, etc.) would indicate we name it tblCustomers. Remembering Given #1 above, we'll start the table with an autonumber field to identify (id) each record, and name it CustomerID and make it the PK.
The next field to be needed is the name of the customer, a text field.
This gives us:
CustomerID PK Autonumber
- CustomerName Text
Four Questions to Ask Yourself
Use this series of questions to ask yourself with each field as you add it, to see if you are normalizing the table as you go:
- 1) Can the field be broken down further? Are there components to the field, or fields that actually make up the field I just added?
- 2) Will any other record ever contain this exact same information?
- 3) Does the field describe ONLY this record?
- 4) Can a entity of this type (in this case, Customer) have more than one of this field?
If you answer NO to questions 1, 2 and 4, and YES to question 3,you can say, “I'm in pretty good shape to continue to the next field, in normalization terms.”
Now we're at the customer address. You figure it to be a text field.
This gives us:
- CustomerID PK Autonumber
- CustomerName Text
- CustomerAddress Text
- CustomerPhoneNumber Text
Asking those 4 questions, you decide that:
- 1) It CAN be broken down further.
Addresses consist of other components: 1 or 2 lines of street address and/or suite # or similar data, city, state and zip code
- 2) You don't believe any other customer would have the exact same street address, but could and probably will have the same city, state and/or zip code.
- 3) Debatable, but for this business purpose, you believe that we won't have more than one customer with the same address.
- 4) Yes, a customer CAN and probably will have more than one address.
A lot of businesses and people have a physical address, a billing address and a mailing address with some or all of these being different.
From these answers you realize that the field needs to be changed. With the fact that question 4 is yes, you can tell fairly easily (in this situation) that you need an additional table, tblAddresses.
So back to the step of deciding what fields are needed. you decide that you need:
- CustomerAddressID PK Autonumber
- CustomerID FK (Foreign Key) to tblCustomers this links tblCustomerAddresses to tblCustomers in a many to one relationship (one customer - many addresses)
- Address1 Text
- Address2 Text
- City Text
- State Text
- Zipcode Text (you won't do calculations on the zip, so it should be text)
- AddressType Text (physical, mailing, etc)
Going through the same 4 question process, you feel comfortable that the first 3 fields are good to go.
AddressID is the Primary Key and it IS an Autonumber. Address1 and Address2 meet all requirements as well.
Now we get to city, state and zip. More than one address can have the same city, state, zip combination. Looks like we can say NO to the #2 question, a good indicator that another additional table is needed, tblCities.
You remove the three bottom fields and add them to tblCities, with an Autonumber PK, CityID:
- CityID PK Autonumber
- City Text
- State Text
- Zipcode Text
Same 4 questions again: you feel good about City and Zipcode, but more than one record could have the same state, indicating a need for yet another table:
- tblStates
- StateID PK Autonumber
- StateAbbr Text (state abbreviation)
- StateName Text
No to all 4 questions means this table should be normalized. Now you have to relate it back to tblCities. You add a Foreign Key (FK) to tblCities linking tblCities to tblStates in a many-to-one relationship (Many cities can have the same state).
This gives you:
- tblCities
- CityID PK Autonumber
- City Text
- StateID FK to tblStates
- Zipcode Text
Now you need to relate tblCities to tblCustomerAddresses. You add a FK to tblCustomerAddresses linking it to tblCities in a many-to-one relationship (Many addresses can have the same city/state/zip).
This gives you:
- tblCustomerAddresses
- AddressID PK Autonumber
- CustomerID FK to tblCustomers
- Address1 Text
- Address2 Text
- CityID FK to tblCities
- AddressType Text
Now to the AddressType field. Again with the 4 questions,you decide that more than one record will have the same address type. This again means an additional table.
You decide on:
- tblAddressTypes
- AddressTypeID PK Autonumber
- AddressType Text
After questioning the fields here, you decide the table is good and go back to tblCustomerAddresses to relate the two tables.
You remove the AddressType text field and add a FK, AddressTypeID::tblCustomerAddresses
- AddressID PK Autonumber
- CustomerID FK to tblCustomers
- Address1 Text
- Address2 Text
- CityID FK to tblCities
- AddressTypeID FK to tblAddressTypes (many to one, many addresses can have the same type)
Looking back to tblCustomers, you no longer need customer address as a field in that table, since the relationship with tblCustomerAddresses takes care of this.
This leaves:
- tblCustomers
- CustomerID
- CustomerName
- CustomerPhoneNumber
The usual 4 questions gives you the answers that you are going to need another table. One customer can have many phone numbers.
You decide on:
- tblPhoneNumbers
- PhoneNumberID PK Autonumber
- CustomerID FK to tblCustomers (one customer, many phone numbers)
- PhoneNumberType Text (mobile, office, home, etc.)
- PhoneNumber Text
After analyzing this table with the 4 questions, you see that, again, you must add another table as many phone numbers can be the same type:
- tblPhoneNumberTypes
- PhoneNumberTypeID PK Autonumber
- PhoneNumberType Text
After adjusting tblPhoneNumbers by removing the PhoneNumberType text field and adding the FK to relate to tblPhoneNumberTypes, you have:
- tblPhoneNumbers
- PhoneNumberID PK Autonumber
- CustomerID FK to tblCustomers (one customer, many phone numbers)
- PhoneNumberTypeID FK to tblPhoneNumberTypes
- PhoneNumber Text
You now have 9 tables versus the original 3, just from analyzing one table!
Tables now are:
- tblCustomers
- tblCustomerAddresses
- tblAddressTypes
- tblCities
- tblStates
- tblParts
- tblOrders
- tblPhoneNumbers
- tblPhoneNumberTypes
Moving on to tblParts, you decide you need these fields:
- PartID PK Autonumber
- PartNumber Text
- PartDescription Text
- PartUnitPrice Currency
Analyzing with the 4 questions, this looks pretty good as is. No additional tables needed, unless later you decide to add a PartCategory, such as motors, valves, electrical, miscellaneous, etc. If you did that, you’d need a tblPartsCategory, with a FK in tblParts to relate the two tables.
Moving on to tblOrders, you decide you need:
- tblOrders
- OrderID PK Autonumber
- CustomerID FK to tblCustomers
- OrderNumber Text
- OrderDate Date/Time
- PartID FK to tblParts
- QtyOrdered Number
- PaymentMethod Text
Analyzing here, you see that you are repeating data (fields) in a lot of records this way, since orders will, a lot of times, have more than one part ordered. This means yet another table. You decide since it line items for an order to name it tblOrderLineItems.
This new table only needs to link to tblOrders, so the customer and order specific data do not go in the new table, only the parts ordered info goes here.
You decide on:
- tblOrderLineItems
- OrderLineID PK Autonumber
- OrderID FK to tblOrders (many to one, many line items to each order)
- PartID FK to tblParts
- QtyOrdered Number
Initial analyzing says you’re ok here. On further thought though, you realize that parts prices can and WILL change over time.
You need to store that amount, to keep an accurate history.
Adding that field gives you:
- tblOrderLineItems
- OrderLineID PK Autonumber
- OrderID FK to tblOrders (many to one, many line items to each order)
- PartID FK to tblParts
- QtyOrdered Number
- PartPrice Currency
This leaves tblOrders to look like:
- tblOrders
- OrderID PK Autonumber
- CustomerID FK to tblCustomers
- OrderNumber Text
- OrderDate Date/Time
- PaymentMethod Text
PaymentMethod winds up in its own table after the 4 question analyzation (more than one order can have the same payment method, you would be duplicating data):
- tblPaymentMethod
- PaymentMethodID PK Autonumber
- PaymentMethod Text
Removing the PaymentMethod text field and adding the FK for the relationship changes tblOrders to:
- tblOrders
- OrderID PK Autonumber
- CustomerID FK to tblCustomers
- OrderNumber Text
- OrderDate Date/Time
- PaymentMethodID FK to tblPaymentMethods (many to one, many orders with the same method)
The tables now are:
- tblCustomers
- CustomerID PK Autonumber
- CustomerName
- tblCustomerAddresses
- AddressID PK Autonumber
- CustomerID FK to tblCustomers
- Address1 Text
- Address2 Text
- CityID FK to tblCities
- AddressTypeID FK to tblAddressTypes
- tblAddressTypes
- AddressTypeID PK Autonumber
- AddressType Text
- tblCities
- CityID PK Autonumber
- City Text
- StateID FK to tblStates
- Zipcode Text
- tblStates
- StateID PK Autonumber
- StateAbbr Text (state abbreviation)
- StateName Text
- tblPhoneNumbers
- PhoneNumberID PK Autonumber
- CustomerID FK to tblCustomers
- PhoneNumberTypeID FK to tblPhoneNumberTypes
- PhoneNumber Text
- tblPhoneNumberTypes
- PhoneNumberTypeID PK Autonumber
- PhoneNumberType Text
- tblParts
- PartID PK Autonumber
- PartNumber Text
- PartDescription Text
- tblOrders
- OrderID PK Autonumber
- CustomerID FK to tblCustomers
- OrderNumber Text
- OrderDate Date/Time
- PaymentMethodID FK to tblPaymentMethods
- tblOrderLineItems
- OrderLineID PK Autonumber
- OrderID FK to tblOrders (many to one, many line items to each order)
- PartID FK to tblParts
- QtyOrdered Number
- PartPrice Currency
- tblPaymentMethods
- PaymentMethodID PK Autonumber
- PaymentMethod Text
A final total of 11 tables from an original 3!! While this may seem, at first, to be over-complicating things, it will make things so much easier as you work into forms and reports that you will end up actually SAVING time in the long run!
At this point, you realize that you can and should keep a history of the price of parts.
You create a table named tblPartsPrice and use PartPriceID as a FK from tblOrderLineItems instead of a currency field to just hold the data.
- tblPartsPrice
- PartPriceID PK Autonumber
- PartID FK to tblParts
- DateofPrice Date/Time
- PartPrice Currency
Finally, after all this work on paper (now 12 tables from the initial 3), you are at Third Normal Form (3NF) and ready to enter the actual tables in a new blank database.
Once that is done and you go to the Relationships window and define your relationships there with the appropriate Cascade Update/Delete options, you’re done --for now, until something new is added/the scope of the project changes--with the table structure and ready to look at the user interface to get the data into the db....
This page was originally ported from the UtterAccess Forums. It is based heavily or in part on a post by NoahP.
Normalizing a Database