Normalization Terms and Concepts
|
|  |
This page requires general cleanup in formatting or text to better fit the UA Wiki Guidelines |
Trying to find all the information you needed to understand normalization and design concepts? It seems that although there is a wealth of information here and around, it's hard to put it into perspective. If you wanted to study the subject thoroughly, it required going to a half dozen different places.
This article’s subject matter has been compiled from unrelated sources, sliced them and diced them , and reassembled into a condensed composite. There are less than 1 dozen words that are my own contained within, and the credit goes to the various contributors.
An index of the complete sources of information can be found
Suggested Readings & Things in FAQA
This mix and match does not represent a verbatim representation of the context from which they were copied.
FirstNormalForm
A Table is in first normal form (1NF) if there are no repeating groups
Eliminate Repeating Groups
Make a separate table for each set of related attributes, and give each table a primary key.
All Attributes must be atomic.
This means that there should be no repeating groups within a field (i.e. multiple values within a field separated by a comma or other delimiter).
Tables Require a PrimaryKey
- Guidelines for Primary Keys
- Value must be unique
- It can never be null
- Cannot be a multi-part field
- Comprised of the minimum number of fields to ensure uniqueness
- Not optional in whole or in part
- Directly identifies each value of the remaining fields in a given record of a table
In order for a field to become a Primary Key for its parent table, it must pass each of these guidelines.
Failure to pass even one will disqualify it as a possible Primary Key.
If you make certain you have established a valid Primary Key, you will greatly reduce the possibility of encountering problems later when you begin to work with the table relationships.
A Glossary of Terms & Definitions , Courtesy of Jerry Dennison
ATTRIBUTE: n. A quality or characteristic inherent in or ascribed to someone or something.
Another common name for an attribute is DESCRIPTOR. An attribute describes some aspect of an entity. Usually, it is one or more attributes that fully define and characterise the uniqueness of an entity. When this happens, this attribute (or combination of attributes) is said to be a "natural key"
CANDIDATE KEY: n. One of several possible attributes or combinations of attributes which can be used to uniquely identify a body of information (entity). Generally speaking, candidate keys should rarely be used as the PRIMARY KEY.
ENTITY: n. A physical or virtual object that exists as a unique and discreet unit.
It cannot be broken down or have anything removed from it without losing the integrity of the entity. (for example, an automobile that comes off the assembly line is an entity (it can be uniquely identified and is not whole if any part of it is removed), it is itself made up of other entities (all of it's myriad parts that are sometimes made up of other parts) each of which is an entity unto itself (it would lose integrity of what defines it as a part if anything were removed).
A table should represent a single entity, so a table is in fact the entity.
FOREIGN KEY: n. Attributes contained within a database record (or other collection of related data items) which uniquely identify another record, but not the one within which they are contained. A foreign key is a reference from one record to another. In Access you create a foreign key by defining a field in the "child" table that will hold the value of the primary key of another table. This is how a related record in one table "knows" which parent record it is related to. All foreign keys must be the same datatype as the parent primary key.
KEY: n. The key represents the entity. In fact, it can be said that the key is the entity (or at least the database's representation of the entity)
PRIMARY KEY: n The CANDIDATE KEY selected that uniquely identifies a body of information (entity).
In Access (or any other RDBMS) for that matter it is usually desirable to use a single field as the primary key. This makes it much easier to link related tables to the entity. It makes it much easier to ensure uniqueness and (with Access) facilitates the updating and deleting of records in related tables.
RELATIONSHIP: n. The condition or fact of being related; connection or association. In database terms this is usually relegated to dependency. Access only directly supports one-to-one and one-to-many relationships. But usually it can always be said that one table (entity) is the "parent" of another table (entity).
Second Normal Form
Must be in First Normal Form <---- the forms are heiarchical, each is dependent on the one before it
A RELATION is in second normal form if each ATTRIBUTE is fully functionally dependent on the ENTIRE primary key. This means that no subset of the key can determine an attribute's value.
Eliminate Redundant Data- If an attribute depends on only part of a multi-valued key, remove it to a separate table.
Eliminate Columns not dependent on the Key- If attributes do not contribute to a description of the key, remove them to a separate table.
Look for values that occur multiple times in a non-key field. This tells you that you have too many fields in a single table.
Functional Dependency on others
Functional dependency (a tough one, the formal definition of Functional Dependence is:
For any relation R, attribute A is fully functionally dependent on attribute B if, for every valid instance, the value of B determines the value of A.), then of course there's multivalued dependency, trivial dependency, and last but not least candidate key.
Third Normal Form
Must be in Second Normal Form
A RELATION is in third normal form when no non-key attribute is dependent on any other non-key attribute.
This and 2NF are the primary Forms that prohibits the storage of calculated values or transitive dependencies.
Transitive dependency is a type of functional dependency in which the value of a non-key field is determined by the value in another non-key field and that field is not a candidate key.
Now, to boil all of this down to something usable. You must ask yourself the following:
Am I repeating groups?
These can be multiple values in a single field, repeated TYPES of fields that share the same datatype and a common root name or root structure (these are the multiple date fields in Drew's db), or repeating tables of the same entity type (these can be noticed because they generally have the same fields but in different tables).
Am I trying to store derived or calculated values? (we should all know by now not to store calculated values)
Do I have multiple tables with more than one index based on multiple fields? (this usually indicates combining of entities, remember: an table is an entity and an entity a table)
Do I have a large number of values being repeated in a column (field)? This does not include FK's linked to another table's PK.
It does include just about anything else. If you find that you're repeating a lot of values then you MAY need to move this to it's own table. This particular question is lowest in priority.
BCNF
Must be in Third Normal Form and all determinates are candidate keys
All candidate keys must satisfy the test for third normal form. A candidate key is of itself a potential unique identifier of the entity--generally speaking candidate keys are mutli-field constructs. This does not mean you should use a candidate key as the PK, it means that it could satisfy the requirements of uniqueness.
For most entities, there are many candidate keys.
Boyce-Codd normal form(BCNF) can be thought of as a "new" third normal form. It was introduced to cover situations that the "old" third normal form did not address .
Keep in mind the mean of a determinant(determines the value of another field)and candidate keys(qualify for designation as Primary Key).
This normal form applies to situations where you have overlapping candidate keys. If a table has no key fields, it is automatically BCNF.
Fourth Normal Form
Must be in 3NF/BCNF
There can be no nontrivial multivalued dependencies in a relation <---- This is a fairly common reduction that most people achieve without even knowing it.
3NF prohibits independent multivalued components of the key. For example, if an employee can have many skills and many dependents you would move the skills and dependents to separate tables as they are not related in any way.
Fully Qualified Naming : Building Attributes, By Jerry Dennison
Attribute = a feature or description that is specific to an entity.
Let's examine a relatively easy way to "build" valid attributes.
Fully Qualified Naming. This is the easiest and best method for helping you define your entities and their relevant attributes. A fully qualified name for an attribute includes everything the attribute is describing.
An example would be a Work Order. The attributes of the enity Work Order might be:
- WorkOrder ID
- WorkOrder Number
- WorkOrder Date Entered
- WorkOrder Date Completed
- WorkOrder LineItem Item ID
- WorkOrder LineItem Item Quantity
- WorkOrder LineItem Item Price
- WorkOrder LineItem Item Discount
- WorkOrder LineItem Item Discription
These two entities are separated into their "natural" groups by name. You could call the 2nd group table LineItem OrderDetail.
You'll notice that a third group appears, called Item. This is in fact another entity that has emerged and should be in it's own table and the OrderDetails (LineItems) is really an intermediate table that helps define a many-to-many relationship between WorkOrder and Items.
The final structure would then look something like this:
- WorkOrder Table
- Primary Key
- Number
- Date Entered
- Date Completed
- Item Table
- Primary Key
- Number
- Description
- Price
etc.......
You might be tempted to store duplicate price values as prices are subject to change and should be specific to the order detail—
however if you need to be able to pull data from history, it is might be better practice to have a Junction Table that joins and relates ItemID, Price and PriceDate.
- ItemPriceTable
- ItemPriceID
- ItemID
- Price
- PriceDate
- WorkOrderDetail Table
- WorkOrderDetail Table Primary Key
- WorkOrderID (foreign key link to WorkOrder)
- ItemID (foreign key link to Item)
- Quantity
- Discount
(note: do not include item description or other attributes from other tables that are not specific to the work order detail entity—any fields in related tables can be accessed using queries)
This page was originally ported from the UtterAccess Forums. It is based heavily or in part on a post by tbowconn.
Normalization: Requoted Quotes