UtterAccess HomeUtterAccess Wiki

Welcome Guest ( Log In | Register )

Custom Search
Edit Discussion
> Normalization Terms and Concepts    
Normalization Terms and Concepts

Image:NotifCleanup.gif This page requires general cleanup in formatting or text to better fit the UA Wiki Guidelines
This page is under consideration for merging with: Normalization, Normalizing A Database

Trying to find all the information you needed to understand normalization and design concepts? It seems that although there is a wealth of information here and around, it's hard to put it into perspective. If you wanted to study the subject thoroughly, it required going to a half dozen different places.

This article’s subject matter has been compiled from unrelated sources, sliced them and diced them , and reassembled into a condensed composite. There are less than 1 dozen words that are my own contained within, and the credit goes to the various contributors.

An index of the complete sources of information can be found Suggested Readings & Things in FAQA

This mix and match does not represent a verbatim representation of the context from which they were copied.


Contents

First Normal Form

A Table is in first normal form (1NF) if there are no repeating groups

Eliminate Repeating Groups
Make a separate table for each set of related attributes, and give each table a primary key. All Attributes must be atomic.
This means that there should be no repeating groups within a field (i.e. multiple values within a field separated by a comma or other delimiter).
Tables Require a PrimaryKey

Guidelines for Primary Keys
  • The value must be unique for each Primary Key
  • They can never be null
  • They cannot be a multi-part field
  • They must be comprised of the minimum number of fields to ensure uniqueness
  • They cannot be optional in whole or in part
  • They directly identify each value of the remaining fields in a given record of a table

In order for a field to become a Primary Key for its parent table, it must pass each of these guidelines.
Failure to pass even one will disqualify it as a possible Primary Key.
When you make certain you have established a valid Primary Key, you will greatly reduce the possibility of encountering problems later when you begin to work with the table relationships.


A Glossary of Terms & Definitions

ATTRIBUTE: n. A quality or characteristic inherent in or ascribed to someone or something. In a (relational) database an attribute is a named and typed variable within a table. Also known as a COLUMN or FIELD of a table.

ENTITY: n. A physical or virtual object that exists as a unique and discrete unit. A table in a database represents a distinct entity type and the rows (tuples) in that table represent instances of entities of that type. Since entities are unique, an accurate representation of entities in a table requires that tuples must be unique within the table. The sets of attributes that make tuples unique are called KEYS (see the following definition).

KEY: n. A set of attributes which collectively are required to be irreducibly unique in a table and which do not permit nulls. A KEY is also more formally called a CANDIDATE KEY. "Unique" means that at all times the attributes of the key must have a different set of values on every row (tuple) in a table. "Irreducible" means that all the attributes of the key are essential to maintain the uniqueness property - remove any one attribute and the values of the remaining attributes would no longer be guaranteed unique for all valid tuples. Key constraints (AKA uniqueness constraints) are database integrity constraints that guarantee uniqueness by preventing duplicate data being entered into key attributes. Keys are fundamental to effective operation of a database because they serve an important data integrity function and provide users with unique identifiers by which they can identify and update information accurately in a table. Generally speaking, every table should have at least one key and very possibly more than one.

PRIMARY KEY: n A candidate key that is designated as the "preferred" identifier for information in a table or that is singled out from other keys of a table for some specific purpose. By convention every table usually has exactly one key designated as PRIMARY KEY. A primary key isn't required to be fundamentally different from any other key of that table so the choice of a primary key is arguably superfluous and always optional. However, for practical reasons of usability and comprehension of the data it is normally a very good idea to designate a primary key for a table.

RELATIONSHIP: n. The condition or fact of being related; an association among things. In semantic modelling and database design a relationship is usually a constraint or dependency between one, two or more data elements. Examples are an inclusion dependency, functional dependency or join dependency. Such relationships play an important role in Normalization Theory. This concept is distinct from and should never be confused with the Access UI feature that is also called a "relationship". Relationships in the Access sense support several different technical features but are beyond the scope of this article which is specifically concerned with Normalization concepts.

FOREIGN KEY: n. A set of attributes that are subject to an inclusion dependency constraint. Values of foreign key attributes are required to match the values of a candidate key in a table (usually in a different table to the FOREIGN KEY itself, which is what the word "foreign" is supposed to imply). Note that foreign "keys" are usually not keys at all and so the name is rather misleading.

Second Normal Form

Must be in First Normal Form <---- the forms are heiarchical, each is dependent on the one before it A RELATION is in second normal form if each ATTRIBUTE is fully functionally dependent on the ENTIRE primary key. This means that no subset of the key can determine an attribute's value.
Eliminate Redundant Data- If an attribute depends on only part of a multi-valued key, remove it to a separate table.
Eliminate Columns not dependent on the Key- If attributes do not contribute to a description of the key, remove them to a separate table.
Look for values that occur multiple times in a non-key field. This tells you that you have too many fields in a single table.

Functional Dependency on others Functional dependency (a tough one, the formal definition of Functional Dependence is:
For any relation R, attribute A is fully functionally dependent on attribute B if, for every valid instance, the value of B determines the value of A.), then of course there's multivalued dependency, trivial dependency, and last but not least candidate key.


Third Normal Form

Must be in Second Normal Form A RELATION is in third normal form when no non-key attribute is dependent on any other non-key attribute.
This and 2NF are the primary Forms that prohibits the storage of calculated values or transitive dependencies.

Transitive dependency is a type of functional dependency in which the value of a non-key field is determined by the value in another non-key field and that field is not a candidate key.

Now, to boil all of this down to something usable. You must ask yourself the following:

Am I repeating groups?
These can be multiple values in a single field, repeated TYPES of fields that share the same datatype and a common root name or root structure (these are the multiple date fields in Drew's db), or repeating tables of the same entity type (these can be noticed because they generally have the same fields but in different tables).
Am I trying to store derived or calculated values? (we should all know by now not to store calculated values)
Do I have multiple tables with more than one index based on multiple fields? (this usually indicates combining of entities, remember: an table is an entity and an entity a table)
Do I have a large number of values being repeated in a column (field)? This does not include FK's linked to another table's PK.
It does include just about anything else. If you find that you're repeating a lot of values then you MAY need to move this to it's own table. This particular question is lowest in priority.


BCNF

Must be in Third Normal Form and all determinates are candidate keys All candidate keys must satisfy the test for third normal form. A candidate key is of itself a potential unique identifier of the entity--generally speaking candidate keys are mutli-field constructs. This does not mean you should use a candidate key as the PK, it means that it could satisfy the requirements of uniqueness.
For most entities, there are many candidate keys.

Boyce-Codd normal form(BCNF) can be thought of as a "new" third normal form. It was introduced to cover situations that the "old" third normal form did not address .
Keep in mind the mean of a determinant(determines the value of another field)and candidate keys(qualify for designation as Primary Key).
This normal form applies to situations where you have overlapping candidate keys. If a table has no key fields, it is automatically BCNF.


Fourth Normal Form

Must be in 3NF/BCNF There can be no nontrivial multivalued dependencies in a relation <---- This is a fairly common reduction that most people achieve without even knowing it.
3NF prohibits independent multivalued components of the key. For example, if an employee can have many skills and many dependents you would move the skills and dependents to separate tables as they are not related in any way.


Fully Qualified Naming : Building Attributes, By Jerry Dennison

Attribute = a feature or description that is specific to an entity.
Let's examine a relatively easy way to "build" valid attributes.
Fully Qualified Naming. This is the easiest and best method for helping you define your entities and their relevant attributes. A fully qualified name for an attribute includes everything the attribute is describing. An example would be a Work Order. The attributes of the enity Work Order might be:

WorkOrder ID
WorkOrder Number
WorkOrder Date Entered
WorkOrder Date Completed
WorkOrder LineItem Item ID
WorkOrder LineItem Item Quantity
WorkOrder LineItem Item Price
WorkOrder LineItem Item Discount
WorkOrder LineItem Item Description

These two entities are separated into their "natural" groups by name. You could call the 2nd group table LineItem OrderDetail.
You'll notice that a third group appears, called Item. This is in fact another entity that has emerged and should be in it's own table and the OrderDetails (LineItems) is really an intermediate table that helps define a many-to-many relationship between WorkOrder and Items.
The final structure would then look something like this:

WorkOrder Table
Primary Key
Number
Date Entered
Date Completed
Item Table
Primary Key
Number
Description
Price

etc.......

You might be tempted to store duplicate price values as prices are subject to change and should be specific to the order detail—
however if you need to be able to pull data from history, it is might be better practice to have a Junction Table that joins and relates ItemID, Price and PriceDate.

ItemPriceTable
ItemPriceID
ItemID
Price
PriceDate
WorkOrderDetail Table
WorkOrderDetail Table Primary Key
WorkOrderID (foreign key link to WorkOrder)
ItemID (foreign key link to Item)
Quantity
Discount

(note: do not include item description or other attributes from other tables that are not specific to the work order detail entity—any fields in related tables can be accessed using queries)


This page was originally ported from the UtterAccess Forums. It is based heavily or in part on a post by tbowconn.
Normalization: Requoted Quotes

Edit Discussion
Custom Search
Thank you for your support!
This page has been accessed 9,982 times.  This page was last modified 02:05, 19 January 2014 by Ace. Contributions by George Hepworth, ButtonMoon, Jack Leach and Cpetermann  Disclaimers