A Little Treatise on Normalization
Apr 8 2003, 09:12 AM
Joined: 31-January 00
From: South Carolina, USA
By request I'm copying and pasting a little ditty on the Forms of Data Normalization I wrote in an effort to help a poster better understand how to normalize their data structure. It is not all inclusive, nor does it adequately (at least in my mind) address the Forms but it may offer some small help to those struggling with their table structures.
Strictly speaking, addresses do not describe the entity that is people. And many would espouse that you separate addresses into their own table and relate them back to the people that reside at that address at that time. This is one of those grey areas where we all tend to violate the forms to a degree. This in no way implies that we should violate them with impunity. Now, when it comes to the definition of entity, attribute, and relationships these are actually quite clearly defined (they're just extremely difficult to understand in the raw form). Database objects have absolutely nothing to do with the data model. It is the data model that the definitions detail as far as I'm concerned. I may as well try to spell out my understanding of the Forms in as simple terms as possible.
First Normal Form:
Every table should have a Key <---- this means that each record must be uniquely identified
All ATTRIBUTES must be atomic <----- this means that there should be no repeating groups within a field (i.e. multiple values within a field separated by a comma or other delimiter) Strictly speaking, this phrase has nothing to do with repeating groups of fields or tables, that is actually covered (however esoterically) by other Forms. BUT, I tend to expand the meaning of this Form to include repeating groups wherever they may be found.
Second Normal Form:
Must be in First Normal Form <---- the forms are heiarchical, each is dependent on the one before it
A RELATION is in second normal form if each ATTRIBUTE is fully functionally dependent on the ENTIRE primary key <---- this means that no subset of the key can determine an attribute's value
Third Normal Form:
Must be in Second Normal Form
A RELATION is in third normal form when no non-key attribute is dependent on any other non-key attribute <---- this and 2NF are the primary Forms that prohibits the storage of calculated values or transitive dependencies.
Must be in Third Normal Form
All candidate keys must satisfy the test for third normal form <---- a candidate key is of itself a potential unique identifier of the entity, generally speaking candidate keys are mutli-field constructs. This does not mean you should use a candidate key as the PK, it means that it could satisfy the requirements of uniqueness. For most entities, there are many candidate keys.
Fourth Normal Form
Must be in 3NF/BCNF
There can be no nontrivial multivalued dependencies in a relation <---- This is a fairly common reduction that most people achieve without even knowing it. This form prohibits independent multivalued components of the key. For example, if an employee can have many skills and many dependents you would move the skills and dependents to separate tables as they are not related in any way.
Fifth Normal Form
Must be in Fourth Normal Form
This is Nervana of DB design and is seldom reached. Basically, it advocates that you continue splitting the structure down until either of two states exist: that you've split so far that the resulting tables could not be joined to reconstruct the original, OR further splitting would be trivial.
Natually, this doesn't even come close to really describing what you're trying to accomplish. There are also definitions that need to be understood, specifically around what an entity is, what an attribute is, what a relation is, functional dependency (a tough one, the formal definiton of Functional Dependence is: For any relation R, attribute A is fully functionally dependent on attribute B if, for every valid instance, the value of B determines the value of A.), then of course there's multivalued dependency, trivial dependency, and last but not least candidate key.
Now, to boil all of this down to something usable. You must ask yourself the following:
Am I repeating groups? These can be multiple values in a single field, repeated TYPES of fields that share the same datatype and a common root name or root structure (these are the multiple date fields in Drew's db), or repeating tables of the same entity type (these can be noticed because they generally have the same fields but in different tables).
Am I trying to store derived or calculated values? (we should all know by now not to store calculated values)
Do I have multiple tables with more than one index based on multiple fields? (this usually indicates combining of entities, remember: an table is an entity and an entity a table)
Do I have a large number of values being repeated in a column (field)? This does not include FK's linked to another table's PK. It does include just about anything else. If you find that you're repeating a lot of values then you MAY need to move this to it's own table. This particular question is lowest in priority.
I hope this long winded diatribe helps a little.
|Search Top Lo-Fi||22nd February 2017 - 10:10 AM|