UtterAccess HomeUtterAccess Wiki

Welcome Guest ( Log In | Register )

Custom Search
Edit Discussion
> Natural Vs Surrogate Keys    

The phrase "Natural Key vs Surrogate Key" describes a false dilemma in the design and use of keys but it’s a topic that occasionally gets revived in forums like this one so it’s worth a little examination.

A natural key is a key of a database table that identifies some entity in the business domain. Some examples of a natural key might be an invoice number, a vehicle registration number or a user login name. Because they identify things outside the database natural keys are also more appropriately referred to as business keys or domain keys. The remainder of this article mostly uses the term *domain* key which means exactly the same as natural key.

A surrogate key is a key that doesn’t have meaning outside the database. It isn’t used to identify things in the business domain but is only used for technical reasons inside the database or inside some applications that access the database. A surrogate key usually isn’t exposed to (most) users.

Surrogates are used for pragmatic and technical reasons. Sometimes domain keys are expected to change and it can be extremely useful within the database and in application code to work with a more stable alternative key. Surrogate keys can simplify some aspects of refactoring database designs and in some situations using surrogate keys will permit the use of more efficient indexing and storage options within the database.

On the other hand surrogate keys also add complexity. A surrogate key usually means an extra key (because the domain key is still required by business users) so it usually requires additional data access code and other structures to support it. If they are used as the target of foreign key references then surrogate keys often increase the number and complexity of joins needed to retrieve data. Using a surrogate can also increase the size and overhead of indexing and storage structures (because the domain keys usually will be indexed as well). The most prudent way to decide when to use a surrogate is to evaluate all the pros and cons on a case by case basis.

The reason for not exposing a surrogate to business users is related to the reason for using it at all. If such a key is exposed to users then it may acquire some significance in the business domain. Users might start using it as a domain key and the surrogate key values might even get persisted in other systems, in written notes, in emails or elsewhere. If that happens then the original intention of a surrogate being an unseen key independent of any business need is undermined (see the discussion of “Stability”). That a surrogate key is a key never exposed to users was part of E.F.Codd’s original definition of what a surrogate key is.

It’s quite wrong to think that there is an either/or choice to be made between natural and surrogate keys. A database table can and should have as many keys as it needs. Natural keys are normally important in most tables to support data integrity (making sure duplicate data doesn’t get into the database) and to allow the users to identify business information accurately. Natural keys are what support and implement business functional requirements and are virtually essential for effective and correct use of data in most situations. Surrogates by contrast are usually optional from a business requirements perspective and are added for purely technical, non-functional reasons. Tables with surrogate keys usually will and usually should require a domain key (natural key) as well.

STABILITY

Surrogate keys can be useful as a relatively stable alternative key but they do not totally eliminate the requirement to support changing key values. The domain key(s) of a table may need to be changed whether or not a surrogate key is also present. For example if a user’s login name is used as a domain key then the presence of a surrogate user identifier won’t eliminate the need to change that domain key if the user wants to change their name. Since the domain keys are the principal, most important and most used method of identifying and updating information in the database, the issues of how to update them and the impact of those changes are potentially very significant - and that is the case whether or not a surrogate key is also present in the same table. What surrogates can do is help minimise some of the technical impact of such changes by providing a potentially more stable alternative key, especially for use by referencing foreign keys for example.

Stability is always a *relative* measure, not an absolute one. Sometimes the surrogate key values themselves need to be changed. Surrogate key values may be changed if the table design needs to be altered (composition or decomposition into a new set of tables); or if distinct data sets are merged; or during release and configuration management when data is promoted from one environment to another. Surrogate key values may be changed to support various kinds of routine data maintenance operations, though typically not by the direct control of business users.

"Immutability" (absolute unchangeability rather than relative stability) is sometimes mentioned in the context of surrogate keys but is in practice usually unachievable and/or unverifiable. It’s not always possible or even desirable to prevent people changing key values during the lifetime of a database and it may be impossible even to know if it has happened in the past.

KEY GENERATORS

Many DBMSs provide "key generator" features such as auto-numbered columns (“identity” columns), auto-incrementing functions, sequences and globally unique identifiers (guids). These features are often used to generate surrogate key values - so much so that they are sometimes quite wrongly assumed to be virtually synonymous with surrogate keys.

In practice key generator functions are sometimes used to generate domain key values for business use as well (e.g. order numbers). The details of these technical features and their advantages and disadvantages are beyond the scope of this article. Specifically in relation to auto-numbered columns however, it’s worth noting that for technical reasons these usually make a poor choice of domain key – their limitations makes them suitable only “not for human consumption” and they are therefore generally best confined to use as surrogate keys.

Edit Discussion
Custom Search


Thank you for your support!
This page has been accessed 3,785 times.  This page was last modified 09:37, 12 October 2013 by ButtonMoon. Contributions by JeffBoyceIF  Disclaimers