The PRIMARY KEY constraint uniquely identifies each record in a database table.
Every table should have a primary key. If the primary key is the same data used by humans to identify the item in the real world, then it's a natural key, e.g., an SSN, a vehicle VIN, an aircraft tail number, or a part serial number. A natural key helps to bridge between the real world and the database.
The alternative to the natural key is the surrogate key, "surrogate" meaning artificial or a stand-in replacement. For databases, a surrogate key means an artificial, computer-generated value is used to uniquely identify the row. SQL Server supports identity columns and globally unique identifiers (GUIDs) as surrogate keys.
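As a quick sketch (the table and column names here are hypothetical, not from the original), both kinds of surrogate key might look like this in SQL Server:

```sql
-- Integer surrogate key generated by the database (identity column).
CREATE TABLE OrdersByIdentity
(
    OrderID int IDENTITY(1,1) PRIMARY KEY,
    OrderDate datetime NOT NULL
);

-- GUID surrogate key generated by NEWID().
CREATE TABLE OrdersByGuid
(
    OrderID uniqueidentifier DEFAULT NEWID() PRIMARY KEY,
    OrderDate datetime NOT NULL
);
```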
When a PRIMARY KEY constraint is added to an existing column or columns in a table, the Database Engine examines the existing column data and metadata to make sure that the following rules for primary keys are met:
The columns cannot allow null values: Columns of PRIMARY KEY constraints that are specified when a table is created are implicitly converted to NOT NULL. A sparse column cannot be used as part of a primary key because sparse columns must allow null values.
create table id (id int NULL)
alter table id add constraint pk_id primary key(id)
Msg 8111, Level 16, State 1, Line 2
Cannot define PRIMARY KEY constraint on nullable column in table 'id'.
Msg 1750, Level 16, State 0, Line 2
Could not create constraint. See previous errors.
If you attempt to add a primary key constraint on a column that allows NULL values, the statement will fail immediately, even if the column contains no NULL values.
There can be no duplicate values: If a PRIMARY KEY constraint is added to a column that has duplicate values or allows null values, the Database Engine returns an error and does not add the constraint.
You cannot add a PRIMARY KEY constraint that violates the above rules.
The Database Engine automatically creates a unique index to enforce the uniqueness requirement of the PRIMARY KEY constraint.
If a clustered index does not already exist on the table and a nonclustered index is not explicitly specified, a unique clustered index is created to enforce the PRIMARY KEY constraint.
If a clustered index already exists on the table, a unique nonclustered index is created to enforce the PRIMARY KEY constraint.
If a PRIMARY KEY constraint is defined on more than one
column, values may be duplicated within one column, but each combination of
values from all the columns in the PRIMARY KEY constraint definition must be
unique.
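The composite-key rule can be sketched like this (a hypothetical OrderLines table, not from the original):

```sql
-- OrderID may repeat and LineNumber may repeat, but each
-- (OrderID, LineNumber) combination must be unique.
CREATE TABLE OrderLines
(
    OrderID int NOT NULL,
    LineNumber int NOT NULL,
    Quantity int NOT NULL,
    CONSTRAINT pk_OrderLines PRIMARY KEY (OrderID, LineNumber)
);

INSERT INTO OrderLines VALUES (1, 1, 10);
INSERT INTO OrderLines VALUES (1, 2, 5);  -- OK: OrderID repeats, but the pair is new
-- INSERT INTO OrderLines VALUES (1, 1, 3); -- fails: duplicate (1, 1) pair
```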
If a PRIMARY KEY
constraint already exists, you can modify or delete it. To modify a PRIMARY KEY
constraint, you must first delete the existing PRIMARY KEY constraint and then
re-create it with the new definition.
For example, you may want the PRIMARY
KEY constraint of the table to reference other columns, or you may want to
change the column order, index name, clustered option, or fill factor of the
PRIMARY KEY constraint. However, you cannot change the length of a column
defined with a PRIMARY KEY constraint.
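For example, the drop-and-re-create sequence might look like this (the constraint and column names are illustrative, following the Persons examples used elsewhere in this post):

```sql
-- Drop the existing primary key...
ALTER TABLE Persons DROP CONSTRAINT pk_PersonID;

-- ...then re-create it with a new column order, clustering option,
-- and fill factor.
ALTER TABLE Persons ADD CONSTRAINT pk_PersonID
    PRIMARY KEY NONCLUSTERED (LastName, P_Id)
    WITH (FILLFACTOR = 80);
```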
A PRIMARY KEY constraint cannot be deleted if it is referenced by a FOREIGN KEY constraint in another table; the FOREIGN KEY constraint must be deleted first.
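A sketch of the required order, assuming a hypothetical Orders table whose fk_Orders_Persons constraint references the Persons primary key:

```sql
-- The referencing FOREIGN KEY must go first...
ALTER TABLE Orders DROP CONSTRAINT fk_Orders_Persons;

-- ...only then can the referenced PRIMARY KEY be dropped.
ALTER TABLE Persons DROP CONSTRAINT pk_PersonID;
```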
Limitations and
Restrictions:
A table can contain only one PRIMARY KEY constraint.
All columns defined within a PRIMARY KEY constraint
must be defined as NOT NULL. If nullability is not specified, all columns
participating in a PRIMARY KEY constraint have their nullability set to NOT
NULL.
For both unique key and primary key constraints, the columns chosen for the key must be of a data type that is allowed for an index. This precludes columns that can't be compared, such as ntext, text, image, varchar(max), nvarchar(max), varbinary(max), xml, geography, geometry, and CLR user-defined data types that do not support binary ordering.
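For instance, this hypothetical table fails at creation time, because nvarchar(max) is not a valid index key type:

```sql
-- Fails: a max-length type cannot be used as a key column in an index,
-- so it cannot back a PRIMARY KEY constraint.
CREATE TABLE Notes
(
    Body nvarchar(max) NOT NULL PRIMARY KEY
);
```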
Permissions:
Creating a new table with a primary key requires CREATE
TABLE permission in the database and ALTER permission on the schema in which
the table is being created.
Creating a primary key in an existing table requires
ALTER permission on the table.
DISABLE PRIMARY KEY
You can disable a primary key using the ALTER INDEX statement in SQL Server (Transact-SQL).
The syntax to disable a primary key using the ALTER INDEX
statement in SQL Server (Transact-SQL) is:
ALTER INDEX constraint_name ON table_name DISABLE;
ENABLE
PRIMARY KEY
You can enable a primary key using the ALTER INDEX
statement in SQL Server (Transact-SQL).
Syntax
ALTER INDEX constraint_name ON
table_name REBUILD;
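Putting the two statements together, using the pk_PersonID and Persons names from the examples in this post (illustrative only):

```sql
-- Stop enforcing the primary key, e.g., before a bulk load.
ALTER INDEX pk_PersonID ON Persons DISABLE;

-- ... bulk load or maintenance work here ...

-- Rebuild the index, which re-enables the constraint.
ALTER INDEX pk_PersonID ON Persons REBUILD;
```

Note that if the primary key's index is clustered, disabling it makes the table's data inaccessible until the index is rebuilt, so this technique is mostly useful for nonclustered primary keys.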
Column-level constraint creation:
CREATE TABLE Persons(
P_Id int NOT NULL PRIMARY KEY,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Address varchar(255),
City varchar(255)
)
CREATE TABLE Persons
(
P_Id int NOT NULL PRIMARY KEY NONCLUSTERED,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Address varchar(255),
City varchar(255)
)
Table-level constraint creation:
CREATE TABLE Persons
(
P_Id int NOT NULL,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Address varchar(255),
City varchar(255),
CONSTRAINT pk_PersonID PRIMARY KEY (P_Id,LastName)
)
CREATE TABLE Persons
(
P_Id int NOT NULL,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Address varchar(255),
City varchar(255),
CONSTRAINT pk_PersonID PRIMARY KEY NONCLUSTERED (P_Id,LastName)
)
Constraint creation on an existing table:
ALTER TABLE Persons ADD CONSTRAINT
pk_PersonID PRIMARY KEY
(P_Id,LastName)
It is important to note that, by default, a primary key is clustered. This may or may not be the preferred method of creation. Either way, you can specify the CLUSTERED/NONCLUSTERED option when creating a primary key.
To create a clustered
primary key on an existing table:
ALTER TABLE dbo.Person ADD CONSTRAINT PK_Person
PRIMARY KEY CLUSTERED
(PersonID);
To create a nonclustered primary key on an existing table:
ALTER TABLE dbo.Person ADD CONSTRAINT PK_Person
PRIMARY KEY NONCLUSTERED
(PersonID);
Drop a Constraint:
ALTER TABLE Persons
DROP CONSTRAINT pk_PersonID
Recommendation for the
Clustering Key:
·
Unique
·
Narrow
·
Static
Why Unique?
A clustering key should be unique because a clustering key (when one exists) is used as the lookup key from all non-clustered indexes.
Take for example an index in the back of a book – if you need to find the data that an index entry points to, that entry (the index entry) must be unique; otherwise, which index entry would be the one you're looking for? So, when you create the clustered index, it must be unique. But SQL Server doesn't require that your clustering key be created on a unique column. You can create it on any column(s) you'd like. Internally, if the clustering key is not unique, then SQL Server will "uniquify" it by adding a 4-byte integer to the data. So if the clustered index is created on something which is not unique, then not only is there additional overhead at index creation, there's wasted disk space, additional costs on INSERTs and UPDATEs, and in SQL Server 2000, there's an added cost on a clustered index rebuild (which, because of the poor choice for the clustering key, is now more likely).
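You can see that SQL Server accepts a non-unique clustered index without complaint; the uniquification happens silently behind the scenes (the table and index names here are made up):

```sql
-- CustomerID is not unique, and the index is not declared UNIQUE.
-- SQL Server accepts this and internally appends a 4-byte uniquifier
-- to rows that share the same CustomerID value.
CREATE TABLE CustomerOrders (CustomerID int NOT NULL, Amount money NOT NULL);
CREATE CLUSTERED INDEX cix_CustomerOrders ON CustomerOrders (CustomerID);
```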
Although SQL Server assigns a row identifier (RID) to each record in a file, users and user programs can't use an RID as a unique identifier, because you can't guarantee that an RID will remain constant over time. Therefore, an RID doesn't meet the criterion that a primary key's value must never change. An RID is
composed of a file number, a page number, and a page's row number. As the
position of each record shifts in the file, the record's associated RID
changes. The primary key you choose must have a constant and unchanging set of
values. When you assign a primary key value to a record, the value must not
change for the life of that record and you can't reuse the value—even after the
record is deleted from the table. In addition to these logical reasons for not
using an RID as a primary key, you can't access it through any supported
programming interface.
Why Narrow?
A clustering key should be narrow for some of the same reasons it should be unique. If the clustering key is used as the lookup key from all non-clustered indexes, then the clustering key is duplicated in all non-clustered indexes. If the clustering key is really wide, then all of the non-clustered indexes will be [unnecessarily] wide. This will waste disk space, create additional costs on INSERTs and UPDATEs, and require more time (because of size) when rebuilding these index structures. So, what does narrow mean – as few bytes as possible to help uniquely define your rows. A narrow numeric when possible.
Why Static?
A clustering key should be static for some of the same reasons it should be unique and narrow. If the clustering key is used as the lookup key from all non-clustered indexes, then the clustering key is duplicated in all non-clustered indexes. In fact, for a given table the clustering key will be the most duplicated data. If this data changes, then SQL Server will need to update the value in the base table as well as in EVERY non-clustered index. And, if the key changes, it will cause the record to move. When a record moves, it creates fragmentation. This will waste disk space, create additional costs on INSERTs and UPDATEs, require more time (because of record relocation and [the likely] subsequent splits), and require more maintenance.
OK, so it sounds like I want a narrow, unique
and static value… What about a guid?
Typically, I recommend a numeric IDENTITY column as the clustering key, but I always get this question. In fact, I often wait to see how long it's going to take before I get this question ;). Anyway, a guid does meet the criteria fairly well – it's certainly unique, it's usually static, and it's relatively narrow. So, what's wrong with it? In SQL Server 2000, the guid function (newid()) generates values that do not follow an ever-increasing pattern (an IDENTITY column would). But wait, I didn't say that you needed to have an ever-increasing pattern…
OK, so the final criteria I look for in a
clustering key is: an ever-increasing pattern!
If the clustering key is ever-increasing, then new rows have a specific location where they can be placed. If that location is at the end of the table, then the new row needs space allocated to it, but it doesn't have to make space in the middle of the table. If a row is inserted into a location that doesn't have any room, then room needs to be made (e.g., if you insert based on last name, then as rows come in, space will need to be made where each name should be placed). If room needs to be made, SQL Server makes it by doing something called a split. Splits in SQL Server are 50/50 splits – simply put, 50% of the data stays and 50% of the data is moved. This keeps the index logically intact (the lowest level of an index – called the leaf level – is a doubly-linked list) but not physically intact. When an index has a lot of splits, the index is said to be fragmented. Good examples of an ever-increasing index key are IDENTITY columns (which are also naturally unique, naturally static, and naturally narrow) or something that follows as many of these criteria as possible – like a datetime column (or, since a datetime is not very likely to be unique by itself, a datetime, identity composite). But wait, what about a guid?
Well, in SQL Server 2000 the only SQL Server
function for guids is newid – that does not create an ever increasing pattern.
In SQL Server 2005, you can use a new guid
function called newsequentialid() to populate your uniqueidentifier column.
Here's an example of how you can use it:
CREATE TABLE Test
(
TestID uniqueidentifier CONSTRAINT Test_TestID_Default DEFAULT newsequentialid(),
Inserted datetime CONSTRAINT Test_Inserted_Default DEFAULT getdate()
)
go
INSERT Test DEFAULT VALUES
go
SELECT * FROM Test
go
Need
Further Study:
- Is there a way to create a sequential guid in SQL Server 2000?
- Should I use a natural or surrogate key?
There are two strategies for assigning keys to
tables:
Natural keys: A natural key is one or more existing data attributes that are unique to the business concept. For the Customer table there were two candidate keys, in this case CustomerNumber and SocialSecurityNumber; other examples are invoice numbers, tax IDs, SSNs, etc.
Surrogate key: Surrogate keys are keys that have no "business" meaning and are used solely to identify a record in the table. Such keys are either database generated (for example: Identity in SQL Server, Sequence in Oracle, Sequence/Identity in DB2 UDB, etc.) or system-generated values (e.g., generated via a table in the schema).
Surrogate
Key
I prefer surrogate keys to be DB controlled rather
than being controlled via a next-up table in the schema since that is a more
scalable approach.
Pros:
Business
Logic is not in the keys.
Small
4-byte key (the surrogate key will most likely be an integer and SQL Server for
example requires only 4 bytes to store it, if a bigint, then 8 bytes).
Joins
are very fast.
No
locking contentions because of unique constraint (this refers to the waits that
get developed when two sessions are trying to insert the same unique business
key) as the surrogates get generated by the DB and are cached – very scalable.
Cons:
An additional index is needed. In SQL Server, the PK constraint always creates a unique index; in Oracle, if an index already exists, PK creation will use that index for uniqueness enforcement (so this is not a con in Oracle).
Cannot
be used as a search key.
If it
is database controlled, for products that support multiple databases, different
implementations are needed, example: identity in SS2k, before triggers and
sequences in Oracle, identity/sequence in DB2 UDB.
Always
requires a join when browsing the child table(s).
Natural
Key
Pros:
No
additional Index.
Can be
used as a search key.
Cons:
If not
chosen wisely (business meaning in the key(s)), then over a period of time
additions may be required to the PK and modifications to the PK can occur.
If using strings, joins are a bit slower compared to int data-type joins, and storage is greater as well. Since storage is greater, fewer data values get stored per index page. Also, reading strings is a two-step process in some RDBMSs: one step to get the actual length of the string and a second to actually perform the read operation to get the value.
Locking
contentions can arise if using application driven generation mechanism for the
key.
Can't enter a record until the value is known, since the value has some meaning.
Choosing
Surrogate vs. Natural Keys:
There is no rule of thumb in this case. It has to be evaluated table by table:
If we can identify an appropriate natural key that meets the three criteria (unique, narrow, static) for a PK column, we should use it. Look-up tables and configuration tables are typically OK.
Data type for the PK: the smaller the better; choose an integer or a short character data type. This also ensures that joins will be faster. It becomes even more important if you are going to make the PK a clustered index, since non-clustered indexes are built off the clustered index. An RDBMS generally processes integer values faster than character values, because comparing character values involves collation rules and byte-by-byte processing, which is extra work.