Exact Numerics:
int, bigint, smallint, and tinyint
Exact-number data types that use integer data.
bigint
Integer (whole number) data from -2^63 (-9,223,372,036,854,775,808) through 2^63-1 (9,223,372,036,854,775,807). Storage size is 8 bytes.
int
Integer (whole number) data from -2^31 (-2,147,483,648) through 2^31-1 (2,147,483,647). Storage size is 4 bytes. The SQL-92 synonym for int is integer.
smallint
Integer data from -2^15 (-32,768) through 2^15-1 (32,767). Storage size is 2 bytes.
tinyint
Integer data from 0 through 255. Storage size is 1 byte.
Remarks:
- The bigint data type is supported where integer values are supported. However, bigint is intended for special cases where the integer values may exceed the range supported by the int data type. The int data type remains the primary integer data type in SQL Server.
- bigint fits between smallmoney and int in the data type precedence chart.
- Functions will return bigint only if the parameter expression is a bigint data type. SQL Server will not automatically promote other integer data types (tinyint, smallint, and int) to bigint.
create table test(id1 tinyint, id2 smallint, id3 int, id4 bigint)

Description of the table via sp_help 'test':

Column_name   Type       Length   Prec   Scale
id1           tinyint    1        3      0
id2           smallint   2        5      0
id3           int        4        10     0
id4           bigint     8
Length: the number of bytes allocated for the column.
tinyint  - 1 byte
smallint - 2 bytes
int      - 4 bytes
bigint   - 8 bytes
Precision: the number of integer positions (to the left of the decimal point).
tinyint  - 1 byte, maximum value 2^8 - 1 = 255, so 3 positions
smallint - 2 bytes, 2^15 = 32,768, so 5 positions
int      - 4 bytes, 2^31 = 2,147,483,648, so 10 positions
bigint   - 8 bytes (precision not reported by sp_help; see the note below)
Note: For the bigint data type, both precision and scale show as empty because the sp_help stored procedure's logic does not include the bigint data type when calculating the precision and scale values.
If we query syscolumns for the object, we get complete details about all the columns:
Select * from syscolumns where id = (select id from sysobjects where name = 'test')
1. When integers are implicitly converted to a character data type, if the integer is too large to fit into the character field, SQL Server enters ASCII character 42, the asterisk (*).
declare @var char(2)
set @var = 152
select @var as result

result
------
*

(1 row(s) affected)
2. Integer constants greater than 2,147,483,647 are converted to the decimal data type, not the bigint data type. The following example shows that when the threshold value is exceeded, the data type of the result changes from an int to a decimal.
SELECT 2147483647 / 2 AS Result1,
       2147483649 / 2 AS Result2;
Here is the result set.
Result1      Result2
1073741823   1073741824.500000
The same applies when the integer constant is smaller than -2^31 (-2,147,483,648):
SELECT -2147483650 / 2 AS Result2;
Result2
---------------------------------------
-1073741825.000000
3. From SQL Authority: "Pinal, I am migrating my database from MySQL to SQL Server and I have faced a unique situation.
I have been using an unsigned 64-bit integer in MySQL, but when I try to migrate that column to SQL Server I am facing an issue, as there is no data type which I find appropriate for my column. It is now too late to change the data type and I need an immediate solution.
One chain of thought was to change the data type of the column from unsigned 64-bit (BIGINT) to VARCHAR(n), but that would just change the data type for me, and I would face quite a lot of performance-related issues in the future.
In SQL Server we also have the BIGINT data type, but that is a signed 64-bit data type. The BIGINT data type in SQL Server has a range of -2^63 (-9,223,372,036,854,775,808) to 2^63-1 (9,223,372,036,854,775,807). However, my number is much larger than this.
Is there any way I can store my big 64-bit unsigned integer without losing much performance by converting it to VARCHAR?"
Very interesting question. For the sake of argument, we could ask the user whether such a big number is really needed; if it is an identity column, I really doubt the table will ever grow beyond that range. The real question I found interesting was how to store a 64-bit unsigned integer value in SQL Server without converting it to a string data type. After thinking a bit, I found a fairly simple answer.
I can use the NUMERIC data type: NUMERIC(20) for a 64-bit unsigned integer value, NUMERIC(10) for a 32-bit unsigned integer value, and NUMERIC(5) for a 16-bit unsigned integer value. The numeric data type supports a maximum precision of 38.
Now here are a couple of things to keep in mind.
- Using the NUMERIC data type will indeed accept the 64-bit unsigned integer, but in the future, if you try to enter a negative value, it will allow that too. Hence, you will need to put an additional constraint on the column so that it accepts only positive integers (see the sketch after this list).
- Here is another big concern: SQL Server will store the number as numeric and will treat it as a positive integer for all practical purposes. You will have to write logic in your application to interpret it as a 64-bit unsigned integer. On the other hand, if you are already using unsigned integers in your application, there is a good chance you have logic taking care of that.
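For illustration, here is a minimal sketch of that approach; the table, column, and constraint names are hypothetical. The CHECK constraint keeps the NUMERIC(20) column within the unsigned 64-bit range:

-- Emulating a 64-bit unsigned integer with NUMERIC(20) and a CHECK constraint
CREATE TABLE dbo.UnsignedDemo
(
    BigUnsignedValue numeric(20, 0) NOT NULL
        CONSTRAINT CK_UnsignedDemo_Range
        CHECK (BigUnsignedValue >= 0 AND BigUnsignedValue <= 18446744073709551615)
);

-- The full unsigned 64-bit maximum (2^64 - 1) fits:
INSERT INTO dbo.UnsignedDemo (BigUnsignedValue) VALUES (18446744073709551615);

-- A negative value is rejected by the CHECK constraint:
-- INSERT INTO dbo.UnsignedDemo (BigUnsignedValue) VALUES (-1);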
Benefits example:
Choosing the SQL data type tinyint instead of int for a "ProductType" column with values ranging from 1 to 10 will save three bytes per record. With 100,000 records you will save 300,000 bytes. That's not much in terms of disk space ("storage is cheap", etc.), but you'll probably have indexes containing that column, and if that index takes less memory, the database engine will process it much more efficiently in every "join" and "where", etc.
So queries will perform faster, release locks earlier (if any) and use fewer system resources (memory and CPU). This will make the whole server perform better, as there will be more resources available for other things.
Once you have learned the SQL data types available, spending a few extra minutes when designing your schema will result in faster query execution and an overall better-performing database.
When you use the +, -, *, /, or % arithmetic
operators to perform implicit or explicit conversion of int, smallint, tinyint,
or bigint constant values to the float, real, decimal or numeric data types,
the rules that SQL Server applies when it calculates the data type and
precision of the expression results differ depending on whether the query is
autoparameterized or not.
Therefore, similar
expressions in queries can sometimes produce different results. When a query is
not autoparameterized, the constant value is first converted to numeric, whose
precision is just large enough to hold the value of the constant, before
converting to the specified data type. For example, the constant value 1 is
converted to numeric (1, 0), and the constant value 250 is converted to numeric
(3, 0).
When a query is
autoparameterized, the constant value is always converted to numeric (10, 0)
before converting to the final data type. When the / operator is involved, not
only can the result type's precision differ among similar queries, but the
result value can differ also.
For example, the result value
of an autoparameterized query that includes the expression SELECT CAST (1.0 / 7
AS float) will differ from the result value of the same query that is not
autoparameterized, because the results of the autoparameterized query will
be truncated to fit into the numeric (10, 0) data type.
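One way to see the type SQL Server infers for a constant expression is the built-in SQL_VARIANT_PROPERTY function. A quick sketch; the exact precision and scale you see follow the inference rules above:

SELECT SQL_VARIANT_PROPERTY(1.0 / 7, 'BaseType')  AS BaseType,   -- numeric
       SQL_VARIANT_PROPERTY(1.0 / 7, 'Precision') AS InferredPrecision,
       SQL_VARIANT_PROPERTY(1.0 / 7, 'Scale')     AS InferredScale;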
Decimal and Numeric (Transact-SQL):
Numeric data types that have
fixed precision and scale.
Arguments:
decimal [ (p[ ,s] )] and
numeric[ (p[ ,s] )]
Fixed precision and scale numbers. When maximum precision is used, valid values are from -10^38 + 1 through 10^38 - 1. The ISO synonyms for decimal are dec and dec(p, s).
numeric is functionally
equivalent to decimal.
p (precision)
The maximum total number of decimal digits that will be stored, both to
the left and to the right of the decimal point. The precision must be a value
from 1 through the maximum precision of 38. The default precision is 18.
s (scale)
The maximum number of decimal digits that will be stored to the right of the decimal point. This number is subtracted from p to determine the maximum number of digits to the left of the decimal point.
Scale must be a value from 0 through p. Scale can be specified only if precision is specified.
The default scale is 0; therefore, 0 <= s <= p. Maximum storage sizes vary, based on the precision.
Precision   Storage bytes
1 - 9       5
10 - 19     9
20 - 28     13
29 - 38     17
Converting decimal and numeric Data
For the decimal and numeric data types, SQL Server considers each specific
combination of precision and scale as a different data type. For example,
decimal(5,5) and decimal(5,0) are considered different data types.
In Transact-SQL statements, a constant with a decimal point is
automatically converted into a numeric data value, using the minimum precision
and scale necessary. For example, the constant 12.345 is converted into a
numeric value with a precision of 5 and a scale of 3.
Converting from decimal or numeric to float or real can cause some loss of
precision.
Converting from int, smallint, tinyint, float, real, money, or smallmoney
to either decimal or numeric can cause overflow.
By default, SQL Server uses rounding when converting a number to a decimal
or numeric value with a lower precision and scale. However, if the SET
ARITHABORT option is ON, SQL Server raises an error when overflow occurs. Loss
of only precision and scale is not sufficient to raise an error.
When converting float or real values to decimal or numeric, the decimal
value will never have more than 17 decimals. Any float value < 5E-18 will
always convert as 0.
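A quick illustration of the default rounding behavior when converting to a lower precision and scale:

SELECT CAST(10.6496 AS decimal(4, 2)) AS Rounded;   -- returns 10.65, not 10.64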
bit (Transact-SQL):
An integer data type that can
take a value of 1, 0, or NULL.
Remarks:
The SQL Server Database Engine optimizes storage of bit columns. If there are 8 or fewer bit columns in a table, the columns are stored as 1 byte. If there are from 9 up to 16 bit columns, the columns are stored as 2 bytes, and so on.
The string values TRUE and
FALSE can be converted to bit values: TRUE is converted to 1 and FALSE is
converted to 0.
Converting to bit promotes
any nonzero value to 1.
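For example:

SELECT CAST('TRUE'  AS bit) AS FromTrue,    -- 1
       CAST('FALSE' AS bit) AS FromFalse,   -- 0
       CAST(17      AS bit) AS FromNonZero, -- 1
       CAST(0       AS bit) AS FromZero;    -- 0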
money and smallmoney (Transact-SQL):
Data types that represent
monetary or currency values.
Data type    Range                                                    Storage
money        -922,337,203,685,477.5808 to 922,337,203,685,477.5807   8 bytes
smallmoney   -214,748.3648 to 214,748.3647                           4 bytes
The money and smallmoney data
types are accurate to a ten-thousandth of the monetary units that they
represent.
Use a period to separate
partial monetary units, like cents, from whole monetary units. For example,
2.15 specifies 2 dollars and 15 cents.
Currency or monetary data
does not need to be enclosed in single quotation marks
( ' ). It is important to
remember that while you can specify monetary values preceded by a currency
symbol, SQL Server does not store any currency information associated with the
symbol, it only stores the numeric value.
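For example (the variable name is illustrative):

DECLARE @price money = $2.15;
SELECT @price AS StoredValue;   -- returns 2.15; the $ symbol is not retained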
Approximate Numerics:
float and real (Transact-SQL)
Approximate-number data types
for use with floating point numeric data. Floating point data is approximate;
therefore, not all values in the data type range can be represented exactly.
The ISO synonym for real is float(24).
Syntax:
float [ (n) ]
Where n is the number of bits
that are used to store the mantissa of the float number in scientific notation
and, therefore, dictates the precision and storage size. If n is specified, it
must be a value between 1 and 53. The default value of n is 53.
n value   Precision   Storage size
1-24      7 digits    4 bytes
25-53     15 digits   8 bytes
The SQL Server float[(n)]
data type complies with the ISO standard for all values of n from 1 through 53.
The synonym for double precision is float(53).
Remarks:
Data type   Range                                                     Storage
float       -1.79E+308 to -2.23E-308, 0, and 2.23E-308 to 1.79E+308   Depends on the value of n
real        -3.40E+38 to -1.18E-38, 0, and 1.18E-38 to 3.40E+38       4 bytes
Converting float and real Data
Values of float are truncated
when they are converted to any integer type.
When you want to convert from
float or real to character data, using the STR string function is usually more
useful than CAST( ).
This is because STR enables
more control over formatting. For more information, see STR (Transact-SQL) and
Built-in Functions (Transact-SQL).
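A quick comparison sketch; STR takes the total length and the number of decimal places, which CAST cannot express:

SELECT STR(123.45678, 10, 3)          AS UsingStr,  -- '   123.457' (right-justified)
       CAST(123.45678 AS varchar(20)) AS UsingCast; -- formatting not controllable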
Conversion of float values
that use scientific notation to decimal or numeric is restricted to values of
precision 17 digits only. Any value with precision higher than 17 rounds to
zero.
Char and Varchar (Transact-SQL):
String data types of either fixed length or variable length.
Arguments:
char [ ( n ) ]
Fixed-length, non-Unicode
string data. n defines the string length and must be a value from 1 through
8,000. The storage size is n bytes. The ISO synonym for char is character.
varchar [ ( n | max ) ]
Variable-length, non-Unicode
string data. n defines the string length and can be a value from 1 through
8,000. max indicates that the maximum storage size is 2^31-1 bytes (2 GB). The storage size is the actual length of
the data entered + 2 bytes. The ISO synonyms for varchar are char varying
or character varying.
CHAR is a fixed-length data type. For example, if you declare a variable/column of the CHAR(10) data type, it will always take 10 bytes, irrespective of whether you are storing 1 character or 10 characters in it. And as we have declared this variable/column as CHAR(10), we can store a maximum of 10 characters in it.
On the other hand, VARCHAR is a variable-length data type. For example, if you declare a variable/column of the VARCHAR(10) data type, it will take the number of bytes equal to the number of characters stored in it (plus 2 bytes of length overhead). So if you are storing only one character, it will take only one byte of character data, and if you store 10 characters, it will take 10 bytes. And as we have declared this variable/column as VARCHAR(10), we can store a maximum of 10 characters in it.
DECLARE @CharName Char(20) = 'Sql',
        @VarCharName VarChar(20) = 'Sql'
SELECT DATALENGTH(@CharName) CharSpaceUsed,
       DATALENGTH(@VarCharName) VarCharSpaceUsed

CharSpaceUsed VarCharSpaceUsed
------------- ----------------
20            3
--Concatenation of CHAR variables:
DECLARE @FirstName Char(20) = 'Sql',
        @LastName Char(20) = 'Server'
IF @FirstName + ' ' + @LastName = 'Sql Server'
    PRINT 'I was Expecting'
ELSE
    PRINT 'Surprise to me ...'
SELECT @FirstName + ' ' + @LastName AS Name,
       len(@FirstName + ' ' + @LastName) AS Length

Surprise to me ...
Name                                      Length
----------------------------------------- -----------
Sql                  Server               27
--Concatenation of VARCHAR variables:
DECLARE @FirstName varchar(20) = 'Sql',
        @LastName varchar(20) = 'Server'
IF @FirstName + ' ' + @LastName = 'Sql Server'
    PRINT 'I was Expecting'
ELSE
    PRINT 'Surprise to me ...'
SELECT @FirstName + ' ' + @LastName AS Name,
       len(@FirstName + ' ' + @LastName) AS Length

I was Expecting
Name                                      Length
----------------------------------------- -----------
Sql Server                                10
Remarks:
When n is not specified in a data definition or variable declaration statement, the default length is 1. When n is not specified when using the CAST and CONVERT functions, the default length is 30.
declare @var varchar
set @var = 'Hi'
select @var as result

result
------
H

declare @var char
set @var = 'Hi'
select @var as result

result
------
H
Objects that use char or
varchar are assigned the default collation of the database, unless a specific
collation is assigned using the COLLATE clause. The collation controls the code
page that is used to store the character data.
If you have sites that
support multiple languages, consider using the Unicode nchar or nvarchar data
types to minimize character conversion issues.
If you use char or varchar,
we recommend the following:
- Use char when the sizes of the column data entries are
consistent.
- Use varchar when the sizes of the column data entries
vary considerably.
- Use varchar(max) when the sizes of the column data
entries vary considerably, and the size might exceed 8,000 bytes.
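A small hypothetical table applying these guidelines:

CREATE TABLE dbo.Customer
(
    CountryCode char(2)      NOT NULL,  -- entries are consistently 2 characters
    FullName    varchar(100) NOT NULL,  -- entry sizes vary considerably
    Notes       varchar(max) NULL       -- may exceed 8,000 bytes
);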
If SET ANSI_PADDING is OFF when either CREATE TABLE or
ALTER TABLE is executed, a char column that is defined as NULL is handled as
varchar.
When the collation code page
uses double-byte characters, the storage size is still n bytes. Depending on
the character string, the storage size of n bytes can be less than n
characters.
Converting Character Data:
When character expressions
are converted to a character data type of a different size, values that are too
long for the new data type are truncated.
The uniqueidentifier type is
considered a character type for the purposes of conversion from a character
expression, and therefore is subject to the truncation rules for converting to
a character type.
When a character expression
is converted to a character expression of a different data type or size, such
as from char(5) to varchar(5), or char(20) to char(15), the collation of the
input value is assigned to the converted value. If a noncharacter expression is
converted to a character data type, the default collation of the current
database is assigned to the converted value. In either case, you can assign a
specific collation by using the COLLATE clause.
Note: Code
page translations are supported for char and varchar data types, but not for
text data type. As with earlier versions of SQL Server, data loss during code
page translations is not reported.
Character expressions that
are being converted to an approximate numeric data type can include optional exponential
notation (a lowercase e or uppercase E followed by an optional plus (+) or
minus (-) sign and then a number).
Character expressions that
are being converted to an exact numeric data type must consist of digits, a
decimal point, and an optional plus (+) or minus (-). Leading blanks are
ignored. Comma separators, such as the thousands separator in 123,456.00, are
not allowed in the string.
declare
@var varchar(20)
set @var = '-123456.00'
select cast(@var as numeric(8,2)) as result
result
---------------------------------------
-123456.00
Character expressions being
converted to money or smallmoney data types can also include an optional
decimal point and dollar sign ($). Comma separators, as in $123,456.00, are
allowed.
declare
@var varchar(20)
set @var = '$123,456.00'
select cast(@var as money) as result
result
---------------------
123456.00
nchar and nvarchar (Transact-SQL):
Character data types that are
either fixed-length, nchar, or variable-length, nvarchar, Unicode data and use
the UNICODE UCS-2 character set.
Arguments
nchar [ ( n ) ]
Fixed-length Unicode string
data. n defines the string length and must be a value from 1 through 4,000. The
storage size is two times n bytes.
When the collation code page
uses double-byte characters, the storage size is still n bytes.
Depending on the string, the storage size of n bytes can be less than the value specified for n. The ISO synonyms for nchar are national char and national character.
nvarchar [ ( n | max ) ]
Variable-length Unicode
string data. n defines the string length and can be a value from 1 through
4,000. max indicates that the maximum storage size is 2^31-1 bytes (2 GB). The
storage size, in bytes, is two times the actual length of data entered + 2
bytes.
The ISO synonyms for nvarchar
are national char varying and national character varying.
Remarks
When n is not specified in a
data definition or variable declaration statement, the default length is 1.
When n is not specified with the CAST function, the default length is 30.
Use nchar when the sizes of
the column data entries are probably going to be similar.
Use nvarchar when the sizes
of the column data entries are probably going to vary considerably.
sysname is a system-supplied user-defined data type that
is functionally equivalent to nvarchar(128), except that it is not nullable.
sysname is used to reference database object names.
Objects that use nchar or
nvarchar are assigned the default collation of the database unless a specific
collation is assigned using the COLLATE clause.
SET ANSI_PADDING is always ON
for nchar and nvarchar. SET ANSI_PADDING OFF does not apply to the nchar or
nvarchar data types.
Prefix Unicode character
string constants with the letter N. Without the N prefix, the string is
converted to the default code page of the database. This default code page may
not recognize certain characters.
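A small sketch, assuming a database whose default code page cannot represent the sample characters (Japanese text is used purely as an illustration):

DECLARE @NoPrefix   nvarchar(10) = '日本語',   -- may arrive as '???'
        @WithPrefix nvarchar(10) = N'日本語';  -- preserved as Unicode
SELECT @NoPrefix AS NoPrefix, @WithPrefix AS WithPrefix;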
uniqueidentifier (Transact-SQL):
A 16-byte GUID.
Remarks
A column or local variable of
uniqueidentifier data type can be initialized to a value in the following ways:
By using the NEWID function.
By converting from a string constant in the form
xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx, in which each x is a hexadecimal digit in
the range 0-9 or a-f.
For example,
6F9619FF-8B86-D011-B42D-00C04FC964FF is a valid uniqueidentifier value.
Comparison operators can be
used with uniqueidentifier values. However, ordering is not implemented by
comparing the bit patterns of the two values.
The only operations that can
be performed against a uniqueidentifier value are comparisons (=, <>,
<, >, <=, >=) and checking for NULL (IS NULL and IS NOT NULL). No
other arithmetic operators can be used. All column constraints and properties,
except IDENTITY, can be used on the uniqueidentifier data type.
Merge replication and
transactional replication with updating subscriptions use uniqueidentifier
columns to guarantee that rows are uniquely identified across multiple copies
of the table.
Examples
The following example converts a uniqueidentifier value
to a char data type.
DECLARE @myid uniqueidentifier = NEWID();
SELECT CONVERT(char(255), @myid) AS 'char';
The following example demonstrates the truncation of data when the value is too long for the data type being converted to. Because the uniqueidentifier type is limited to 36 characters, the characters that exceed that length are truncated.
DECLARE @ID nvarchar(max) = N'0E984725-C51C-4BF4-9960-E1C80E27ABA0wrong';
SELECT @ID, CONVERT(uniqueidentifier, @ID) AS TruncatedValue;
Here is the result set.
String TruncatedValue
-------------------------------------------- ------------------------------------
0E984725-C51C-4BF4-9960-E1C80E27ABA0wrong 0E984725-C51C-4BF4-9960-E1C80E27ABA0
(1 row(s) affected)
binary and varbinary (Transact-SQL):
Binary data types of either
fixed length or variable length.
Arguments
binary [ ( n ) ]
Fixed-length binary data with
a length of n bytes, where n is a value from 1 through 8,000. The storage size
is n bytes.
varbinary [ ( n | max) ]
Variable-length binary data.
n can be a value from 1 through 8,000. max indicates that the maximum storage
size is 2^31-1 bytes. The storage size is the actual length of the data entered
+ 2 bytes. The data that is entered can be 0 bytes in length. The ANSI SQL
synonym for varbinary is binary varying.
Remarks
When n is not specified in a
data definition or variable declaration statement, the default length is 1.
When n is not specified with the CAST function, the default length is 30.
Use binary when the sizes of
the column data entries are consistent.
Use varbinary when the sizes
of the column data entries vary considerably.
Use varbinary(max) when the
column data entries exceed 8,000 bytes.
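For example, binary pads to the declared length while varbinary stores only the bytes entered:

DECLARE @fixed binary(8)    = 0x0102,
        @var   varbinary(8) = 0x0102;
SELECT DATALENGTH(@fixed) AS FixedBytes,  -- 8
       DATALENGTH(@var)   AS VarBytes;    -- 2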
Tips:
- Always specify the narrowest columns you can. The narrower the column, the less data SQL Server has to store, and the faster SQL Server is able to read and write data. In addition, if any sorts need to be performed on the column, the narrower the column, the faster the sort will be.
If a column is going to store values between 1 and 5, use tinyint instead
of int.
This allows you to store more rows in your data and index pages, reducing
the amount of I/O needed to read them.
It also reduces the amount of data moved from the server to the client,
reducing network traffic and latency. And last of all, it reduces the amount of
wasted space in your buffer cache.
- If you have a column that is designed to hold only numbers, use a numeric data type, such as INTEGER, instead of a VARCHAR or CHAR data type. Numeric data types generally require less space to hold the same numeric value than a character data type does. This helps to reduce the size of the columns, and can boost performance when the column is searched (WHERE clause), joined to another column, or sorted.
- Avoid using FLOAT or REAL data types for primary keys,
as they add unnecessary overhead that hurts performance. Use one of the
integer data types instead.
- When specifying data types during table creation, always specify NULL or NOT NULL for each column. If you don't, the column will default to NOT NULL if the ANSI NULL DEFAULT database option is not selected (the default), and will default to NULL if the ANSI NULL DEFAULT database option is selected.
- For best performance, and to reduce potential code bugs, columns should ideally be set to NOT NULL. For example, use of the IS NULL keywords in the WHERE clause makes that portion of the query non-sargable, which means that portion of the query cannot make good use of an index.
- Generally, using computed columns in a table is not
recommended because it does not follow the standard rules of
normalization. But, it is sometimes more efficient overall to use computed
columns in a table rather than re-computing the same data repeatedly in
queries. This is especially true if you are running the same query over
and over against your data that performs the same calculations over and
over. By performing the calculations in the table, it can reduce the
amount of work performed by a query each time it is run. You have to
determine for yourself where the bottleneck in performance is, and act
accordingly. If the bottleneck is in INSERTs and UPDATEs, then using calculated columns may not be a good idea. But if your SELECT statements are the bottleneck, then using calculated columns may pay off.
- Avoid using the bigint data type unless you really need its additional storage capacity. The bigint data type uses 8 bytes of memory, versus 4 bytes for the int data type.
- Avoid using the SQL Server sql_variant datatype. Besides being a
performance hog, it significantly
affects what you can do with the data stored as a sql_variant. For
example, sql_variant columns cannot be a part of primary or foreign keys,
can be used in indexes and unique keys if they are shorter than 900 bytes,
cannot have an identity property, cannot be part of a computed column,
must convert the data to another datatype when moving data to objects with
other datatypes, are automatically converted to nvarchar(4000) when
accessed by client applications using the SQL Server 7.0 OLE DB or ODBC
providers, are not supported by the LIKE predicate in the WHERE clause,
cannot be concatenated, and don’t work with some functions.
- Avoid using date data types as a primary key. From a
performance perspective, it is more efficient to use a data type that uses
less space. For example, the DATETIME datatype uses 8 bytes of space,
while the INT datatype only takes up 4 bytes. The less space used, the
smaller the table and index, and the less I/O overhead that is required to
access the primary key.
- If you are creating a column that you know will be
subject to many sorts, consider making the column integer-based and not
character-based. This is because SQL Server can sort integer data faster
than character data.
Performance Considerations:
Optimizing data types means
choosing them carefully, as they have a huge impact on I/O, CPU and RAM
consumption. A larger data type takes up more room in the cache (RAM) resulting
in a smaller hit ratio on the cache, which causes more fetches from disk (more
I/O). This results in a heavier CPU hit, longer update times, increased lock
times and increased contention on the table, which will show up as blocks.
Indexes defined on larger data types take longer to scan/seek because more
pages have to be read, which will impact RAM, CPU and I/O. Larger data types
also increase maintenance times on the various indexes, yielding longer
"maintenance windows."
Performance Impacts of Disk Space Usage:
To discuss performance
impacts of disk space usage we need to review what it takes to retrieve and
write data to disk drives. Each time you need to read a piece of data from SQL
Server you need to retrieve the information from disk. This retrieval causes a
disk I/O. Data in SQL Server is stored in a number of physical pages, each of which can hold 8,060 bytes of row data. For every page of data, SQL Server requires one I/O to retrieve that data.
To better understand how I/O
can impact performance let’s consider how many I/O’s it would take to retrieve
10,000,000 records from a SQL Server table. Say each record is 300 bytes long.
This means you can store 26 different records per page. The entire 10,000,000
records would require 384,616 data pages just to store the raw data; this
doesn't take into account space for indexes. So to read every record in my large 10-million-record table, it would take 384,616 I/Os.
Now say I saved just 2 bytes of data from every record, making each record 298 bytes long. This would mean you could store 27 records per SQL Server page. With that 2-byte saving, you could now retrieve 1 more record with each I/O operation. The total I/O savings if you read the entire 10,000,000-record table would be 14,245 I/Os. That is a lot of I/O saved for only 2 bytes of space per record.
So, each time you can save a
few bytes of data from each record stored in a SQL Server table you improve
your performance. The larger the table the bigger the performance gains you
will see. Therefore, you want to try to minimize your record size so you can
maximize the number of records that can be stored in each data page.
Now I/O is not your only
savings when you minimize the space it takes to store your data. Keep in mind
each page that is read first needs to be stored in the buffer pool. So the
smaller the record sizes the more records you can fit into a single buffer pool
page. Therefore, by conserving disk space for storing your data you are also
conserving the amount of memory you will need when reading your data.
Datetime VS SmallDatetime –
Usually when storing a
datetime value, everyone defaults to the datetime data type without considering
smalldatetime.
The datetime type is stored
as two ints accurate to 3.33 milliseconds with a range of Jan 1, 1753-Dec 31,
9999. However, for many database operations, this kind of range and precision
is not necessary.
The smalldatetime data type
is stored as one int, accurate to one minute with a range of Jan 1, 1900 – Jun
6, 2079.
Many datetime values can be
stored as a smalldatetime when the accuracy requirements do not have to be
precise to the sec/ms and the dates are generally limited to date ranges within
a few decade ranges. The benefits of using smalldatetime are apparent, since at
50% space saved, a table is much more efficient. Datetime data types should not
be used within a primary key unless frequent searches are done on date ranges.
Using Data Types to Minimize Disk Space Usage:
When selecting a data type
for a particular column you need to make sure you select the right data type.
It is easy to pick the wrong data type and waste some disk space. Therefore,
you need to be careful and make sure you select a data type that meets your
data requirements, and also minimizes the amount of disk space required to
store each data column. I'll review different data types and discuss space considerations for each.
First, let me talk about Unicode data types. Unicode data types are NVARCHAR, NCHAR, and NTEXT, and they require 2 bytes to store every character, whereas non-Unicode data types like VARCHAR, CHAR, and TEXT take only one byte to store each character.
The non-Unicode data types can only store 256 different
characters. With Unicode data types,
you can store up to 65,536 different 2 byte patterns. Because of the limitation
on the number of unique characters that can be stored using non-Unicode data
types, the hexadecimal representation for a particular character is not the
same across different code pages. When you use Unicode data types, the
character representation for commonly used characters are the same across code
pages.
Unicode data types are
typically used for international applications. If your application does not
need to be supported internationally then you should consider just using the
VARCHAR, CHAR, and TEXT data type, provided the characters your application
uses can be represented using the 1 byte – 256 character set. By using the
non-Unicode data types, you will use half the disk space for each
character-based column. If you store lots of character data then your disk
space consumption savings using non-Unicode characters could be substantial. I
converted one of our databases from using all Unicode data types to using all
non-Unicode data types and realized a 40% reduction in disk space consumption.
This kind of disk space savings can provide drastic performance gains over
using Unicode data types. This performance improvement is made by maximizing
the number of records that can be stored in a single SQL Server page.
Varchar/Char:
- If the text data in a column varies greatly in length,
use a VARCHAR data type instead of a CHAR data type. The amount of space
saved by using VARCHAR over CHAR on variable length columns can greatly
reduce I/O reads cache memory used to hold data, improving overall SQL
Server performance
- If a column’s data does not vary widely in length,
consider using a fixed-length CHAR field instead of a VARCHAR. While it
may take up a little more space to store the data, processing fixed-length
columns is faster in SQL Server than processing variable-length columns
- If you need to store large strings of data, and they are
less than 8,000 characters, use a VARCHAR data type instead of a TEXT data
type. TEXT data types have extra overhead that drag down performance
- Don’t use the NVARCHAR or NCHAR data types unless you
need to store 16-bit character (Unicode) data. They take up twice as much
space as VARCHAR or CHAR data types, increasing server I/O and wasting
unnecessary space in your buffer cache
- Another advantage of using VARCHAR over CHAR columns is
that sorts performed on VARCHAR columns are generally faster than on CHAR
columns. This is because the entire width of a CHAR column needs to be
sorted.
- If you are using fixed-length columns (CHAR, NCHAR) in your table, consider avoiding storing NULLs in them. If you do, the entire amount of space dedicated to the column will be used up. For example, if you have a fixed-length column of 255 characters, and you place a NULL in it, then 255 characters have to be stored in the database. This is a large waste of space that will cause SQL Server to perform extra disk I/O to read data pages. It also wastes space in the data cache buffer. Both of these contribute to reduced SQL Server performance.
Instead of using NULLs, use a coding
scheme similar to this in your databases:
NA: Not applicable
NYN: Not yet known
TUN: Truly unknown
Such a scheme provides the benefits of
using NULLs, but without the drawbacks.
If you really must use NULLs, use a variable-length column instead of a fixed-length column. Variable-length columns only use a very small amount of space to store a NULL.
- If you use the CONVERT function to convert a value to a variable-length data type, such as VARCHAR, always specify the length of the variable data type. If you do not, SQL Server assumes a default length of 30, as shown below. Ideally, you should specify the shortest length that accomplishes the required task. This helps to reduce memory use and SQL Server resources.
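A quick illustration of the default length of 30 and the silent truncation it causes:

SELECT CONVERT(varchar, REPLICATE('x', 40))     AS DefaultLength,  -- truncated to 30 x's
       CONVERT(varchar(40), REPLICATE('x', 40)) AS ExplicitLength; -- all 40 x's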
- Take care when using Unicode data in your queries, as it can affect query performance. A classic problem is an application passing in Unicode literals while the column searched in the database table is non-Unicode. This, of course, may be vice versa depending on your scenario.
Here is an example. The DB column "orgname_name" has been indexed, and is of type varchar. The code below performs OK (so we think), performing an index scan operation:

declare @myvar nvarchar(200)
set @myvar = N'Central West College of TAFE'
select * from Organisation_Name where orgname_name = @myvar

|--Bookmark Lookup(BOOKMARK:([Bmk1000]), OBJECT:([corpsys].[dbo].[Organisation_Name]))
     |--Index Scan(OBJECT:([corpsys].[dbo].[Organisation_Name].[Organisation_Name]),
        WHERE:(Convert([Organisation_Name].[orgname_name])=[@myvar]))

Table 'Organisation_Name'. Scan count 1, logical reads 1145, physical reads 0, read-ahead reads 0.
If we change this around slightly, using a varchar variable instead (no explicit Unicode conversion), we see this:

declare @myvar varchar(200)
set @myvar = 'Central West College of TAFE'
select * from Organisation_Name where orgname_name = @myvar

|--Bookmark Lookup(BOOKMARK:([Bmk1000]), OBJECT:([corpsys].[dbo].[Organisation_Name]))
     |--Index Seek(OBJECT:([corpsys].[dbo].[Organisation_Name].[Organisation_Name_nameix]),
        SEEK:([Organisation_Name].[orgname_name]=[@myvar]) ORDERED FORWARD)
Here we see an INDEX SEEK lookup with a massive performance improvement:

Table 'Organisation_Name'. Scan count 1, logical reads 9, physical reads 0, read-ahead reads 0.

Instead of 1,145 logical reads, there are only 9, a significant improvement.
Bit vs. Char(1) –
One bit will take up 1 byte,
8 bits will still take up 1 byte, and a char(1) takes 1 byte. It could be
argued that using a char(1) to store Y/N is more readable, but it is far more
efficient to use a bit column if the table contains more than 1 bit column.
You'll also eliminate the possibility of someone trying to put an A/B/C/etc. in
the char(1) field, leading to incorrect data/functionality. Generally, it is
considered better practice to use the bit data type. Even if the table only
contains a 1 bit column, it will allow you to add additional bit columns in the
future. There will be no need to modify the data type/data on the existing
char(1) column in order to take advantage of the optimal data storage
configuration.
BIT vs Tinyint –
If you only have a single BIT
data type column in your table, then defining it as a BIT, or TINYINT uses the
same amount of disk storage space. This is because to store a single BIT data
type it requires 1 byte. Your space saving over TINYINT comes into play when
you have multiple BIT data type columns in a single table.
Integer –
Integer types consist of
tinyint (0-255, 1 byte), smallint (-32,768-32,767, 2 bytes), int
(–2,147,483,648-2,147,483,647, 4 bytes) and bigint
(–9,223,372,036,854,775,808-9,223,372,036,854,775,807, 8 bytes). These data
types are often used in primary keys and non-clustered indexes, so it's
important to pick the smallest data type available from the list.
It's tempting to just pick a large data type in SQL Server, but it's better to plan for immediate needs and the next 5 to 10 years. You can always convert to a larger data type later, when CPUs, I/O and RAM have more bandwidth to support larger types. Also, since this data type is often used in primary keys (and consequently non-clustered index keys), the smaller the data type, the less work SQL Server will have to do during index maintenance.
UniqueIdentifier –
This data type consists of 16 bytes of storage, so it's very wide, and it is used to store GUIDs. It is commonly used as a primary key due to its unique nature. It is generally easier to merge two (or more) existing databases that have UniqueIdentifiers as primary keys, as opposed to int types; the latter would require every row to be adjusted for a database merge.
With that in mind, be careful
with this data type, since often you will have a UniqueIdentifier as a primary
key along with several UniqueIdentifier foreign keys, making all the indexes
very large, resource intensive and much more difficult to maintain. The
increased row/index sizes also increase the overall load on the server.
UniqueIdentifiers make it harder for SQL Server to check for
"uniqueness" on inserts/updates, and that puts more pressure on the
server. Generally, for high-transaction OLTP systems, you are better off
avoiding the use of UniqueIdentifiers as primary keys.
When this data type is used
as a primary key, you should use the NewSequentialID() rather than the NewID()
function. That's because NewSequentialID always grabs a sequential ID,
resulting in a clustered index that is "ever-increasing," which
eliminates fragmentation on the clustered index. The NewID() function will
cause heavy fragmentation on a clustered index due to page splits.
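A minimal sketch of that pattern; the table and constraint names are hypothetical. Note that NEWSEQUENTIALID() can only appear in a DEFAULT constraint:

CREATE TABLE dbo.Orders
(
    OrderID uniqueidentifier NOT NULL
        CONSTRAINT DF_Orders_OrderID DEFAULT NEWSEQUENTIALID()
        CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED,
    OrderDate datetime NOT NULL
);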
Decimal –
In SQL Server 2005 SP2 and
greater (Enterprise, Developer and Evaluation Editions only), you can turn on
the vardecimal option (database properties/Options/VarDecimal Storage Format
Enabled). The vardecimal option varies the storage required for the decimal
data type, depending on the actual data stored in the column. You can only turn
it on at the database level rather than the table level.
In order to determine the
space that will be saved on a particular table, run the
sp_estimated_rowsize_reduction_for_vardecimal (located in master) system stored
procedure. No changes need to be made on the column to use vardecimal and it
will have no functional impact on code, since it is merely performance-related
functionality.
(n)Varchar(n), (n)Varchar(Max), VarBinary(n), VarBinary(max), (n)Text, and Image –
When you plan to store large values, it is very important to pick the correct data type. These data types have the potential to be an issue, depending on the sizes chosen.
If you expect the text/data
size to be fairly small, then it is recommended that you store the data on the
data row, meaning you would want to use (n)Varchar(n) or VarBinary(n), which
makes it subject to the 8k limit.
If the data is potentially
large, then (n)Varchar(Max) or VarBinary(max) should be used, as the
VarBinary(max), (n)Varchar(Max), (n)Text, and Image data types can be stored
"out of row."
Storing large object data
"out of row" means that the regular data row merely contains a
pointer to another data row that stores the large object data, thus eliminating
the 8k limit for the large object and resulting in smaller data row sizes. If a
query does not require/return the large object, then it just scans/seeks on the
smaller data row, returning results much faster. It's able to traverse fewer data pages than if the large object was stored "in row."
If a query does
require/return the large object, then it scans/seeks for the smaller data row
and uses the pointer on that row to return the large object. You are better off
not using the (n)Text or Image data types, replacing them with (n)Varchar(Max)
and VarBinary(max), respectively.
If you choose to use the VarBinary(max), (n)Varchar(Max), (n)Text, or Image data type, then it is best if the option to store the value "out of row" is chosen. This setting is controlled by calling the system stored procedure:
sp_tableoption N'MyTable', 'large value types out of row', 'ON'
The size of the data rows and
indexes impacts a wide range of hardware resources. SQL Server performance will
increase exponentially as you reduce the size of the rows and indexes. A few fast
points to remember: Always pick the smallest data type, use bits and integers
(the smaller the int type the better) with lookup tables where possible –
trying to avoid the use of (n)varchars.
If you have to use large
text/binary values, then consider defining them as (Max) types and setting the
options to store the data "out of row." If the text data is fixed
length, then always use (n)char(n) data types so 2 bytes is not wasted storing
offsets. If your primary concern is performance, avoid using any data types
other than ints for primary/foreign keys, as this will significantly reduce
your data row and index sizes. SQL Server data types seem like a small issue at
design time, but they will have a huge impact on your system as it grows
organically over time.
Datetime Vs SmallDateTime:
The DATETIME data type is
another commonly misused data type that wastes space in your database. There
are two different date data types: SMALLDATETIME, and DATETIME. Like the other
data types, these two different date types require a different number of bytes
to store their value.
The SMALLDATETIME data type
requires 4 bytes of storage, and stores dates from January 1, 1900 through June
6, 2079 and the time portion is only accurate to the minutes.
The DATETIME data type takes 8 bytes to store its value and supports dates from January 1, 1753 through December 31, 9999, with the time portion accurate to 3 milliseconds. If your application only needs to store the current date or some future date that does not go beyond the year 2079, and you don't need the time portion to be accurate beyond the minute, then the SMALLDATETIME data type is the correct one to use.
Most applications that deal with dates can save space by using the SMALLDATETIME data type, provided the application doesn't need time precision finer than one minute. Any time your application needs to store time down to the second, or dates prior to 1900 or beyond 2079, the DATETIME data type must be used.
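For example, smalldatetime keeps no seconds; the value is rounded to the nearest minute:

SELECT CAST('2024-01-15 10:30:45' AS smalldatetime) AS Rounded;  -- 2024-01-15 10:31:00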
MONEY Vs SMALLMONEY:
Some applications use the MONEY and SMALLMONEY data types to store currency amounts. Just like the other data types, MONEY and SMALLMONEY take different amounts of space to store their values. SMALLMONEY only takes 4 bytes and supports values from -214,748.3648 to 214,748.3647, whereas the MONEY data type takes 8 bytes and stores values from -922,337,203,685,477.5808 to 922,337,203,685,477.5807. If your application is storing monetary transactions that will never exceed a little over 200,000, you will save 4 bytes for each transaction amount by using the SMALLMONEY data type.
Interview Questions:
- How do you convert text to integer in SQL?
If a table column is VARCHAR and has all numeric values in it, it can be retrieved as an integer using the CAST or CONVERT function.
- Will CAST or CONVERT throw an error when column values are converted from alphanumeric characters to numeric?
YES.
- Will CAST or CONVERT retrieve only the numbers when column values are converted from alphanumeric characters to numeric?
NO.
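A small illustration of those answers; TRY_CONVERT requires SQL Server 2012 or later:

SELECT CAST('123' AS int) AS CastOk;              -- 123
-- SELECT CAST('12a' AS int);                     -- fails: conversion error
SELECT TRY_CONVERT(int, '12a') AS TryConvertNull; -- NULL instead of an error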