
Tuesday, 23 June 2015

Data Types

Exact Numerics:
int, bigint, smallint, and tinyint

Exact number data types that use integer data.
bigint
·         Integer (whole number) data from -2^63 (-9,223,372,036,854,775,808) through 2^63-1 (9,223,372,036,854,775,807). Storage size is 8 bytes.

int
·         Integer (whole number) data from -2^31 (-2,147,483,648) through 2^31 - 1 (2,147,483,647). Storage size is 4 bytes. The SQL-92 synonym for int is integer.

smallint
·         Integer data from -2^15 (-32,768) through 2^15 - 1 (32,767). Storage size is 2 bytes.

tinyint
·         Integer data from 0 through 255. Storage size is 1 byte.

Remarks
·         The bigint data type is supported where integer values are supported. However, bigint is intended for special cases where the integer values may exceed the range supported by the int data type. The int data type remains the primary integer data type in SQL Server.

·         bigint fits between smallmoney and int in the data type precedence chart.

·         Functions will return bigint only if the parameter expression is a bigint data type. SQL Server will not automatically promote other integer data types (tinyint, smallint, and int) to bigint.
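A quick way to see this (a small sketch; the values are chosen just to straddle the int boundary):

-- int + int is still int, so this overflows instead of being promoted to bigint
select 2147483647 + 1                     -- Msg 8115: arithmetic overflow

-- casting one operand to bigint makes the whole expression bigint
select cast(2147483647 as bigint) + 1     -- 2147483648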


create table test(id1 tinyint, id2 smallint, id3 int, id4 bigint)

   Description of the table is given by: sp_help 'test'

Column_name  Type      Length  Prec  Scale
id1          tinyint   1       3     0
id2          smallint  2       5     0
id3          int       4       10    0
id4          bigint    8


Length: the number of bytes allocated for the column.
        tinyint   - 1 byte
        smallint  - 2 bytes
        int       - 4 bytes
        bigint    - 8 bytes

Precision: the number of integer positions (digits to the left of the decimal point).

        tinyint   - 1 byte    max value 2^8  - 1 = 255                        so 3 positions
        smallint  - 2 bytes   max value 2^15 - 1 = 32,767                     so 5 positions
        int       - 4 bytes   max value 2^31 - 1 = 2,147,483,647              so 10 positions
        bigint    - 8 bytes   max value 2^63 - 1 = 9,223,372,036,854,775,807  so 19 positions (but see the note below)

Note: For the bigint data type both precision and scale show as empty, because the sp_help stored procedure logic does not include the bigint data type when calculating the precision and scale values.

If we query syscolumns for the object, we get the complete details for all columns.


Select * from syscolumns where id=(select id from sysobjects where name='test')

1.   When integers are implicitly converted to a character data type, if the integer is too large to fit into the character field, SQL Server enters ASCII character 42, the asterisk (*).

declare
@var char(2)
set @var=152
select @var as result

result
------
*

(1 row(s) affected)
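For comparison, a target wide enough for the digits converts cleanly (the same sketch with char(4)):

declare
@var char(4)
set @var=152
select @var as result

result
------
152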


2.   Integer constants greater than 2,147,483,647 are converted to the decimal data type, not the bigint data type. The following example shows that when the threshold value is exceeded, the data type of the result changes from an int to a decimal.
SELECT 2147483647 / 2 AS Result1, 2147483649 / 2 AS Result2 ;

Here is the result set.

Result1      Result2
1073741823   1073741824.500000

   The same happens when the integer constant is smaller than -2^31 (-2,147,483,648):

SELECT  -2147483650 / 2 AS Result2 ;

Result2
---------------------------------------
-1073741825.000000


3.    From SQL Authority: "Pinal, I am migrating my database from MySQL to SQL Server and I have faced a unique situation.

I have been using an Unsigned 64-bit integer in MySQL, but when I try to migrate that column to SQL Server I am facing an issue, as there is no datatype I find appropriate for my column. It is now too late to change the datatype and I need an immediate solution.

One line of thought was to change the data type of the column from Unsigned 64-bit (BIGINT) to VARCHAR(n), but that would just change the data type, and I would face quite a lot of performance-related issues in the future.

In SQL Server we also have the BIGINT data type, but that is a Signed 64-bit datatype. The BIGINT datatype in SQL Server has a range of -2^63 (-9,223,372,036,854,775,808) to 2^63-1 (9,223,372,036,854,775,807). However, my number is much larger than this.

Is there any way I can store my big 64-bit Unsigned Integer without losing much of the performance by converting it to VARCHAR?"

Very interesting question. For the sake of argument, we could ask the user whether such a big number is really needed; if it is an identity column, I really doubt the table will ever grow beyond the bigint range. The real question I found interesting here was how to store a 64-bit unsigned integer value in SQL Server without converting it to a string data type. After thinking a bit, I found a fairly simple answer.


I can use the NUMERIC data type.

I can use the NUMERIC(20) datatype for a 64-bit unsigned integer value, NUMERIC(10) for a 32-bit unsigned integer value and NUMERIC(5) for a 16-bit unsigned integer value. The NUMERIC datatype supports a maximum precision of 38.

Now here is another thing to keep in mind.

·         Using the NUMERIC datatype will indeed accept the 64-bit unsigned integer, but in the future if you try to enter a negative value, it will allow that too. Hence, you will need to put an additional constraint on the column so that it accepts only positive integers; see the sketch after this list.

·         Here is another big concern: SQL Server will store the number as numeric and will treat it as a positive integer for all practical purposes. You will have to write application logic to interpret it as a 64-bit Unsigned Integer. On the other hand, if you are using unsigned integers in your application, there is a good chance that you already have logic taking care of this.
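Putting both bullets together, a minimal sketch (table, column and constraint names are hypothetical): NUMERIC(20,0) can hold 0 through 2^64 - 1, and a CHECK constraint keeps out negatives and overflow.

create table dbo.MigratedData
(
    UnsignedValue numeric(20, 0) not null
        constraint CK_MigratedData_Unsigned
        check (UnsignedValue between 0 and 18446744073709551615)
)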


Benefits example:
Choosing the sql data type tinyint instead of int for a "ProductType" column with values ranging from 1 to 10 will save three bytes per record. With 100,000 records you will save 300,000 bytes. That's not much in terms of disk space ("storage is cheap", etc.), but you'll probably have indexes containing that column, and if that index takes less memory the database engine will process it much more efficiently in every "join", "where" and so on.

So, queries will perform faster, release locks earlier (if any) and use fewer system resources (memory and CPU). This will make the whole server perform better, as there will be more resources available for other things.

Learning the sql data types available and spending a few extra minutes when designing your schema will result in faster query execution and an overall better-performing database.

When you use the +, -, *, /, or % arithmetic operators to perform implicit or explicit conversion of int, smallint, tinyint, or bigint constant values to the float, real, decimal or numeric data types, the rules that SQL Server applies when it calculates the data type and precision of the expression results differ depending on whether the query is autoparameterized or not.


Therefore, similar expressions in queries can sometimes produce different results. When a query is not autoparameterized, the constant value is first converted to numeric, whose precision is just large enough to hold the value of the constant, before converting to the specified data type. For example, the constant value 1 is converted to numeric (1, 0), and the constant value 250 is converted to numeric (3, 0).

When a query is autoparameterized, the constant value is always converted to numeric (10, 0) before converting to the final data type. When the / operator is involved, not only can the result type's precision differ among similar queries, but the result value can differ also.

For example, the result value of an autoparameterized query that includes the expression SELECT CAST (1.0 / 7 AS float) will differ from the result value of the same query that is not autoparameterized, because the results of the autoparameterized query will be truncated to fit into the numeric (10, 0) data type.
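We can mimic the truncation by hand (a sketch; which path a real query takes depends on whether the optimizer autoparameterizes it):

-- not autoparameterized: 1.0 is converted to numeric(2,1), precise enough here
select cast(1.0 / 7 as float) as Direct                               -- 0.142857142857

-- forcing the intermediate result into numeric(10,0) truncates it, as described above
select cast(cast(1.0 / 7 as numeric(10, 0)) as float) as Truncated    -- 0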




Decimal and Numeric (Transact-SQL):

Numeric data types that have fixed precision and scale.

Arguments:

decimal [ (p[ ,s] )] and numeric[ (p[ ,s] )]

Fixed precision and scale numbers. When maximum precision is used, valid values are from -10^38 + 1 through 10^38 - 1. The ISO synonyms for decimal are dec and dec(p, s).

numeric is functionally equivalent to decimal.

p (precision)
The maximum total number of decimal digits that will be stored, both to the left and to the right of the decimal point. The precision must be a value from 1 through the maximum precision of 38. The default precision is 18.

s (scale)
The number of decimal digits that will be stored to the right of the decimal point.

This number is subtracted from p to determine the maximum number of digits to the left of the decimal point.

The maximum number of decimal digits that can be stored to the right of the decimal point.

Scale must be a value from 0 through p. Scale can be specified only if precision is specified.

The default scale is 0; therefore, 0 <= s <= p. Maximum storage sizes vary, based on the precision.

Precision    Storage bytes
1 - 9        5
10 - 19      9
20 - 28      13
29 - 38      17

Converting decimal and numeric Data
For the decimal and numeric data types, SQL Server considers each specific combination of precision and scale as a different data type. For example, decimal(5,5) and decimal(5,0) are considered different data types.

In Transact-SQL statements, a constant with a decimal point is automatically converted into a numeric data value, using the minimum precision and scale necessary. For example, the constant 12.345 is converted into a numeric value with a precision of 5 and a scale of 3.
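You can confirm the inferred precision and scale with SQL_VARIANT_PROPERTY:

select sql_variant_property(12.345, 'BaseType')  as BaseType,   -- numeric
       sql_variant_property(12.345, 'Precision') as Prec,       -- 5
       sql_variant_property(12.345, 'Scale')     as Scale       -- 3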

Converting from decimal or numeric to float or real can cause some loss of precision.
Converting from int, smallint, tinyint, float, real, money, or smallmoney to either decimal or numeric can cause overflow.

By default, SQL Server uses rounding when converting a number to a decimal or numeric value with a lower precision and scale. However, if the SET ARITHABORT option is ON, SQL Server raises an error when overflow occurs. Loss of only precision and scale is not sufficient to raise an error.
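A small illustration of the difference: losing scale only rounds, while losing integer digits overflows.

-- losing scale: rounds quietly
select cast(10.6496 as decimal(5, 2)) as Rounded    -- 10.65

-- losing integer digits: arithmetic overflow error
-- select cast(10.6496 as decimal(3, 2))            -- Msg 8115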

When converting float or real values to decimal or numeric, the decimal value will never have more than 17 decimals. Any float value < 5E-18 will always convert as 0.



bit (Transact-SQL):

An integer data type that can take a value of 1, 0, or NULL.

Remarks:

The SQL Server Database Engine optimizes storage of bit columns. If there are 8 or less bit columns in a table, the columns are stored as 1 byte. If there are from 9 up to 16 bit columns, the columns are stored as 2 bytes, and so on.

The string values TRUE and FALSE can be converted to bit values: TRUE is converted to 1 and FALSE is converted to 0.

Converting to bit promotes any nonzero value to 1.
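Both rules in one query:

select cast('TRUE'  as bit) as t,        -- 1
       cast('FALSE' as bit) as f,        -- 0
       cast(-42     as bit) as nonzero   -- 1: any nonzero value becomes 1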




money and smallmoney (Transact-SQL):

Data types that represent monetary or currency values.

Data type     Range                                                     Storage
money         -922,337,203,685,477.5808 to 922,337,203,685,477.5807    8 bytes
smallmoney    -214,748.3648 to 214,748.3647                            4 bytes

The money and smallmoney data types are accurate to a ten-thousandth of the monetary units that they represent.

Use a period to separate partial monetary units, like cents, from whole monetary units. For example, 2.15 specifies 2 dollars and 15 cents.

Currency or monetary data does not need to be enclosed in single quotation marks
( ' ). It is important to remember that while you can specify monetary values preceded by a currency symbol, SQL Server does not store any currency information associated with the symbol, it only stores the numeric value.
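For example, a money literal may carry a currency symbol on input, but only the numeric value is stored:

select $2.15 as WithSymbol, cast('$2.15' as money) as FromString   -- both return 2.15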








Approximate Numerics:

float and real (Transact-SQL)

Approximate-number data types for use with floating-point numeric data. Floating-point data is approximate; therefore, not all values in the data type range can be represented exactly. The ISO synonym for real is float(24).

Syntax:
float [ (n) ]
Where n is the number of bits that are used to store the mantissa of the float number in scientific notation and, therefore, dictates the precision and storage size. If n is specified, it must be a value between 1 and 53. The default value of n is 53.

n value    Precision    Storage size
1-24       7 digits     4 bytes
25-53      15 digits    8 bytes

The SQL Server float[(n)] data type complies with the ISO standard for all values of n from 1 through 53. The synonym for double precision is float(53).

Remarks:

Data type    Range                                                       Storage
float        -1.79E+308 to -2.23E-308, 0, and 2.23E-308 to 1.79E+308    Depends on the value of n
real         -3.40E+38 to -1.18E-38, 0, and 1.18E-38 to 3.40E+38        4 bytes

Converting float and real Data

Values of float are truncated when they are converted to any integer type.
When you want to convert from float or real to character data, using the STR string function is usually more useful than CAST( ).

This is because STR enables more control over formatting. For more information, see STR (Transact-SQL) and Built-in Functions (Transact-SQL).
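Both points in a short sketch:

-- float-to-integer conversion truncates, it does not round
select cast(3.99e0 as int) as Truncated    -- 3

-- STR controls total length and decimal places, unlike a plain CAST
select str(123.45678, 8, 3) as Formatted   -- ' 123.457'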

Conversion of float values that use scientific notation to decimal or numeric is restricted to values of precision 17 digits only. Any value with precision higher than 17 rounds to zero.













Char and Varchar (Transact-SQL):

Are string data types of either fixed length or variable length.

Arguments:

char [ ( n ) ]
Fixed-length, non-Unicode string data. n defines the string length and must be a value from 1 through 8,000. The storage size is n bytes. The ISO synonym for char is character.

varchar [ ( n | max ) ]
Variable-length, non-Unicode string data. n defines the string length and can be a value from 1 through 8,000. max indicates that the maximum storage size is 2^31-1 bytes (2 GB). The storage size is the actual length of the data entered + 2 bytes. The ISO synonyms for varchar are char varying or character varying.

CHAR Data Type is a Fixed-Length Data Type. For example, if you declare a variable/column of CHAR(10) data type, it will always take 10 bytes, irrespective of whether you are storing 1 character or 10 characters in it. And in this example, as we have declared the variable/column as CHAR(10), we can store a maximum of 10 characters in it.

On the other hand, VARCHAR is a Variable-Length Data Type. For example, if you declare a variable/column of VARCHAR(10) data type, it will take up the number of bytes equal to the number of characters stored in it. If you store only one character, it will take only one byte; if you store 10 characters, it will take 10 bytes. And as we have declared the variable/column as VARCHAR(10), we can store a maximum of 10 characters in it.

DECLARE @CharName Char(20) = 'Sql',
 @VarCharName VarChar(20) = 'Sql'
SELECT DATALENGTH(@CharName) CharSpaceUsed,
 DATALENGTH(@VarCharName) VarCharSpaceUsed

CharSpaceUsed VarCharSpaceUsed
------------- ----------------
20            3


Concatenation of CHAR variables:

DECLARE @FirstName Char(20) = 'Sql',
        @LastName Char(20) = 'Server'

IF @FirstName + ' ' + @LastName = 'Sql Server'
 PRINT 'I was Expecting'
ELSE
 PRINT 'Surprise to me ...'

SELECT  @FirstName + ' ' + @LastName AS Name,
 len(@FirstName + ' ' + @LastName) AS Length


Surprise to me ...
Name                                      Length
----------------------------------------- -----------
Sql                  Server               27



--Concatenation of VARCHAR variables:

DECLARE @FirstName varchar(20) = 'Sql',
        @LastName varchar(20) = 'Server'

IF @FirstName + ' ' + @LastName = 'Sql Server'
 PRINT 'I was Expecting'
ELSE
 PRINT 'Surprise to me ...'

SELECT  @FirstName + ' ' + @LastName AS Name,
 len(@FirstName + ' ' + @LastName) AS Length


I was Expecting
Name                                      Length
----------------------------------------- -----------
Sql Server                                10


Remarks:

When n is not specified in a data definition or variable declaration statement, the default length is 1. When n is not specified when using the CAST and CONVERT functions, the default length is 30.

declare
@var varchar
set @var = 'Hi'
select @var as result

result
------
H

declare
@var char
set @var = 'Hi'
select @var as result

result
------
H
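The 30-character default with CAST can be seen the same way:

declare
@var varchar(40)
set @var = replicate('x', 40)
select len(cast(@var as varchar)) as result

result
------
30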

Objects that use char or varchar are assigned the default collation of the database, unless a specific collation is assigned using the COLLATE clause. The collation controls the code page that is used to store the character data.

If you have sites that support multiple languages, consider using the Unicode nchar or nvarchar data types to minimize character conversion issues.

If you use char or varchar, we recommend the following:
  • Use char when the sizes of the column data entries are consistent.
  • Use varchar when the sizes of the column data entries vary considerably.
  • Use varchar(max) when the sizes of the column data entries vary considerably, and the size might exceed 8,000 bytes.

If SET ANSI_PADDING is OFF when either CREATE TABLE or ALTER TABLE is executed, a char column that is defined as NULL is handled as varchar.

When the collation code page uses double-byte characters, the storage size is still n bytes. Depending on the character string, the storage size of n bytes can be less than n characters.

Converting Character Data:

When character expressions are converted to a character data type of a different size, values that are too long for the new data type are truncated.

The uniqueidentifier type is considered a character type for the purposes of conversion from a character expression, and therefore is subject to the truncation rules for converting to a character type.

When a character expression is converted to a character expression of a different data type or size, such as from char(5) to varchar(5), or char(20) to char(15), the collation of the input value is assigned to the converted value. If a noncharacter expression is converted to a character data type, the default collation of the current database is assigned to the converted value. In either case, you can assign a specific collation by using the COLLATE clause.

Note: Code page translations are supported for char and varchar data types, but not for text data type. As with earlier versions of SQL Server, data loss during code page translations is not reported.

Character expressions that are being converted to an approximate numeric data type can include optional exponential notation (a lowercase e or uppercase E followed by an optional plus (+) or minus (-) sign and then a number).

Character expressions that are being converted to an exact numeric data type must consist of digits, a decimal point, and an optional plus (+) or minus (-). Leading blanks are ignored. Comma separators, such as the thousands separator in 123,456.00, are not allowed in the string.

declare
@var varchar(20)
set @var = '-123456.00'
select cast(@var as numeric(8,2)) as result

result
---------------------------------------
-123456.00

Character expressions being converted to money or smallmoney data types can also include an optional decimal point and dollar sign ($). Comma separators, as in $123,456.00, are allowed.

declare
@var varchar(20)
set @var = '$123,456.00'
select cast(@var as money) as result

result
---------------------
123456.00








nchar and nvarchar (Transact-SQL):

Character data types that are either fixed-length, nchar, or variable-length, nvarchar, Unicode data and use the UNICODE UCS-2 character set.

Arguments

nchar [ ( n ) ]
Fixed-length Unicode string data. n defines the string length and must be a value from 1 through 4,000. The storage size is two times n bytes.

When the collation code page uses double-byte characters, the storage size is still n bytes.

Depending on the string, the storage size of n bytes can be less than the value specified for n. The ISO synonyms for nchar are national char and national character.

nvarchar [ ( n | max ) ]
Variable-length Unicode string data. n defines the string length and can be a value from 1 through 4,000. max indicates that the maximum storage size is 2^31-1 bytes (2 GB). The storage size, in bytes, is two times the actual length of data entered + 2 bytes.

The ISO synonyms for nvarchar are national char varying and national character varying.

Remarks
When n is not specified in a data definition or variable declaration statement, the default length is 1. When n is not specified with the CAST function, the default length is 30.

Use nchar when the sizes of the column data entries are probably going to be similar.

Use nvarchar when the sizes of the column data entries are probably going to vary considerably.

sysname is a system-supplied user-defined data type that is functionally equivalent to nvarchar(128), except that it is not nullable. sysname is used to reference database object names.

Objects that use nchar or nvarchar are assigned the default collation of the database unless a specific collation is assigned using the COLLATE clause.
SET ANSI_PADDING is always ON for nchar and nvarchar. SET ANSI_PADDING OFF does not apply to the nchar or nvarchar data types.

Prefix Unicode character string constants with the letter N. Without the N prefix, the string is converted to the default code page of the database. This default code page may not recognize certain characters.
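A quick sketch (the exact fallback character depends on the database's default code page; here we assume a Latin1 collation with no Greek letters):

select 'Ω' as WithoutN, N'Ω' as WithN

WithoutN WithN
-------- -----
?        Ω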



Uniqueidentifier:

Is a 16-byte GUID.

Remarks

A column or local variable of uniqueidentifier data type can be initialized to a value in the following ways:

By using the NEWID function.

By converting from a string constant in the form xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx, in which each x is a hexadecimal digit in the range 0-9 or a-f.

For example, 6F9619FF-8B86-D011-B42D-00C04FC964FF is a valid uniqueidentifier value.
Comparison operators can be used with uniqueidentifier values. However, ordering is not implemented by comparing the bit patterns of the two values.

The only operations that can be performed against a uniqueidentifier value are comparisons (=, <>, <, >, <=, >=) and checking for NULL (IS NULL and IS NOT NULL). No other arithmetic operators can be used. All column constraints and properties, except IDENTITY, can be used on the uniqueidentifier data type.

Merge replication and transactional replication with updating subscriptions use uniqueidentifier columns to guarantee that rows are uniquely identified across multiple copies of the table.

Examples

The following example converts a uniqueidentifier value to a char data type.

DECLARE @myid uniqueidentifier = NEWID();
SELECT CONVERT(char(255), @myid) AS 'char';
 
The following example demonstrates the truncation of data when the value is too long for the data type being converted to. Because the uniqueidentifier type is limited to 36 characters, the characters that exceed that length are truncated.

DECLARE @ID nvarchar(max) = N'0E984725-C51C-4BF4-9960-E1C80E27ABA0wrong';
SELECT @ID, CONVERT(uniqueidentifier, @ID) AS TruncatedValue;

Here is the result set.

String                                       TruncatedValue
-------------------------------------------- ------------------------------------
0E984725-C51C-4BF4-9960-E1C80E27ABA0wrong    0E984725-C51C-4BF4-9960-E1C80E27ABA0
 
(1 row(s) affected)



binary and varbinary (Transact-SQL):

Binary data types of either fixed length or variable length.

Arguments
binary [ ( n ) ]
Fixed-length binary data with a length of n bytes, where n is a value from 1 through 8,000. The storage size is n bytes.

varbinary [ ( n | max) ]
Variable-length binary data. n can be a value from 1 through 8,000. max indicates that the maximum storage size is 2^31-1 bytes. The storage size is the actual length of the data entered + 2 bytes. The data that is entered can be 0 bytes in length. The ANSI SQL synonym for varbinary is binary varying.

Remarks
When n is not specified in a data definition or variable declaration statement, the default length is 1. When n is not specified with the CAST function, the default length is 30.

Use binary when the sizes of the column data entries are consistent.
Use varbinary when the sizes of the column data entries vary considerably.
Use varbinary(max) when the column data entries exceed 8,000 bytes.
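The fixed/variable split mirrors char/varchar:

declare @fixed binary(4)    = 0x0102,
        @var   varbinary(4) = 0x0102

select datalength(@fixed) as FixedBytes,   -- 4: binary(4) right-pads with 0x00
       datalength(@var)   as VarBytes      -- 2: varbinary stores only what was entered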



Tips:

  • Always specify the narrowest columns you can. The narrower the column, the less data SQL Server has to store, and the faster SQL Server is able to read and write data. In addition, if any sorts need to be performed on the column, the narrower the column, the faster the sort will be.

If a column is going to store values between 1 and 5, use tinyint instead of int (see the sketch below).

This allows you to store more rows in your data and index pages, reducing the amount of I/O needed to read them.

It also reduces the amount of data moved from the server to the client, reducing network traffic and latency. And last of all, it reduces the amount of wasted space in your buffer cache.
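A minimal sketch of that tip (table and column names are hypothetical):

-- tinyint is plenty for a small enumeration such as ProductType
create table dbo.Product
(
    ProductID   int     not null primary key,
    ProductType tinyint not null
        check (ProductType between 1 and 5)
)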

  • If you have a column that is designed to hold only numbers, use a numeric data type, such as INTEGER, instead of a VARCHAR or CHAR data type. Numeric data types generally require less space to hold the same numeric value than a character data type does. This helps to reduce the size of the columns, and can boost performance when the column is searched (WHERE clause), joined to another column, or sorted.

  • Avoid using FLOAT or REAL data types for primary keys, as they add unnecessary overhead that hurts performance. Use one of the integer data types instead.

  • When specifying data types during table creation, always specify NULL or NOT NULL for each column. If you don't, the column will default to NOT NULL if the ANSI NULL DEFAULT database option is not selected (the default), and will default to NULL if the ANSI NULL DEFAULT database option is selected.

  • For best performance, and to reduce potential code bugs, columns should ideally be set to NOT NULL. For example, use of the IS NULL keywords in the WHERE clause makes that portion of the query non-sargable, which means that portion of the query cannot make good use of an index.

  • Generally, using computed columns in a table is not recommended because it does not follow the standard rules of normalization. But it is sometimes more efficient overall to use computed columns in a table rather than re-computing the same data repeatedly in queries. This is especially true if you run the same query over and over that performs the same calculations. By performing the calculations in the table, you can reduce the amount of work performed by a query each time it is run. You have to determine for yourself where the performance bottleneck is, and act accordingly. If the bottleneck is in INSERTs and UPDATEs, then using computed columns may not be a good idea; but if your SELECT statements are the bottleneck, then using computed columns may pay off.

  • Avoid using the bigint data type unless you really need its additional storage capacity. The bigint data type uses 8 bytes of storage versus 4 bytes for the int data type.

  • Avoid using the SQL Server sql_variant datatype. Besides being a performance hog, it significantly limits what you can do with the data stored as a sql_variant. For example, sql_variant columns cannot be part of primary or foreign keys; can be used in indexes and unique keys only if they are shorter than 900 bytes; cannot have an identity property; cannot be part of a computed column; must have their data converted to another datatype when moving data to objects with other datatypes; are automatically converted to nvarchar(4000) when accessed by client applications using the SQL Server 7.0 OLE DB or ODBC providers; are not supported by the LIKE predicate in the WHERE clause; cannot be concatenated; and don't work with some functions.

  • Avoid using date data types as a primary key. From a performance perspective, it is more efficient to use a data type that uses less space. For example, the DATETIME datatype uses 8 bytes of space, while the INT datatype only takes up 4 bytes. The less space used, the smaller the table and index, and the less I/O overhead that is required to access the primary key. 

  • If you are creating a column that you know will be subject to many sorts, consider making the column integer-based and not character-based. This is because SQL Server can sort integer data faster than character data. 







Performance Considerations:

Optimizing data types means choosing them carefully, as they have a huge impact on I/O, CPU and RAM consumption. A larger data type takes up more room in the cache (RAM) resulting in a smaller hit ratio on the cache, which causes more fetches from disk (more I/O). This results in a heavier CPU hit, longer update times, increased lock times and increased contention on the table, which will show up as blocks. Indexes defined on larger data types take longer to scan/seek because more pages have to be read, which will impact RAM, CPU and I/O. Larger data types also increase maintenance times on the various indexes, yielding longer "maintenance windows."

Performance Impacts of Disk Space Usage:

To discuss the performance impacts of disk space usage we need to review what it takes to retrieve and write data to disk drives. Each time you need to read a piece of data from SQL Server you need to retrieve the information from disk, which causes a disk I/O. Data in SQL Server is stored in physical pages; each page is 8 KB, of which 8,060 bytes are available for row data. For every page of data, SQL Server requires one I/O to retrieve it.

To better understand how I/O can impact performance, consider how many I/Os it would take to retrieve 10,000,000 records from a SQL Server table. Say each record is 300 bytes long. This means you can store 26 records per page, so the entire 10,000,000 records would require 384,616 data pages just to store the raw data; this doesn't take into account space for indexes. So to read every record in this large 10-million-record table would take 384,616 I/Os.

Now say you saved just 2 bytes of data from every record, making each record 298 bytes long. This would mean you could store 27 records per SQL Server page. With that 2-byte saving, you can now retrieve one more record with each I/O operation. The total I/O saving, if you read the entire 10,000,000-record table, would be 14,245 I/Os. That is a lot of I/O saved for only 2 bytes of space per record.

So, each time you can save a few bytes of data from each record stored in a SQL Server table you improve your performance. The larger the table the bigger the performance gains you will see. Therefore, you want to try to minimize your record size so you can maximize the number of records that can be stored in each data page.

Now I/O is not your only savings when you minimize the space it takes to store your data. Keep in mind each page that is read first needs to be stored in the buffer pool. So the smaller the record sizes the more records you can fit into a single buffer pool page. Therefore, by conserving disk space for storing your data you are also conserving the amount of memory you will need when reading your data.
Datetime VS SmallDatetime –
Usually when storing a datetime value, everyone defaults to the datetime data type without considering smalldatetime.

The datetime type is stored as two ints accurate to 3.33 milliseconds with a range of Jan 1, 1753-Dec 31, 9999. However, for many database operations, this kind of range and precision is not necessary.

The smalldatetime data type is stored in 4 bytes, accurate to one minute, with a range of Jan 1, 1900 – Jun 6, 2079.

Many datetime values can be stored as a smalldatetime when the accuracy requirements do not have to be precise to the sec/ms and the dates are generally limited to date ranges within a few decade ranges. The benefits of using smalldatetime are apparent, since at 50% space saved, a table is much more efficient. Datetime data types should not be used within a primary key unless frequent searches are done on date ranges.
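The precision difference is easy to see (a sketch; smalldatetime rounds to the nearest minute):

declare @d datetime = '20150623 10:15:42.337'

select @d                        as AsDatetime,       -- 2015-06-23 10:15:42.337
       cast(@d as smalldatetime) as AsSmalldatetime   -- 2015-06-23 10:16:00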

Using Data Types to Minimize Disk Space Usage:

When selecting a data type for a particular column you need to make sure you select the right one. It is easy to pick the wrong data type and waste disk space. Therefore, be careful and select a data type that meets your data requirements while minimizing the amount of disk space required to store each column. I'll review different data types and discuss space considerations for each.

First, let me talk about Unicode data types. Unicode data types are NVARCHAR, NCHAR, and NTEXT, and they require 2 bytes to store every character, whereas non-Unicode data types like VARCHAR, CHAR, and TEXT take only one byte to store each character.

The non-Unicode data types can only store 256 different characters. With Unicode data types, you can store up to 65,536 different 2 byte patterns. Because of the limitation on the number of unique characters that can be stored using non-Unicode data types, the hexadecimal representation for a particular character is not the same across different code pages. When you use Unicode data types, the character representation for commonly used characters are the same across code pages.

Unicode data types are typically used for international applications. If your application does not need to be supported internationally then you should consider just using the VARCHAR, CHAR, and TEXT data type, provided the characters your application uses can be represented using the 1 byte – 256 character set. By using the non-Unicode data types, you will use half the disk space for each character-based column. If you store lots of character data then your disk space consumption savings using non-Unicode characters could be substantial. I converted one of our databases from using all Unicode data types to using all non-Unicode data types and realized a 40% reduction in disk space consumption. This kind of disk space savings can provide drastic performance gains over using Unicode data types. This performance improvement is made by maximizing the number of records that can be stored in a single SQL Server page.

Varchar/Char:

  • If the text data in a column varies greatly in length, use a VARCHAR data type instead of a CHAR data type. The amount of space saved by using VARCHAR over CHAR on variable-length columns can greatly reduce I/O reads and the cache memory used to hold data, improving overall SQL Server performance.

  • If a column's data does not vary widely in length, consider using a fixed-length CHAR field instead of a VARCHAR. While it may take up a little more space to store the data, processing fixed-length columns is faster in SQL Server than processing variable-length columns.

  • If you need to store large strings of data, and they are less than 8,000 characters, use a VARCHAR data type instead of a TEXT data type. TEXT data types have extra overhead that drags down performance.

  • Don't use the NVARCHAR or NCHAR data types unless you need to store 16-bit character (Unicode) data. They take up twice as much space as VARCHAR or CHAR data types, increasing server I/O and wasting unnecessary space in your buffer cache.

  • Another advantage of using VARCHAR over CHAR columns is that sorts performed on VARCHAR columns are generally faster than on CHAR columns. This is because the entire width of a CHAR column needs to be sorted. 

  • If you are using fixed-length columns (CHAR, NCHAR) in your table, consider avoiding storing NULLs in them. If you do, the entire amount of space dedicated to the column will be used up. For example, if you have a fixed-length column of 255 characters and you place a NULL in it, then 255 characters of storage are used in the database. This is a large waste of space that causes SQL Server to perform extra disk I/O to read data pages; it also wastes space in the data cache buffer. Both of these reduce SQL Server performance.
Instead of using NULLs, use a coding scheme similar to this in your databases:
NA: Not applicable
NYN: Not yet known
TUN: Truly unknown
Such a scheme provides the benefits of using NULLs, but without the drawbacks.
If you really must use NULLs, use a variable-length column instead of a fixed-length column. Variable-length columns only use a very small amount of space to store a NULL.
  • If you use the CONVERT function to convert a value to a variable-length datatype, such as VARCHAR, always specify the length of the target datatype. If you do not, SQL Server assumes a default length of 30. Ideally, you should specify the shortest length that accomplishes the required task. This helps to reduce memory use and SQL Server resources.

  • Take care when using Unicode data in your queries, as it can affect query performance. A classic problem is an application passing in Unicode literals while the column searched in the database table is non-Unicode (or vice versa, depending on your scenario).
Here is an example. The DB column "orgname_name" has been indexed, and is of type varchar. The code below performs OK (so we think), performing an index scan operation:
declare @myvar nvarchar(200)
set @myvar = N'Central West College of TAFE'
select * from Organisation_Name where orgname_name = @myvar

|--Bookmark Lookup(BOOKMARK:([Bmk1000]), OBJECT:([corpsys].[dbo].[Organisation_Name]))
     |--Index Scan(OBJECT:([corpsys].[dbo].[Organisation_Name].[Organisation_Name]),
        WHERE:(Convert([Organisation_Name].[orgname_name])=[@myvar]))
Table 'Organisation_Name'.

Scan count 1,
logical reads 1145,
physical reads 0,
read-ahead reads 0.

If we change this around slightly, using a varchar variable instead (no Unicode literal this time), we see this:
declare @myvar varchar(200)
set @myvar = 'Central West College of TAFE'
select * from Organisation_Name where orgname_name = @myvar

|--Bookmark Lookup(BOOKMARK:([Bmk1000]), OBJECT:([corpsys].[dbo].[Organisation_Name]))
     |--Index Seek(OBJECT:([corpsys].[dbo].[Organisation_Name].[Organisation_Name_nameix]),
        SEEK:([Organisation_Name].[orgname_name]=[@myvar]) ORDERED FORWARD)

Here we see an INDEX SEEK lookup with a massive performance improvement:

Table 'Organisation_Name'.
Scan count 1,
logical reads 9,
physical reads 0,
read-ahead reads 0.

Instead of 1,145 logical reads, there are only 9, a significant improvement.
Bit vs. Char(1) –
One bit will take up 1 byte, 8 bits will still take up 1 byte, and a char(1) takes 1 byte. It could be argued that using a char(1) to store Y/N is more readable, but it is far more efficient to use a bit column if the table contains more than 1 bit column. You'll also eliminate the possibility of someone trying to put an A/B/C/etc. in the char(1) field, leading to incorrect data/functionality. Generally, it is considered better practice to use the bit data type. Even if the table only contains a 1 bit column, it will allow you to add additional bit columns in the future. There will be no need to modify the data type/data on the existing char(1) column in order to take advantage of the optimal data storage configuration.

BIT vs Tinyint –
If you only have a single BIT data type column in your table, then defining it as a BIT, or TINYINT uses the same amount of disk storage space. This is because to store a single BIT data type it requires 1 byte. Your space saving over TINYINT comes into play when you have multiple BIT data type columns in a single table.

Integer –
Integer types consist of tinyint (0-255, 1 byte), smallint (-32,768-32,767, 2 bytes), int (–2,147,483,648-2,147,483,647, 4 bytes) and bigint (–9,223,372,036,854,775,808-9,223,372,036,854,775,807, 8 bytes). These data types are often used in primary keys and non-clustered indexes, so it's important to pick the smallest data type available from the list.

It's tempting to just pick a large data type in SQL Server, but it's better to plan for your needs over the next 5 to 10 years. You can always convert to a larger data type later when CPUs, I/O and RAM have more bandwidth to support larger types. Also, since these data types are often used in primary keys (and consequently non-clustered index keys), the smaller the data type, the less work SQL Server will have to do during index maintenance.

UniqueIdentifier –
This data type takes 16 bytes of storage, so it's very wide, and is used to store GUIDs. It is commonly used as a primary key due to its unique nature. It is generally easier to merge two (or more) existing databases that have UniqueIdentifiers as primary keys, as opposed to int types, where the latter would require every row to be adjusted for a database merge.
With that in mind, be careful with this data type, since often you will have a UniqueIdentifier as a primary key along with several UniqueIdentifier foreign keys, making all the indexes very large, resource intensive and much more difficult to maintain. The increased row/index sizes also increase the overall load on the server. UniqueIdentifiers make it harder for SQL Server to check for "uniqueness" on inserts/updates, and that puts more pressure on the server. Generally, for high-transaction OLTP systems, you are better off avoiding the use of UniqueIdentifiers as primary keys.

When this data type is used as a primary key, you should use the NewSequentialID() rather than the NewID() function. That's because NewSequentialID always grabs a sequential ID, resulting in a clustered index that is "ever-increasing," which eliminates fragmentation on the clustered index. The NewID() function will cause heavy fragmentation on a clustered index due to page splits.


Decimal –
In SQL Server 2005 SP2 and greater (Enterprise, Developer and Evaluation Editions only), you can turn on the vardecimal option (database properties/Options/VarDecimal Storage Format Enabled). The vardecimal option varies the storage required for the decimal data type, depending on the actual data stored in the column. You can only turn it on at the database level rather than the table level.
In order to determine the space that will be saved on a particular table, run the sp_estimated_rowsize_reduction_for_vardecimal (located in master) system stored procedure. No changes need to be made on the column to use vardecimal and it will have no functional impact on code, since it is merely performance-related functionality.

(n)Varchar(n), (n)Varchar(Max), VarBinary(n), VarBinary(max), (n)Text, and Image –
When you plan to store large values, it is very important to pick the correct data type. These data types have the potential to be an issue, depending on the sizes chosen.

If you expect the text/data size to be fairly small, then it is recommended that you store the data on the data row, meaning you would want to use (n)Varchar(n) or VarBinary(n), which makes it subject to the 8k limit.

If the data is potentially large, then (n)Varchar(Max) or VarBinary(max) should be used, as the VarBinary(max), (n)Varchar(Max), (n)Text, and Image data types can be stored "out of row."

Storing large object data "out of row" means that the regular data row merely contains a pointer to another data row that stores the large object data, thus eliminating the 8k limit for the large object and resulting in smaller data row sizes. If a query does not require/return the large object, then it just scans/seeks on the smaller data row, returning results much faster, because it is able to traverse fewer data pages than if the large object was stored "in row".

If a query does require/return the large object, then it scans/seeks for the smaller data row and uses the pointer on that row to return the large object. You are better off not using the (n)Text or Image data types, replacing them with (n)Varchar(Max) and VarBinary(max), respectively.

If you choose to use the VarBinary(max), (n)Varchar(Max), (n)Text, or Image data type then it is best if the option to store the value "out of row" is chosen. This setting is controlled by calling the system stored procedure sp_tableoption N'MyTable', 'large value types out of row', 'ON'.

The size of the data rows and indexes impacts a wide range of hardware resources. SQL Server performance will increase exponentially as you reduce the size of the rows and indexes. A few fast points to remember: Always pick the smallest data type, use bits and integers (the smaller the int type the better) with lookup tables where possible – trying to avoid the use of (n)varchars.

If you have to use large text/binary values, then consider defining them as (Max) types and setting the options to store the data "out of row." If the text data is fixed length, then always use (n)char(n) data types so 2 bytes is not wasted storing offsets. If your primary concern is performance, avoid using any data types other than ints for primary/foreign keys, as this will significantly reduce your data row and index sizes. SQL Server data types seem like a small issue at design time, but they will have a huge impact on your system as it grows organically over time.

Datetime Vs SmallDateTime:

The DATETIME data type is another commonly misused data type that wastes space in your database. There are two different date data types: SMALLDATETIME, and DATETIME. Like the other data types, these two different date types require a different number of bytes to store their value.

The SMALLDATETIME data type requires 4 bytes of storage, and stores dates from January 1, 1900 through June 6, 2079 and the time portion is only accurate to the minutes.

The DATETIME data type takes 8 bytes to store its value and supports dates from January 1, 1753 through December 31, 9999, with the time portion accurate to 3 milliseconds. If your application only needs to store the current date or some future date that does not go beyond the year 2079, and you don't need time precision below one minute, then the SMALLDATETIME data type is the correct one to use.
Most applications that deal with dates can save space by using the SMALLDATETIME data type, provided the application doesn't need time precision finer than one minute. Any time your application needs to store time down to the second, or dates prior to 1900 or beyond 2079, the DATETIME data type must be used.

MONEY Vs SMALLMONEY:
Some applications use MONEY and SMALLMONEY data types to store currency amounts. Just like the other data types, MONEY and SMALLMONEY take different amounts of space to store their values. SMALLMONEY takes only 4 bytes and supports values from -214,748.3648 to 214,748.3647, whereas the MONEY data type takes 8 bytes and stores values from -922,337,203,685,477.5808 to 922,337,203,685,477.5807. If your application stores monetary transactions that will never exceed a little over 200,000, then you will save 4 bytes for each transaction amount by using the SMALLMONEY data type.



Interview Questions:

  • How to convert text to integer in SQL? 
    If a table column is VARCHAR and contains only numeric values, it can be retrieved as an integer using the CAST or CONVERT function.

  • Will CAST or CONVERT throw an error when column values are converted from alpha-numeric characters to numeric?
    YES.

  • Will CAST or CONVERT retrieve only the numbers when column values are converted from alpha-numeric characters to numeric?
NO.




