11 May 2015

Clustered & Non-Clustered Indexes in SqlServer

Suppose we have 16 million records. When we try to retrieve records for two or three customers based on their customer id, all 16 million records are taken and comparison is made to get a match on the supplied customer ids. Think about how much time that will take if it is a web application and there are 25 to 30 customers that want to access their data through internet. Does the database server do 16 million x 30 searches? The answer is no because all modern databases use the concept of index.


Index is a database object, which can be created on one or more columns (16 Max column combinations). When creating the index will read the column(s) and forms a relevant data structure to minimize the number of data comparisons. The index will improve the performance of data retrieval and adds some overhead on data modification such as create, delete and modify.


If a table has a clustered index, then the rows of that table will be stored on disk in the same exact order. All the same entries belonging of a table would be right next to each other on disk. This is the “clustering”, or grouping of similar values, which is referred to in the term “clustered” index the query will run much faster than if the rows were being stored in some random order on the disk.


Drawback with Clustered Index:


If a given row has a value updated in one of its (clustered) indexed columns what typically happens is that the database will have to move the entire row so that the table will continue to be sorted in the same order as the clustered index column. Clearly, this is a performance hit. This means that a simple UPDATE has turned into a DELETE and then an INSERT – just to maintain the order of the clustered index. For this exact reason, clustered indexes are usually created on primary keys or foreign keys, because of the fact that those values are less likely to change once they are already a part of a table.


A table can have up to 999 non-clustered indexes because they don’t affect the order in which the rows are stored on disk like clustered indexes.


Clustered index defines the way in which data is ordered physically on the disk. And there can only be one way in which you can order the data physically. Imagine if we have two clustered indexes on a single table – which index would determine the order in which the rows will be stored? Since the rows of a table can only be sorted in only one way/index, a non-clustered index has no effect on which the order of the rows will be stored. So having more than one clustered index is not allowed.


A comparison of a non-clustered index with a clustered index with an example:


Non clustered indexes store both a value and a pointer to the actual row that holds that value. Clustered indexes don’t need to store a pointer to the actual row because of the fact that the rows in the table are stored on disk in the same exact order as the clustered index – and the row-level data like EmployeeName, EmployeeAddress, etc. are stored in its leaf nodes of the clustered index. This means that with a non-clustered index, extra work is required to retrieve data, as compared to clustered index. So, reading from a clustered index is generally faster than reading from a non-clustered index.


SQL server is using the Binary-Tree techniques to represent the clustered index. The Index page in a book is Non-Clustered index and the page numbers are clustered index arranged in a binary tree.


CREATE INDEX index_name ON table_name (column_name)

CREATE UNIQUE CLUSTERED INDEX index_name ON dbo.[Package](CreateDateKey, PackageID) WITH (ONLINE = ON, DATA_COMPRESSION = ROW)

ALTER TABLE dbo.Package WITH CHECK
ADD CONSTRAINT [index_name] PRIMARY KEY (PackageID) ON [PRIMARY]

ALTER TABLE dbo.Package WITH CHECK ADD CONSTRAINT [index_name] PRIMARY KEY NONCLUSTERED (PackageID) ON [PRIMARY]

DROP INDEX table_name.index_name

No comments: