Dot Net For All

SQL Clustered and Non clustered Index for developers

Hello Friends, I have been learning about a very interesting topic. If you are anyway related to SQL server or any database, this is a must know for you. I want to discuss the SQL clustered and non clustered index. Better understanding of indexes would help you to design the tables and stored procedures in a better way.

Why Indexes?

Before starting the understanding of clustered and non clustered index, we need to understand the indexes. Lets try to understand it with a very basic example.

Suppose you have to find a number in the telephone directory. The names in the telephone directory are not sorted or ordered alphabetically. And you have to find a telephone number from that directory.

What option do you have to find the phone number? I think the best option is to go through all the pages and all the entries one by one, unless and un till you find one. If the number is somewhere on the starting pages, you are lucky otherwise you may have to go through all the pages.

Now if we sort the numbers by the First name or last name it becomes much easier to find the number. This is the power of indexing. And in this case the indexing helps us to directly take to the data i.e. phone number.

Another example is the indexes at the end of the book. If you are looking for some specific topic or keyword in the book, you go to the indexes at the end of book. This is usually sorted by the keyword or topic names. But many times there are many entries related to the keyword.

Below is an example of a books example.

As we can see the index in the book doesn’t show as the data related to the keyword, instead it gives us the page or pointer to the data in form of page number.

One more thing you can notice is that topic under applications keyword is are further sorted.

Keeping these two analogies in mind lets continue our discussion of the topic of clustered and non clustered index in SQL.

Clustered index in SQL

Clustered index is same as telephone directory.

If we do not have any primary key or clustered index, the data is stored as heap. It is not organized and stored randomly.

And with a clustered index, the table is sorted as per the clustered index column and while looking for data by the clustered index, the SQL does a index seek and comes directly to the value we are looking for.

Lets understand it with help of example. Below is code to create a simple table named Employee. There is primary key or any clustered table for the table. And I am inserting some random data.

CREATE TABLE Employee
(
	ID INT,
	FIRST_NAME varchar(50) NOT NULL,
	LAST_Name varchar(50) NOT NULL,
	Salary int NOT NULL
)

INSERT INTO Employee Values (3, 'Bill', 'B', 1000)
INSERT INTO Employee Values (4, 'Nick', 'B', 1000)
INSERT INTO Employee Values (1, 'Vikram', 'B', 1000)
INSERT INTO Employee Values (2, 'Steve', 'B', 1000)
INSERT INTO Employee Values (5, 'Robert', 'ch', 1000)
INSERT INTO Employee Values (5, 'Steve3', 'ch', 1000)

Once you run the Select * query on the table, you may see the data in the table. You can see that the data is not ordered by any column and it is the way we inserted it.

Data stored without clustered index

And if I do want to retrieve a specific data with any ID, as shown in the below figure. The SQL will do a table scan. Which is a costly operation.

Lets change the design of the table. Create the same table with a primary key. Please note that primary key creates a clustered index on the same column of the table.

CREATE TABLE Employee
(
	ID INT Primary Key,
	FIRST_NAME varchar(50) NOT NULL,
	LAST_Name varchar(50) NOT NULL,
	Salary int NOT NULL
)

INSERT INTO Employee Values (3, 'Bill', 'B', 1000)
INSERT INTO Employee Values (4, 'Nick', 'B', 1000)
INSERT INTO Employee Values (1, 'Vikram', 'B', 1000)
INSERT INTO Employee Values (2, 'Steve', 'B', 1000)
INSERT INTO Employee Values (5, 'Robert', 'ch', 1000)

I have created a primary key for the ID column. Once we assign primary key to any column we cannot insert duplicate values for that column and it cannot be NULL.

Again I am inserting the data randomly in the table.

Let run a Select * query on the table.

Did you see the difference. The data is ordered by ID column once we define a primary key.

Below is the execution plan for the Select query with a clustered index. As you can see there was only one logical read happened in this case, against the 6 logical read in the previous case.

Finally we got our data very fast with the help of Index seek operation in case of clustered index. This was not the case if we did not have the clustered index. The sql carried out the whole table scan to look out for the records.

Non Clustered Index

Non clustered index is the way of index the non primary key columns. As we can set clustered index only on a single column, the retrieval of data for the non primary key column can be a time consuming operation.

To circumvent the data retrieval using the non primary key columns we use non clustered index. This is same as the index in the end of the book.

In case of non clustered index, a separate data structure is created for the column which is set a non clustered index, and the rows in that data structure are sorted by clustered index column. And that data store contains the pointer to the original data in our base table.

Take the example of out table. We created the clustered index for ID column but suppose we want to execute the below query.

Select * from Employee where First_Name = ‘Vikram’

To retrieve the data table scan will be done. To prevent the table scan we can create a clustered index for the First_Name column. The only draw back of clustered index is that it consumes more disk space as a separate data structure is created.

Lets try and understand this with the example of some illustration.

On the left hand side of the above figure is our main table, which had a clustered index on ID column. But SQL creates a new data structure for all the clustered indexes and the right side table denotes that structure.

If we create the table without any primary key or non clustered index and execute the below query.

Select * from Employee where First_Name = ‘Vikram’

We will get the same result as shown in figure 1. The query would perform a full table scan.

Now lets create the table with a non clustered index on the First_Name column using the below code. After that insert some data using the same query as I have used above.

CREATE TABLE Employee
(
	ID INT Primary key,
	FIRST_NAME varchar(50) NOT NULL,
	LAST_Name varchar(50) NOT NULL,
	Salary int NOT NULL,
	CONSTRAINT [PK_FIRST_NAME] UNIQUE NONCLUSTERED ([FIRST_NAME] ASC) WITH (FILLFACTOR = 90)
)

Finally execute the below query.

Select * from Employee where First_Name = ‘Vikram’

Lets check the execution plan for the query. As you can see two operations are performed for this query execution.

Execution plan for non clustered index

First operation performs a Index seek on the secondary data structure. And takes out the pointer of the data in the main table. As we have a primary key defined for this table, the secondary table uses the primary key as pointer.

If there is no primary key defined, the SQL server uses its own row locator to seek the data.

Summary and Conclusion

Clustered and non clustered index in SQL are used to make the data retrieval process fast. Below are some important points

Finally with this article, I have tried to make the concept of clustered and non clustered index a bit easier for any one who is learning SQL basics.

Top career enhancing courses you can't miss

My Learning Resource

Excel your system design interview