An index in a database is pretty much similar to an index in a book which helps us to find a piece of data in an efficient manner.
For example, let say you want to find customers from Auckland city. Now if we have an index on city column then SQL Server can find all matching rows very efficiently and quickly otherwise SQL Server has to scan all rows individually to find all matching rows which will take extra time.
SQL Server uses a built-in component called Query Optimizer to figure out the fastest and efficient way to run all the queries that we write. It helps to avoid the disk I/Os which are slow and time-consuming. Having indexes on columns can help Query Optimizer to work efficiently.
Indexes affect the SELECT, UPDATE and DELETE statements and any query that contains WHERE and JOIN clause. To identify which columns need to be indexed there are some basic rules. Write your required SQL query and identify the columns used in WHERE or JOIN clause. Also, note the columns used in the SELECT statement. Then create indexes for selected columns.
How Indexes Work
If the column involved in WHERE or JOIN clause has no index then SQL Server finds the matching row using a technique called Table Scan. This means that it will inspect all rows one by one and returns the matching row.
On the other hand, indexes provide a more efficient way of finding data which is known as Balanced tree or B-tree. B-tree is a structure of data in which the data is divided into some levels to enable fast querying of data.
The structure of B-tree for the clustered index consists of three levels. The Root level, Intermediate level and Data level.
There will be only one root level and it will hold only one page. Intermediate levels can be more than one depending upon the amount of data. There will be only one data level and all pages at this level hold original data.
SQL Server uses this B-tree to find data quickly. The data in the B-tree structure is sorted in a particular fashion. Hence SQL Server doesn’t have to scan through all rows to find matching rows. The data can be found from specific pages using the seek technique instead of scanning all pages.
SQL Server database is made up of one or more files. These files contain tables. Each file is split into some extents. There are eight pages in each extent and each of these pages hold the data rows.
Extra information
Clustered Index
A clustered index defines the particular way in which data of the table is sorted. There can be only one clustered index per table. Clustered indexes enable very fast and efficient querying of data because data is returned as soon as it finds the data. The syntax to create a clustered index is given below.
CREATE CLUSTERED INDEX INDEX_NAME ON SCHEMA_NAME.TABLE_NAME(Column(s));
Non-Clustered Index
A non-clustered index doesn’t show how the data is sorted instead it holds the index keys in a special structure which is used to find the rows we are after. As soon as the rows are found the link contained in the non-clustered index is used to get the rows. We can have more than 1 (up to 999) indexes per table. It’s a default type of index. The syntax to create a non-clustered index is given below.
CREATE [NONCLUSTERED] INDEX INDEX_NAME ON SCHEMA_NAME.TABLE_NAME(Column(s));
The keyword NONCLUSTERED is optional here.
Index Seek and Scan
Indexes find data using scan and seek. A seek looks for data in B-tree using the parameters provided by the query predicates to filter the number of pages. Hence there is no need to search through all pages.
A scan starts looking for data from the beginning of the index and returns the all matching rows by scanning row by row. All rows are scanned to find matching rows.
In most of the cases, the seek performs well than scan but in some cases, scans are better than seeks.
Included Columns
This is a very interesting feature of indexes. The columns that we use in the index creation are called indexed keys. But other than that we can include other columns as well which we can retrieve while querying. These columns can be included in INCLUDE clause while creating the index. For example,
CREATE NONCLUSTERED INDEX IX_NC_IndexName ON SchemaName.TableName(Column(s)) INCLUDE (Column(s));
In INCLUDE clause we can add extra needed columns. Do not add too many columns in INCLUDE clause because SQL Server has to manage indexes after every insert, update and delete.