A SQL table explanation is not good enough for getting the desired
data very quickly or sorting the data in a specific order.
What we
actually need for doing this is some sort of cross reference facilities
where for certain columns of information within a table, it should be
possible to get whole records of information quickly. But if we consider
a huge amount of data in a table, we need some sort of cross reference
to get to the data very quickly. This is where an index within SQL
Server comes in.
So an index can be defined as:
- “An index is an on-disk structure associated with a table or
views that speed retrieval of rows from the table or view. An index
contains keys built from one or more columns in the table or view”.
These keys are stored in a structure (B-tree) that enables SQL Server to
find the row or rows associated with the key values quickly and
efficiently.”
Index Structures
For example, if you create an index on the primary key and then search for a row of data based on one of the primary key values, SQL Server first finds that value in the index, and then uses the index to quickly locate the entire row of data. Without the index, a table scan would have to be performed in order to locate the row, which can have a significant effect on performance.
An index is made up of a set of pages (index nodes) that are organized
in a B-tree structure. This structure is hierarchical in nature, with
the root node at the top of the hierarchy and the leaf nodes at the
bottom, as shown in Figure below.
Figure : B-tree structure of a SQL Server index
When a query is issued against an indexed column, the query engine
starts at the root node and navigates down through the intermediate
nodes. The query engine continues down through the index nodes until it reaches the leaf node.
Clustered Indexes
A clustered index
stores the actual data rows at
the leaf level of the index. Returning to the example above, that would
mean that the entire row of data associated with the primary key value
of 123 would be stored in that leaf node.
An important characteristic of the clustered index is that the indexed
values are sorted in either
ascending or descending order. As a result, there can be
only one clustered index on a table or view.
In addition, data in a table is
sorted only if a clustered index has
been defined on a table.
Indexes are first sorted on the first column in the index, then any
duplicates of the first column and sorted by the second column, etc.
Nonclustered Indexes
In non-clustered index the leaf nodes of a
nonclustered index contain only the values from the indexed columns and
row locators that point to the actual data rows, rather than contain the
data rows themselves. This means that the query engine must take an
additional step in order to locate the actual data.
A
row locator’s structure depends on whether it points to a
clustered
table or to a heap.
1. If referencing a clustered table, the row locator
points to the clustered index, using the value from the clustered index
to navigate to the correct data row.
2. If referencing a heap, the row
locator points to the actual data row.
(Note: A table that has a clustered index is referred to as a clustered table. A table that has no clustered index is referred to as a heap.)
--> Nonclustered indexes cannot be sorted like clustered indexes.
Max number of clustered index in table: 1
Max number of non-clustered index in table:
- In sql server 2005- 249
- in sql server 2008- 999
Composite index:
An index that contains more than one column. In both SQL Server 2005 and 2008, you can include up to
16 columns in an index.
Consider the following guidelines when planning your indexing strategy:
1. For tables that are heavily updated, use as few columns as possible in the index, and
don’t over-index the tables.
2. If a table contains a lot of data but data modifications are low, use as many indexes as necessary to improve query performance.
3. For clustered indexes, try to keep the length of the indexed columns as short as possible.
4. The uniqueness of values in a column affects index performance. In
general, the more duplicate values you have in a column, the more poorly
the index performs.
5. In multi-column indexes, list the most selective (nearest to unique)
first in the column list. For example, when indexing an employee table
for a query on social security number (SSN) and last name (lastName),
your index declaration should be:
CREATE NONCLUSTERED INDEX ix_Employee_SSN
ON dbo.Employee (SSN, lastName);
Syntax:
Create Clustered Index index_name on table_name (Columns_name)
CREATE NONCLUSTERED INDEX ix_Employee_SSN
ON dbo.Employee (SSN, lastName);
Disadvantages:
1. Both clustered indexes, and nonclustered indexes take up additional disk
space. The amount of space that they require will
depend on the columns
in the index, and the number of rows in the table.
2. Indexes will increase the amount of time that your INSERT, UPDATE and DELETE
statement take, as the data has to be updated in the table as well as in
each index.
3. Columns of the TEXT, NTEXT and IMAGE data types can not be indexed using
normal indexes. Columns of these data types can only be indexed with
Full Text indexes.
Disadvantage for Clustured Index:
If we update a record and change the value of an indexed column in a
clustered index, the
database might need to move the entire row into a
new position to keep the rows in sorted order. This behavior essentially
turns an update query into a DELETE followed by an INSERT, with an
obvious decrease in performance. A table's clustered index can often be
found on the primary key or a foreign key column, because key values
generally do not change once a record is inserted into the database.
Disadvantage for NonClustured Index :
The disadvantage of a non-clustered index is that it is slightly slower
than a clustered index and they can take up quite a bit of space on the
disk.
Another disadvantage is using too many indexes can actually
slow your database down. Thinking of a book again, imagine if every
"the", "and" or "at" was included in the index. That would stop the
index being useful - the index becomes as big as the text! On top of
that, each time a page or database row is updated or removed,
the
reference or index also has to be updated.
The DROP INDEX Command:
An index can be dropped using SQL
DROP command. Care should be taken when dropping an index because performance may be slowed or improved.
The basic syntax is as follows:
DROP INDEX index_name;