What is Column store index?

Column store index is a new feature in MS SQL server, initially introduced with MS SQL server 2012 but had some limitations. Those limitations have been removed in MS SQL server 2016.

Column store index is designed to speed up analytical processing and data warehouse querying. As the name suggests, here data is stored column wise instead of in rows. Column store indexes, when used appropriately, can reduce disk I/O and use memory more efficiently which can have a direct impact on query performance. According to Microsoft, under certain conditions, column store index provides queries with speed improvements of 4X, 10X or even 100X. When you execute a query on the table it fetches only the required columns while on the row wise store (traditional Index), entire pages get loaded into memory.

You can now define column store indexes on your database tables. A column store index stores data in a column-wise (columnar) format, unlike the traditional B-tree structures used for clustered and non-clustered row store indexes, which store data row-wise (in rows). A column store index organizes the data in individual columns that are joined together to form the index. This structure can offer significant performance gains for queries that summarize large quantities of data, typically used for business intelligence (BI) and data warehousing.

Data stored as rows (Clustered Index)

Data stored as columns (Column Store Index)

For better understanding, let me explain some of the terminologies used here

Segment – In the Column Store Index, each column is its own segment. A segment can contain values from one column only, which allows each column’s data to be accessed independently. However, a column can span multiple segments, and each segment can be made up of multiple data pages. Data is transferred from the disk to memory by segment, not by page. A segment is highly compressed and one that can contain up to one million rows.

Row Group – A column’s data won’t always fit into single segment, given the one million rows limitation. In such cases, where data exceeds one million rows, multiple segments are created for each column and grouped into multiple row groups, one for each set of segment. When a column store index is broken into multiple row groups, each row group contains a set of complete rows.

Now we know what column store index is and how data gets stored physically. We shall move further and examine how Column store index reduces the I/O.

For the sake of an example I have “Table_ClusteredIndex” with clustered Index on C1. As we know the data will be stored row wise in this scenario and will get split across pages, as shown below.

Now suppose we run the following query against the Table

SELECT C1, C2 from Table_ClusteredIndex

SQL Server engine processes the query; it retrieves all three data pages into memory, fetching all the columns in the table, even though most of the columns are not required. In other words, the system wastes valuable I/O and memory resources to retrieve unnecessary data.

I am now going to create the Non Clustered Column stored index on the table and let’s look at what happens. As an example I am including all the columns in the index, to be consistent with our earlier example – ClusteredIndex, though in reality we would probably include only some of the columns. All the columns we include, are stored as columns within the index.

Given below Illustrates what such an index might look like. As you can see, the data is no longer stored by row.

Data compression

Another factor in the column stored index, which plays a major role in improving the performance, is Data compression. In a column store index data is grouped by columns, rather than by rows, and hence data can be compressed more efficiently than with row store indexes. Data read from a single column is more homogenous than data read from rows, and the more similar the data, the easier it is to compress. In addition, a low number of distinct values also helps improve the compression rate.

In fact, Column Store Index and Data compression are based on xVelocity engine, an advanced storage and compression technology.

What is xVelocity engine?

We shall be talking about that in another post soon.

The following two tabs change content below.
Upendra Patel

Upendra Patel

Virtualizing, analytics and Business Intelligence gives me the rush! Professionally, a SQL Server and Microsoft BI developer, looking to explore more on data and the latest data platforms. Apart from work, I am a foodie who loves his music and movies.
Upendra Patel

Latest posts by Upendra Patel (see all)