04 Apr

Data Warehousing: Columnstore Index

I’ve put this off for some reason.  And by putting it off, I mean for several days of studying.  It’s not like I don’t know what they are “columnstore make query go fast” but honestly, it also feels like one of those things that could get big, or that I should “prove” it, or something, I dunno.  But, it’s gonna be on 70-463, I’m almost certain, so here goes.  And, besides, it was this or SSIS security.

Because I like my explanations simple, here goes.  A columnstore has the following properties:

  • It is non-clustered.
  • There can only be one on a table.
  • They can not be updated.
  • They are very fast for data warehouse queries.
  • Columnstore indexes are compressed.
  • They are stored as columns and not rows.

There must be 9 million articles on that last point and  I thought this guy did a good job so I’ll let him explain how row and column storage works.

http://rusanu.com/2012/05/29/inside-the-sql-server-2012-columnstore-indexes/

Columnstore indexes have a ton of limitations.  This post will list the one’s I think are most important relative to BI so the fact that they can’t be used in replication won’t be mentioned, again.

Limitations (the section of cannots)

  • Cannot be unique.
  • Cannot be created on a view or indexed view.
  • Cannot act as a primary key or a foreign key.
  • Cannot be changed using the ALTER INDEX statement. Drop and re-create the columnstore index instead.
  • Cannot include the ASC or DESC keywords for sorting the index.
  • Cannot contain a column with a FILESTREAM attribute.
  • Cannot be updated.

And, there’s still more.

  • Cannot be used with change data capture.
  • Cannot be used with change tracking.
  • Cannot be used with replication (wai….).
  • Cannot be used with a bunch of data types (the main, common ones, are text, ntext, varchar(max), nvarchar(max)), however, the one’s you probably want to use are on the OK list.

At least one of those, probably more, but I’m only going to do one, deserves special treatment: Updating.

To update a columnstore index, the official documentation recommends 3 options:

  • Drop the index, INSERT, UPDATE and DELETE to your heart’s contentment, then recreate the columnstore index.
  • Partition the table and switch partitions.  If it’s a bulk insert, build a staging table, do your thing, build the columnstore index, and insert the table into an empty partition.  Otherwise, remove a partition, use it as a staging table, drop the index, do the deed, recreate the columnstore index, and add it back to the main table.
  • Leave the main table as it is.  Create a second table, without a columnstore index, use it to update, delete and insert data, then use a UNION ALL to combine the two tables.  As long as the second table is relatively small it should have a minor impact on query performance.  Note: This idea was worth this article alone.  I would not have considered it.

Source: http://msdn.microsoft.com/en-us/library/gg492088.aspx#Update

Lastly, what kind of column is best for a columnstore index?  The kind with a lot of duplicated values like a zip code, state, sales region, etc.  Seriously, that’s the best kind.

I have to say that I’m glad that I wrote this.  There is more to columnstore indexes than, “they go fast”.

Here is the MSDN for columnstore indexes.  It goes into everything I did and more.

http://msdn.microsoft.com/en-us/library/gg492088.aspx

Leave a Reply

Your email address will not be published. Required fields are marked *