29 Mar

Data Warehousing: Slowly Changing Dimensions

There is a “Slowly Changing Dimension” transform in SSIS that I will write about at some point but this article is about the concept of slowly changing dimensions from the BI / Data warehouse angle.

A slowly changing dimension is a dimensional attribute that can change over time.  The first time I ran into a SCD was with our customer data (we’d built a big pile of sales data).  I ran a GROUP BY on the customer number and customer name and the same customer showed up twice.  The customer had changed  its name and the GROUP BY (technically, this was a Cyberquery, so it was actually a SUM) was returning two results for the same customer.  I made a relatively quick decision, the last record in, wins, and unbeknownst to us I had implemented a Type 1 SCD.

Examples of SCD’s include

  • A customer moves to a new location.
  • A sales representative changes territories.
  • Supplier information changes.
  • A product description is updated.

Slowly changing dimensions are maintained in a numbers ways.  The approach to each  is known as a type and for this article there will be 5 (0 to 4).  Lets look at each one.

Type 0

This is the Kim Kardashian of slowly changing dimensions in that it just sits there, looks pretty, and doesn’t do a thing.  In other words, you don’t change or account for any changes in the dimension.

Type 1

This is the current trend in Hollywood, or so the purists whine, you don’t have a new idea, so you reboot the old one.  In other words, when the data changes, you overwrite the old data with the new data.  For instance,if a customer name changes then you change their name in the dimension to the new name.

Type 2

This approach is, somewhat, akin to a movie sequel.  You have the first movie, then a second and then a third.  In a data warehouse sense, it means that you create a new record every time a dimension changes.  Typically you would track that information with a start date and end date column so you have a history of when a dimension was a specific value.  Type 2 SCD’s are usually maintained with a start and ending date but can also be versioned.

A type 2 SCD example:

Keycust_numbercust_namestart_dateend_date
1150004Way Out BI2004-01-012004-12-31
2150004BI Done Right2005-01-01

Type 3

This is a like a time travel movie where our hero meets future and past versions of himself.  The sequel and the prior movie all take place in a single movie.  That’s a lousy example!

A type 3 SCD uses columns in the same table to track changes.  In my earlier example, instead of two rows you have a single row with a current, and at least one past record.  It would look something like this:

keycust_numbercurrent_nameeffective_dateprior_name
1150004BI Done Right2005-01-01Way Out BI

This is not an approach that I would be happy to see.

Type 4

This is your museum.  With this approach you keep the current record in the dimension table and any earlier records are kept in a history table with a date field to track changes.

There are, apparently, still more approaches to SCD’s but for the purposes of the 70-463 test I can’t imagine them popping up.  If they do, well, here’s to beating the odds.

This article was very helpful.

http://en.wikipedia.org/wiki/Slowly_changing_dimension