30 Mar

Data Warehousing: Dimensions

Apparently, explaining a dimension is hard.  Here is an example that I found online:

In a data warehouse, Dimensions provide structured labeling information to otherwise unordered numeric measures. The dimension is a data set composed of individual, non-overlapping data elements. The primary functions of dimensions are threefold: to provide filtering, grouping and labeling.


Microsoft did it a bit better

Dimension tables encapsulate the attributes associated with facts and separate these attributes into logically distinct groupings, such as time, geography, products, customers, and so forth.


Of course you have to know what a Fact table is first.

Now, I’ll try, and probably do it badly.

A dimension is an attribute of a transaction that you use to measure the transaction.  For instance, a customer is a dimensional attribute.  When you record a sales transaction you record the customer with the transaction.  However, recording the customers name, address, city and phone number with the transaction is problematic for several reasons but mostly because it’s inefficient.  It’s easier, and cleaner, to store that information in a separate table.

Other types of dimensional attributes include:

  • Time / Date
  • Warehouse
  • Customer
  • Sales Representative
  • SKU

There are several different types of dimensions.  These are:

  • Conformed Dimension: This is a dimension that applies to more than one Fact table.  For instance, a product might be sold at one location and this same location information might be used to analyze purchasing patterns at the same location.
  • Junk Dimension: Easier to show than explain.  You might run a program that includes 10% off with a coupon.  If the user uses the coupon you want to flag the transaction as Yes, a coupon was used, or, No, it was not.  Rather than cluttering up the fact table with this information you create a separate dimension table.
  • Degenerate dimension: This is a key, that would normally join to a dimension table, but does not.  For instance, the claim number for an insurance claim.  It’s not directly relevant to the claim transaction, which probably has its own key, but it is relevant to the claim.
  • Role-playing dimension: A dimension that is used for multiple roles in the same fact table.  This is most commonly a date field.  For instance, you might have a fact table with the “order date” and the “invoice date”.

Hopefully, that will more than cover anything that shows up on the 70-463 test.