04 Apr

Data Warehousing: Columnstore Index

I’ve put this off for some reason.  And by putting it off, I mean for several days of studying.  It’s not like I don’t know what they are (“columnstore make query go fast”), but honestly, it also feels like one of those things that could get big, or that I should “prove” it, or something, I dunno.  But it’s gonna be on 70-463, I’m almost certain, so here goes.  And besides, it was this or SSIS security.

Because I like my explanations simple, here goes.  A columnstore has the following properties:

  • It is non-clustered.
  • There can only be one on a table.
  • They cannot be updated.
  • They are very fast for data warehouse queries.
  • Columnstore indexes are compressed.
  • They are stored as columns and not rows.

There must be 9 million articles on that last point and  I thought this guy did a good job so I’ll let him explain how row and column storage works.
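
Since it will come up again below, here’s a minimal creation sketch.  The fact table and columns (dbo.FactSales, OrderDateKey, CustomerKey, SalesAmount) are made up for illustration; the statement itself is just the SQL Server 2012 nonclustered columnstore syntax.

CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_FactSales
ON dbo.FactSales (OrderDateKey, CustomerKey, SalesAmount);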


Columnstore indexes have a ton of limitations.  This post will list the ones I think are most important relative to BI, so the fact that they can’t be used in replication won’t be mentioned, again.

Limitations (the section of cannots)

  • Cannot be unique.
  • Cannot be created on a view or indexed view.
  • Cannot act as a primary key or a foreign key.
  • Cannot be changed using the ALTER INDEX statement. Drop and re-create the columnstore index instead.
  • Cannot include the ASC or DESC keywords for sorting the index.
  • Cannot contain a column with a FILESTREAM attribute.
  • Cannot be updated.

And, there’s still more.

  • Cannot be used with change data capture.
  • Cannot be used with change tracking.
  • Cannot be used with replication (wai….).
  • Cannot be used with a bunch of data types (the main, common ones are text, ntext, varchar(max), and nvarchar(max)); however, the ones you probably want to use are on the OK list.

At least one of those (probably more, but I’m only going to cover one) deserves special treatment: updating.

To update a columnstore index, the official documentation recommends 3 options:

  • Drop the index, INSERT, UPDATE and DELETE to your heart’s content, then recreate the columnstore index (sketched below).
  • Partition the table and switch partitions.  If it’s a bulk insert, build a staging table, do your thing, build the columnstore index, and insert the table into an empty partition.  Otherwise, remove a partition, use it as a staging table, drop the index, do the deed, recreate the columnstore index, and add it back to the main table.
  • Leave the main table as it is.  Create a second table, without a columnstore index, use it to update, delete and insert data, then use a UNION ALL to combine the two tables.  As long as the second table is relatively small it should have a minor impact on query performance.  Note: This idea was worth this article alone.  I would not have considered it.

Source: http://msdn.microsoft.com/en-us/library/gg492088.aspx#Update
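
To make the first option concrete, here’s a rough sketch of the drop, load, recreate cycle.  The index, fact table and staging table names (NCCI_FactSales, dbo.FactSales, stg.FactSalesLoad) are hypothetical.

-- Drop the columnstore index so the table becomes writable again.
DROP INDEX NCCI_FactSales ON dbo.FactSales;

-- Load to your heart's content (stg.FactSalesLoad is a hypothetical staging table).
INSERT INTO dbo.FactSales (OrderDateKey, CustomerKey, SalesAmount)
SELECT OrderDateKey, CustomerKey, SalesAmount
FROM stg.FactSalesLoad;

-- Recreate the columnstore index.
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_FactSales
ON dbo.FactSales (OrderDateKey, CustomerKey, SalesAmount);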

Lastly, what kind of column is best for a columnstore index?  The kind with a lot of duplicated values like a zip code, state, sales region, etc.  Seriously, that’s the best kind.

I have to say that I’m glad that I wrote this.  There is more to columnstore indexes than “they go fast”.

Here is the MSDN for columnstore indexes.  It goes into everything I did and more.


03 Apr

Data Warehousing: Stewardship Vs. Governance

This is another one of those pairs of terms in DW / BI that gets really muddled.  Going through a few books, and a few sites, I didn’t find a single concise explanation.  In fact, the two are often intermingled.  So, here’s my take, in simple words.

Data Governance:  These are the guys who write the laws.  They define the rules and regulations that govern your data.  They are Congress.

Data Stewardship: The enforcement, and maintenance, of your data rules and policies.  These are your cops or, to confuse the issue (because this is BI and the language has to make the easy stuff confusing), data governors.

Now, this is where it gets confusing.  Data stewards often are also part of data governance.  In other words, it’s like a cop being elected to Congress but he keeps his day job as a cop.  They do both.

Still, and I think for a simple explanation, Congress and cops makes for a respectable explanation.  So, who are the robbers?  The chuckleheads entering data into your systems.


01 Apr

Data Quality Services: What It Does

For some reason I have had a hard time getting my head wrapped around what this product does and how it does it.  I just did some research on the topic and it amazes me how many articles out there say “the first step is creating a knowledge base, and here’s how you do it”.  Far too many articles jump into the “how” but completely neglect the “what it does” piece.

The Big 3.

What I’m going to do is break down the DQS sections and explain them in general terms.  I’m certain that I’ll see at least one generic type of question where DQS is one of a list of potential solutions to a problem and I want to get at least that much right about the product.

As such, this is a 10,000-foot view post.  I’m not going step-by-step through a process or into any real detail on the product.  All of those other websites have already done that.  I just want to nail down what each section does in a general sense, because I simply couldn’t find this information in a single place.

DQS consists of three things: Knowledge Base Management, Data Quality Projects, and Administration.  In more detail, they do the following things.

Knowledge Base Management

A knowledge base is a collection of metadata that you use for cleansing, matching or profiling.  It is, roughly, equivalent to a table schema, except that it’s more involved than that.   But, as far as I’m concerned, conceptually, that works.

Domain Management.

To create a knowledge base you must first create domains.  Domains are attributes in a very broad and very loose sense.  Specifically, there are 5 parts to the domain management section of DQS.

  • Domain properties: Attributes such as name, definition, format, etc.
  • Reference data: This is where you would add external data sources.  For instance, a list of valid zip codes could probably be obtained from a 3rd party data resource here.
  • Domain Rules: Conditions that you can apply to a domain.  “A valid zip code must have at least 5 digits” is a condition that you could apply to a domain.
  • Domain Values: If you have a data source that you are working with, you can view it here and confirm or clean it up.  So, if the potential data in your domain consists of 94520 and 94522, this is where you would adjust 94522 to 94523.
  • Term-Based Relations: You can create a list of values that map to another value.  Biz, Bus. and Bus could all be mapped to the word Business which could help to clean up your data.

So, a knowledge base consists, more or less, of rules used to restrict and clean up your data.

Data Quality Projects

Managing the results of a cleanse.

In this section you take a data source and apply the knowledge base you created to it.  A data project consists of the following:

  • Map: You map your target data to an existing knowledge base.  The data can be in an Excel file or a SQL Server database.
  • Cleanse: Where you process the previous mapping step.
  • Manage and view results: This is where you review the results of the cleansing process.  You can correct and make adjustments to the results.
  • Export: You can export your results to SQL Server, Excel or a CSV file.

Data quality projects are how you process your data through a knowledge base.


Administration

This part has two sections: Activity Monitoring and Configuration.

Activity Monitoring is a list of all activities performed in DQS.  This includes things like domain management, creating a knowledge base and cleansing data.  Right-clicking gives you additional options, including the ability to terminate a running process.  The Profiler tab at the bottom provides more information on each specific activity.

Activity Monitoring.

Configuration consists of 3 sections:

  • Reference Data: Manage access to third-party data providers.
  • General Settings: Some, uh, general settings for the program with help buttons that didn’t work when I clicked on them.  I’m guessing my default browser isn’t supposed to be Firefox, or, maybe they just don’t work.
  • Log Settings: Set logging options, specifically when to trigger logging.  You can drill it down to specific modules.

I hope this helps someone.  Personally, I really struggled with getting this clear in my head and writing this has definitely helped.

31 Mar

Data Warehousing: Surrogate Keys Vs. Natural Keys

Surrogate Keys

These are keys that have no natural meaning.  An identity property is an example of a surrogate key.

Natural Keys

Natural keys are keys that have a business meaning.  For instance, an SSN, an order number or even a warehouse number.

The rule of thumb for data warehousing, actually it’s pretty much law, is to always use surrogate keys. Some of the advantages of surrogate keys include:

  • They separate the operational environment from the data warehouse.
  • There are performance advantages.
  • Slowly Changing Dimensions: Surrogate keys allow for Type 2 SCDs.
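
For what it’s worth, a quick sketch with made-up names shows the pattern: the surrogate key is an IDENTITY column and the primary key, while the natural key from the source system rides along as an ordinary column.  That is also what makes Type 2 history possible, since the same CustomerID can appear on multiple rows.

CREATE TABLE dbo.DimCustomer (
    CustomerKey  int IDENTITY(1,1) NOT NULL PRIMARY KEY, -- surrogate key
    CustomerID   varchar(20) NOT NULL,                   -- natural key from the source system
    CustomerName varchar(100) NOT NULL,
    StartDate    date NOT NULL,                          -- for Type 2 history
    EndDate      date NULL
);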

I would be willing to argue for natural keys, I don’t think it’s as black and white as many think, but for testing purposes, I think it would be very unlikely that a correct answer would be “use natural keys”.  Of course this probably means I’ll get one.

30 Mar

Data Warehousing: Dimensions

Apparently, explaining a dimension is hard.  Here is an example that I found online:

In a data warehouse, Dimensions provide structured labeling information to otherwise unordered numeric measures. The dimension is a data set composed of individual, non-overlapping data elements. The primary functions of dimensions are threefold: to provide filtering, grouping and labeling.


Microsoft did it a bit better

Dimension tables encapsulate the attributes associated with facts and separate these attributes into logically distinct groupings, such as time, geography, products, customers, and so forth.


Of course you have to know what a Fact table is first.

Now, I’ll try, and probably do it badly.

A dimension is an attribute of a transaction that you use to measure the transaction.  For instance, a customer is a dimensional attribute.  When you record a sales transaction you record the customer with the transaction.  However, recording the customer’s name, address, city and phone number with the transaction is problematic for several reasons, but mostly because it’s inefficient.  It’s easier, and cleaner, to store that information in a separate table.
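
As a sketch, with made-up names, that separation looks something like this: the customer attributes live in a dimension table and the fact table just carries the key.

CREATE TABLE dbo.DimCustomer (
    CustomerKey  int IDENTITY(1,1) PRIMARY KEY,  -- surrogate key
    CustomerID   varchar(20),                    -- business key
    CustomerName varchar(100),
    City         varchar(50),
    Phone        varchar(20)
);

CREATE TABLE dbo.FactSales (
    CustomerKey  int NOT NULL REFERENCES dbo.DimCustomer (CustomerKey),
    OrderDateKey int NOT NULL,                   -- would point at a date dimension
    SalesAmount  decimal(18,2) NOT NULL
);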

Other types of dimensional attributes include:

  • Time / Date
  • Warehouse
  • Customer
  • Sales Representative
  • SKU

There are several different types of dimensions.  These are:

  • Conformed Dimension: This is a dimension that applies to more than one Fact table.  For instance, the same location dimension might be used to analyze sales at a location and also to analyze purchasing patterns at that location.
  • Junk Dimension: Easier to show than explain.  You might run a promotion that includes 10% off with a coupon.  If the customer uses the coupon you want to flag the transaction as Yes, a coupon was used, or No, it was not.  Rather than cluttering up the fact table with this information you create a separate dimension table for these flags.
  • Degenerate dimension: This is a key that would normally join to a dimension table, but does not.  For instance, the claim number for an insurance claim.  It doesn’t get its own dimension table, and the fact row probably has its own surrogate key, but the claim number is still relevant to the claim, so it stays in the fact table.
  • Role-playing dimension: A dimension that is used for multiple roles in the same fact table.  This is most commonly a date dimension.  For instance, you might have a fact table with an “order date” and an “invoice date” (see the sketch after this list).
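
For the role-playing case, a query might look like the following sketch (all names are hypothetical), with the same date dimension joined twice under two different roles.

SELECT f.SalesAmount,
       od.CalendarDate AS OrderDate,
       id.CalendarDate AS InvoiceDate
FROM dbo.FactSales AS f
JOIN dbo.DimDate   AS od ON od.DateKey = f.OrderDateKey
JOIN dbo.DimDate   AS id ON id.DateKey = f.InvoiceDateKey;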

Hopefully, that will more than cover anything that shows up on the 70-463 test.




30 Mar

Data Warehousing: Measures

Measures in a data warehouse are, to put it simply, because I’m all about simple these days, the numbers that you can add up.  For instance, sales is an additive measure.  But, of course, because we’re in data warehouse land, it’s not that easy.

There are three types of measures.

Additive Measures

A measure that can be aggregated across all of its dimensions.  For instance, sales is additive by date, by customer, by location and so on, and so forth.

Semi-Additive Measures

This is a measure that can be aggregated by some dimensions but not by all of them.  The most common example is inventory.  You can add, or subtract, inventory movements over a time period, so it’s additive in the sense that you can know your quantity sold for a period.  However, summing the inventory balance across that same time period would not yield your actual inventory on hand.
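
A quick sketch with made-up snapshot data shows the difference: summing across warehouses for a single date makes sense, while summing the same balances across dates does not.

DECLARE @InventorySnapshot TABLE (SnapshotDate date, Warehouse varchar(10), QtyOnHand int);
INSERT INTO @InventorySnapshot VALUES
('2013-01-31', 'WH1', 100), ('2013-01-31', 'WH2', 40),
('2013-02-28', 'WH1',  90), ('2013-02-28', 'WH2', 60);

-- Additive across the warehouse dimension: on-hand total for one date is meaningful.
SELECT SnapshotDate, SUM(QtyOnHand) AS TotalOnHand
FROM @InventorySnapshot
GROUP BY SnapshotDate;

-- Not additive across time: 290 is not anybody's inventory on hand.
SELECT SUM(QtyOnHand) AS MeaninglessTotal
FROM @InventorySnapshot;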

Non-Additive Measures

This is a measure that cannot be aggregated.  For instance, sales and cost data are additive in that they can be aggregated, separately, to produce a gross profit percentage.  However, that gross profit percentage cannot be aggregated further.  If you calculate gross profit by invoice, aggregating those percentages will not yield the gross profit percentage for the business.
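
Here’s a small made-up example of that last point: averaging the per-invoice gross profit percentages gives 30%, while the real gross profit percentage for the two invoices together is about 13.6%.

DECLARE @Invoices TABLE (InvoiceID int, Sales decimal(18,2), Cost decimal(18,2));
INSERT INTO @Invoices VALUES (1, 100.00, 50.00), (2, 1000.00, 900.00);

-- Aggregating the ratio itself: (50% + 10%) / 2 = 30%.  Wrong.
SELECT AVG((Sales - Cost) / Sales) AS AvgOfInvoicePercents
FROM @Invoices;

-- Aggregating the additive measures first, then calculating: 150 / 1100 = ~13.6%.
SELECT (SUM(Sales) - SUM(Cost)) / SUM(Sales) AS OverallGrossProfitPct
FROM @Invoices;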

29 Mar

Data Warehousing: Slowly Changing Dimensions

There is a “Slowly Changing Dimension” transform in SSIS that I will write about at some point but this article is about the concept of slowly changing dimensions from the BI / Data warehouse angle.

A slowly changing dimension is a dimensional attribute that can change over time.  The first time I ran into a SCD was with our customer data (we’d built a big pile of sales data).  I ran a GROUP BY on the customer number and customer name and the same customer showed up twice.  The customer had changed  its name and the GROUP BY (technically, this was a Cyberquery, so it was actually a SUM) was returning two results for the same customer.  I made a relatively quick decision, the last record in, wins, and unbeknownst to us I had implemented a Type 1 SCD.

Examples of SCD’s include

  • A customer moves to a new location.
  • A sales representative changes territories.
  • Supplier information changes.
  • A product description is updated.

Slowly changing dimensions are maintained in a number of ways.  Each approach is known as a type, and for this article there will be 5 (0 to 4).  Let’s look at each one.

Type 0

This is the Kim Kardashian of slowly changing dimensions in that it just sits there, looks pretty, and doesn’t do a thing.  In other words, you don’t change or account for any changes in the dimension.

Type 1

This is the current trend in Hollywood, or so the purists whine: you don’t have a new idea, so you reboot the old one.  In other words, when the data changes, you overwrite the old data with the new data.  For instance, if a customer name changes then you change their name in the dimension to the new name.

Type 2

This approach is somewhat akin to a movie sequel.  You have the first movie, then a second and then a third.  In a data warehouse sense, it means that you create a new record every time a dimension changes.  Typically you track that history with start date and end date columns so you know when a dimension held a specific value, although Type 2 SCDs can also be versioned with a version number instead.

A type 2 SCD example:

SurrogateKey	CustomerID	CustomerName	StartDate	EndDate
1	        150004	        Way Out BI	2004-01-01	2004-12-31
2	        150004	        BI Done Right	2005-01-01	NULL
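
Maintaining that by hand, using the hypothetical table and columns from the example above, would look roughly like this: expire the current row, then insert the new version.

DECLARE @CustomerID int = 150004,
        @NewName    varchar(100) = 'BI Done Right',
        @ChangeDate date = '2005-01-01';

-- Close out the current row.
UPDATE dbo.DimCustomer
SET EndDate = DATEADD(DAY, -1, @ChangeDate)
WHERE CustomerID = @CustomerID
  AND EndDate IS NULL;

-- Add the new version as the current row.
INSERT INTO dbo.DimCustomer (CustomerID, CustomerName, StartDate, EndDate)
VALUES (@CustomerID, @NewName, @ChangeDate, NULL);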

Type 3

This is like a time travel movie where our hero meets future and past versions of himself.  The sequel and the prior movie all take place in a single movie.  That’s a lousy example!

A type 3 SCD uses columns in the same table to track changes.  In my earlier example, instead of two rows you have a single row with the current value and at least one prior value.  It would look something like this:

SurrogateKey	CustomerID	CurrentName	EffectiveDate	PriorName
1	        150004	        BI Done Right	2005-01-01	Way Out BI

This is not an approach that I would be happy to see.

Type 4

This is your museum.  With this approach you keep the current record in the dimension table and any earlier records are kept in a history table with a date field to track changes.
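
A rough sketch of the museum, with hypothetical names: the dimension keeps only the current row, and anything older gets moved to a history table with a date to track the changes.

CREATE TABLE dbo.DimSupplier (
    SupplierKey  int IDENTITY(1,1) PRIMARY KEY,
    SupplierID   int,
    SupplierName varchar(100)       -- always the current value
);

CREATE TABLE dbo.DimSupplierHistory (
    SupplierKey    int,
    SupplierID     int,
    SupplierName   varchar(100),
    SupersededDate date             -- when this version stopped being current
);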

There are, apparently, still more approaches to SCD’s but for the purposes of the 70-463 test I can’t imagine them popping up.  If they do, well, here’s to beating the odds.

This article was very helpful.


08 Mar

Grouping By An Unknown Range

Let’s imagine that you wanted to break out your order data in 10% increments from the smallest sale to the largest sale.  And suppose that your data was actually functional for this sort of thing and you didn’t have widgets that sold for $0.09 right next to $90,000 Lamborghinis.  And suppose this kind of analysis was actually useful for something besides a SQL GROUP BY example.  OK, just suppose.  Anyway, here is a process that you might go through and a query that would satisfy the problem.

Step 1 – Generate a range for your data.

USE adventureworks2012;
DECLARE @minval decimal(18,2);
DECLARE @maxval decimal(18,2);

SELECT @maxval = MAX(totaldue), @minval = MIN(totaldue)
FROM sales.salesorderheader;
SELECT @maxval AS 'Max', @minval AS 'Min';

Max	        Min
187487.83	1.52

Right off the bat we need to contact Houston, because this won’t work (well, it’ll work when we imagine it working later): this is telling me that we have no negative amounts in our order data. In other words, that we’ve never issued a single credit. In other words, that our data is probably useless.

However, this is AdventureWorks, dammit, we don’t issue credits, and since it’s true in the data, I’ll return you to your regular programming.

Step 2 – Create a clause that segments our data range into 10% increments.

Here’s how you would do this. You create a CASE expression based on the maximum value (stored in @range; since the minimum is effectively 0 we can start there) and use it to segment the GROUP BY, as follows.

USE adventureworks2012;
DECLARE @range decimal(18,2);

SELECT @range = MAX(totaldue)
FROM sales.salesorderheader;

WITH cte AS
(
SELECT CASE
	WHEN totaldue / @range <= .10 THEN '.0000 <= .10'
	WHEN totaldue / @range <= .20 THEN '.1001 <= .20'
	WHEN totaldue / @range <= .30 THEN '.2001 <= .30'
	WHEN totaldue / @range <= .40 THEN '.3001 <= .40'
	WHEN totaldue / @range <= .50 THEN '.4001 <= .50'
	WHEN totaldue / @range <= .60 THEN '.5001 <= .60'
	WHEN totaldue / @range <= .70 THEN '.6001 <= .70'
	WHEN totaldue / @range <= .80 THEN '.7001 <= .80'
	WHEN totaldue / @range <= .90 THEN '.8001 <= .90'
	ELSE '.9001 <= 1.0' END AS Range,
CAST(totaldue AS DECIMAL(18,2)) AS totaldue
FROM sales.salesorderheader
)
SELECT Range, 
CAST(SUM(totaldue) AS DECIMAL(18,2)) AS [Total Due], 
COUNT(totaldue) AS [Transaction Count], 
CAST(AVG(totaldue) AS DECIMAL(18,2)) AS [Average Sale]
FROM cte
GROUP BY Range
ORDER BY Range;

Range	        Total Due	Transaction Count	Average Sale
.0000 <= .10	41521958.70	29858	                1390.65
.1001 <= .20	17921756.80	615	                29141.07
.2001 <= .30	22384650.62	490	                45682.96
.3001 <= .40	15449160.66	240	                64371.50
.4001 <= .50	10416627.45	125	                83333.02
.5001 <= .60	8452682.22	82	                103081.49
.6001 <= .70	4923353.28	41	                120081.79
.7001 <= .80	1116949.70	8	                139618.71
.8001 <= .90	489622.37	3	                163207.46
.9001 <= 1.0	540019.13	3	                180006.38

Does this tell us a lot? Actually, it does tell us a few things. First, most of our transactions are relatively small, with a few large outliers in the data. If we really wanted to analyze this in some meaningful fashion we’d have to further segment the range. But this post really isn’t about analysis, it’s more about CASE, so I’m going to add an additional designation to the data so the ranges are clearer.

USE adventureworks2012;
DECLARE @range decimal(18,2);

SELECT @range = MAX(totaldue)
FROM sales.salesorderheader;

WITH cte AS
(
SELECT CASE
    WHEN totaldue / @range <= .10 THEN 0*@range
    WHEN totaldue / @range <= .20 THEN .1001*@range
    WHEN totaldue / @range <= .30 THEN .2001*@range
    WHEN totaldue / @range <= .40 THEN .3001*@range
    WHEN totaldue / @range <= .50 THEN .4001*@range
    WHEN totaldue / @range <= .60 THEN .5001*@range
    WHEN totaldue / @range <= .70 THEN .6001*@range
    WHEN totaldue / @range <= .80 THEN .7001*@range
    WHEN totaldue / @range <= .90 THEN .8001*@range
    ELSE CAST(.9001*@range AS INT) 
END AS beginRange,
totaldue
FROM sales.salesorderheader
),
cte1 AS
(
SELECT beginRange, 
CAST(SUM(totaldue) AS DECIMAL(18,2)) AS [Total Due], 
COUNT(totaldue) AS [Transaction Count], 
CAST(AVG(totaldue) AS DECIMAL(18,2)) AS [Average Sale]
FROM cte
GROUP BY beginRange
)
SELECT CAST(beginRange AS INT) AS beginRange,
CAST(LEAD(beginRange) OVER (ORDER BY beginRange) - 1 AS INT) AS endRange,
[Total Due],
[Transaction Count],
[Average Sale]
FROM cte1
ORDER BY beginRange;

beginRange	endRange	Total Due	Transaction Count	Average Sale
0	        18766	        41521963.87	29858	                1390.65
18767	        37515	        17921756.81	615	                29141.07
37516	        56264	        22384650.61	490	                45682.96
56265	        75012	        15449160.67	240	                64371.50
75013	        93761	        10416627.45	125	                83333.02
93762	        112510	        8452682.28	82	                103081.49
112511	        131259	        4923353.25	41	                120081.79
131260	        150008	        1116949.69	8	                139618.71
150009	        168756	        489622.37	3	                163207.46
168757	        NULL	        540019.12	3	                180006.37

There is a lot going on here so lets go through it.

  1. The first thing I changed was the MIN and MAX variables to a single @range variable, because since there were no negative values in the data I could just start at 0 and work my way up.  Hence, all I needed was a single variable populated by the MAX aggregate function.
  2. There are two CTEs in the query.  The first uses a CASE expression with the @range value to calculate the beginning of each data range.
  3. The second CTE exists because I didn’t want to write a second CASE expression for the ending value in each data range.  Instead I use it to aggregate the data from the first CTE into a result set that allows me to use the LEAD function in a final query that unites all of this.
  4. I converted the beginRange and endRange values to INT because it looks better.  In real life I’d probably do the same with the “Total Due” and “Average Sale” columns.
  5. As mentioned earlier, I used the LEAD function to get the endRange value instead of a second CASE expression.

Once again, I don’t think this specific query is all that useful.  It’s too broad across the entire data range when it would really need to focus on the lower totaldue transactions.  However, this kind of analysis can be useful.  Very large, or very small, orders can distort data analysis in unexpected ways.  For instance, a sales rep can have his sales margin destroyed because he writes a large sale at a low profit.  If your commission structure has a gross profit calculation, it is even possible that he could lose money on a sale that benefits the company.

Finally, this query is missing the most useful thing, profit margin.  I didn’t include it because the adventureworks2012 data requires joins to get at the cost information and this process was complex enough without it.  What I really wanted to show here is that it is possible to dynamically write a GROUP BY to group data based on the data it’s grouping.

Or should I say, discretize our data?

05 Mar

Discretization And Group By

From Wikipedia,

In mathematics, discretization concerns the process of transferring continuous models and equations into discrete counterparts.

In other words, when someone talks about how a television show did great in the all-important 25-44 demographic, they are practicing discretization.  Of course, in the BI world, the harder and more complex we make things sound, the better.

Or, to put it another way, discretization = GROUP BY.  Yeah, I know, it’s used in SSAS, and it’s really much more involved than this, but, for the purposes of this article, and a few that follow it, it’s a GROUP BY.

So let’s discretize some data (tell me that doesn’t sound cooler than putting data into categories) by date range.

USE adventureworks2012;
SELECT YEAR(orderdate) AS orderDate, 
CAST(SUM(totaldue) AS DECIMAL(18,2)) AS totalDue, 
COUNT(*) AS recordCount,
CAST(AVG(totaldue) AS DECIMAL(18,2)) AS averageSale
FROM sales.salesorderheader
GROUP BY YEAR(orderdate)
ORDER BY orderDate

orderDate	totalDue	recordCount	averageSale
2005	        12693250.63	1379	        9204.68
2006	        34463848.44	3692	        9334.74
2007	        47171489.55	12443	        3791.01
2008	        28888197.51	13951	        2070.69

Much to my surprise, this discretization was useful, because as the number of orders increased the average sales volume decreased, substantially. If you didn’t know that the numbers were completely made up you could reasonably conclude that they sold themselves out of business.

Enough about discretization.  I just wanted to use it.  I’ll go back to speaking plain GROUP BY in my next post.

02 Mar

Training Kit (Exam 70-463): Implementing a Data Warehouse with Microsoft® SQL Server® 2012

Note: This isn’t really a review. It’s more of an example, one of many, of what I’m finding in the book.  Also, I’m mellowing on the book’s content.  It’s been useful, unlike the exercises.

I’ve been studying for 70-463 and frankly it’s been going slower than the earlier two, much slower. In fact, it’s been a real struggle to study for, so much so that I’m finding excuses to not study. What’s been especially frustrating is that I thought I knew something about data warehousing and SSIS but I struggle to get even basic things done with the exercises in this book. Well, it turns out, after a lot of digging, that maybe it isn’t all me. Here’s one example from the book.

Chapter 13 is titled “Troubleshooting and Performance Tuning”, which is all good, as it fits right in with the first exercise and with what will actually happen: it will break and you won’t know why.

Exercise 1 consists of the following steps.

  1. Install missing database objects, assuming you didn’t install them earlier, by executing the chapter 5 code scripts to build them for the TK463 DW database.
  2. Open the Chapter 13 project in the starter kit.
  3. Execute the package to retrieve an error.
  4. Look in the data flow to find the error.
  5. Look on the progress tab. It also lists the specific error.
  6. Then it takes you through a couple more steps to fix the error.

Here is what actually happens.

  1. The package crashes. Good!
  2. The error that comes up is not the error in the lesson 13 instructions which is supposed to be about truncating tables and a derived column transformation. Instead, you get this.

Error 5 DimCustomerNew.dtsx Validation error. Dim Customer: Dim Customer: Opening a rowset for “[stg].[SalesTerritory]” failed. Check that the object exists in the database.

This is a hard crash and there is nothing you can do about it. You can’t even get to the progress tab because the package won’t validate. At this point I attempted the following:

  1. Tried both the starter and the completed package. Same error.
  2. Ran the chapter 5 setup script, several times, because, just maybe it would work the 4th time after not working the first three.
  3. Searched the PDF for a script to create [stg].[SalesTerritory]. There was no reference to it in a CREATE TABLE script.
  4. Executed every SQL script from chapter 5 to chapter 13. Chapter 13 had the script to build the stg.Salesterritory table.

You are probably wondering why I tried everything else first. Well, the instructions don’t mention the chapter 13 code folder anywhere; they mention chapter 5. Second, this kind of thing has happened to me in chapters 5, 7, and 10, and chapter 4 was a fight but I got it to work. In fact, it’s happened so often that more than once I’ve thrown the book in frustration and walked away from a chapter. My assumption was that I had skipped a step in an earlier exercise or chapter out of frustration.

Nope, the starter code and the solution code, just like in a bunch of other chapters, are defective. So I run the Chapter 13 scripts and guess what happens.

The package runs perfectly. Everything turns green without error.

Normally that would be great news, except for one thing, it’s supposed to generate an error.

I gave up on the exercise at that point.

When I look at the earlier books, and some of the frustrations I had with them, I have to say that they pale in comparison to this. I would take the 70-461 book, rearrange every chapter in random order, instead of just the couple where they seemingly did exactly that, and it would still be light years ahead of this book.

Oh, and what could make this even better? Exercise 2, where everything runs without error, appears not to produce any results after you run the code. That’s probably a product of exercise 1 not working correctly in some fashion, even though it ran without error.

If it were only chapter 13 I could move on, but I find every exercise a struggle to complete. Even the ones that work need multiple passes because it’s easy to miss a step where the author decided to combine multiple steps in an illogical fashion. By comparison, Knight’s 24 Hour Trainer is incredibly easy to follow, has similar exercises in many cases, is clearly written, and almost everything works on the first pass.

I have no idea if the content in the book is enough to pass the test, because I’m struggling to get past the exercises. If it is, I’ll revise my rating of the book, but the exercises are a -5.

Final note: This book is running 3-1/2 stars from 6 reviews on Amazon. Some of the reviewers gave it 5 stars. If this doesn’t make you distrust reviewers on Amazon I don’t know what would. I guess maybe they are just looking at the content and not doing the exercises but the exercises themselves are as sloppily put together as anything I’ve seen in a book.

You can get the book at the link below, and you likely will if you are studying for this test, because it’s the only one out there. Just know going in, it will be unpleasant if you decide to do the exercises.

Training Kit (Exam 70-463): Implementing a Data Warehouse with Microsoft® SQL Server® 2012