23 Apr

SSIS Toolbox: Sort – Remove rows with duplicate sort values

This is something I missed in the Sort transformation so I thought I’d make sure I nailed it.

At the bottom-left of the Sort transformation is a check box that reads “Remove rows with duplicate sort values”.  Because it behaves in a way that could be slightly unexpected I thought I’d write a post.

Here’s a table and some data.

create table junk
(
id int,
myval char(1));
INSERT INTO junk VALUES
(1,'A'),(1,'B'),(1,'C'),(2,'A'),(2,'A'),(3,'C');

SELECT id, myval
FROM junk;
id	myval
1	A
1	B
1	C
2	A
2	A
3	C

Now, I’ll run some queries and set up the Sort to match them.

SELECT DISTINCT id, myval
FROM junk;

id	myval
1	A
1	B
1	C
2	A
3	C

This matches this Sort transformation configuration.

sort

 

However, the Sort transformation can do something a bit weird.  If the configuration is the following:

sort1

 

You’ll wind up with this:

id	myval
1	C
2	A
3	C

And that is probably not what you want.  The Sort transformation will run a DISTINCT on an individual column and apply that across the entire result set.

Anyways, that’s really all I have to say on this. It’s just something I missed going through the study process and it’s exactly the kind of thing they might ask.

14 Mar

SSIS Toolbox: Pivot, Aggregate And Sort.

Initially I had wanted to do just do a simple PIVOT transform but it turned into more than that so I added a Aggregate transformation to the process, actually, two.  Lets get to it.

A Pivot transform works similarly to a PIVOT statement in T-SQL.  In fact, this is the statement we’re going to duplicate using SSIS.

 

WITH cte AS
(
select YEAR(orderdate) AS yrDate, 
MONTH(orderdate) AS moDate,
totalDue
from sales.salesorderheader
)
SELECT *
FROM cte
PIVOT (SUM(totalDue) for yrDate IN ([2005],[2006],[2007],[2008])) p
ORDER BY moDate

moDate	2005	        2006	        2007	        2008
1	NULL	        1462448.8986	1968647.184	3359927.2196
2	NULL	        2749104.6546	3226056.1486	4662655.6183
3	NULL	        2350568.1264	2297692.9898	4722357.5175
4	NULL            1727689.5793	2660723.7481	4269365.0103
5	NULL	        3299799.233	3866365.1263	5813557.453
6	NULL	        1920506.6177	2852209.8283	6004155.7672
7	1074117.4188	3253418.7629	3998942.7051	56178.9223
8	2292182.8828	4663508.0154	5712201.2697	NULL
9	1836827.5791	3638980.3689	5702087.8686	NULL
10	1518540.2014	2488758.6715	3767722.1252	NULL
11	3218764.4696	3809633.4035	5250314.3052	NULL
12	2752818.0747	3099432.1035	5868526.2471	NULL

Really, there’s no reason not to just put this in the OLE DB source and go from there, but, since this is an SSIS article were going to do it there.

Step 1: Create a OLE DB source

  1. Create a new project and package in SSIS.  Name them whatever you like.
  2. Drag a Data Flow task to the Control Flow and double-click it to edit it.
  3. Drag a OLE DB source to the Data Flow and configure it as follows:
    1. OLE DB connection manager: AdventureWorks2012.  If necessary click new and create it.
    2. Data access mode:  SQL Command
    3. SQL Command Task: Use “SELECT YEAR(orderdate) AS yrDate, MONTH(orderdate) AS moDate,totalDue
      FROM sales.salesorderheader”
    4. Click the Columns tab to make sure everything mapped correctly.
    5. Click OK to close the OLE DB source.
Aggregate transform.

Aggregate transform.

Step 2: Create an Aggregate transformation

  1. Drag an Aggregate transformation to the Data Flow and connect the OLE DB source to it.
  2. Double-click the Aggregate transformation.  Check yrDate, moDate and totalDue.
  3. In the operation column, set tolalDue = SUM, yrDate = GROUP BY and moDate = GROUP BY.
  4. Click OK to close the transformation.

Step 3: Create a Pivot transformation

  1. Drag a Pivot transformation to the Data Flow and connect the Aggregate transformation to it.
  2. Pivot Transformation.

    Pivot Transformation.

    Double-click the Pivot transformation to open it and configure it as follows:

    1. Set Key: This is the verticall axis of the query.  Set it to moDate.
    2. Pivot Key: This is the horizontal axis of the query.  Set it to yrDate.
    3. Pivot Value: This is the data part of the query.  Set it to totalDue.
    4. Generate pivot output values from columns.  This is, for lack of a better term, the X-Axis label.  Use [2005],[2006],[2007],2008] for this section.
    5. Click “Generate Columns Now” which will alias the columns for you.
    6. Click OK to close the transformation.
  3. Show Advanced Editor.

    Show Advanced Editor.

    Clean up the header results.

    1. Right-click the Pivot transformation and select “Show Advanced Editor”.
    2. Click on the “Input and Output Properties” tab.
    3. Drill down though “Pivot Default Output” > “Output Columns”.
    4. Click on C_2005_totalDue and navigate to the Name field in the right section.  Change the name to 2005.
    5. Repeat step 4 for the 2006 through 2008 values.

The Pivot transformation is different from the T-SQL version in that it doesn’t aggregate the values. It just pivots the data so you need to do a second Aggregate transformation to make it all the way to the finish line.

Aggregate 2.

Aggregate 2.

Step 4: Create a final Aggregate transformation

  1. Drag an Aggregate transformation to the Data Flow and connect the Pivot transformation to it.
  2. Double-click the Aggregate transformation.  Check yrDate, moDate and totalDue.
  3. In the operation column, set tolalDue = SUM, yrDate = GROUP BY and moDate = GROUP BY.
  4. Click OK to close the transformation.

Step 4 is actually identical to step 2.

Step 5: Create a Sort transformation to guarantee sort order.

This likely isn’t necessary but it gets one more transform out of the picture so here it is.

Sort transformation.

Sort transformation.

  1. Drag a Sort transformation to the Data Flow and connect the second Aggregate to it.
  2. Double-click the Sort transformation to open it.
  3. Check the moDate box in the “Available Input Columns.
  4. You can accept the defaults in the next section but this is where you can set the Sort Type (ascending or descending) and the Sort Order a numerical value for the sequence.  Lower numbers are sorted first.
  5. Click OK to close the Sort transformation.

Finally, add a UNION ALL and connect the second Sort transformation to it.  Double-click the path, select the “Data Viewer” tab and check the “Enable data viewer” box. If we’ve both done everything right you should have a working Pivot and Aggregate Data Flow series.

Finished package.

Finished package.

This is one of the easier transformation but it had a, for me at least, non-intuitive piece in the way it transforms.  I struggled with  it a bit.  I also had something got out of whack earlier in the transform when I started working with the package.  I was overwriting an existing package and somehow, behind the scenes, something was hung up.  It would be wise to always start a new package, in a new package.