23 Apr

SSIS Toolbox: Sort – Remove rows with duplicate sort values

This is something I missed in the Sort transformation so I thought I’d make sure I nailed it.

At the bottom-left of the Sort transformation is a check box that reads “Remove rows with duplicate sort values”.  Because it behaves in a way that could be slightly unexpected I thought I’d write a post.

Here’s a table and some data.

create table junk
(
id int,
myval char(1));
INSERT INTO junk VALUES
(1,'A'),(1,'B'),(1,'C'),(2,'A'),(2,'A'),(3,'C');

SELECT id, myval
FROM junk;
id	myval
1	A
1	B
1	C
2	A
2	A
3	C

Now, I’ll run some queries and set up the Sort to match them.

SELECT DISTINCT id, myval
FROM junk;

id	myval
1	A
1	B
1	C
2	A
3	C

This matches this Sort transformation configuration.

sort

 

However, the Sort transformation can do something a bit weird.  If the configuration is the following:

sort1

 

You’ll wind up with this:

id	myval
1	C
2	A
3	C

And that is probably not what you want.  The Sort transformation will run a DISTINCT on an individual column and apply that across the entire result set.

Anyways, that’s really all I have to say on this. It’s just something I missed going through the study process and it’s exactly the kind of thing they might ask.

Leave a Reply

Your email address will not be published. Required fields are marked *