This is something I missed in the Sort transformation so I thought I’d make sure I nailed it.
At the bottom-left of the Sort transformation is a check box that reads “Remove rows with duplicate sort values”. Because it behaves in a way that could be slightly unexpected I thought I’d write a post.
Here’s a table and some data.
create table junk ( id int, myval char(1)); INSERT INTO junk VALUES (1,'A'),(1,'B'),(1,'C'),(2,'A'),(2,'A'),(3,'C'); SELECT id, myval FROM junk; id myval 1 A 1 B 1 C 2 A 2 A 3 C
Now, I’ll run some queries and set up the Sort to match them.
SELECT DISTINCT id, myval FROM junk; id myval 1 A 1 B 1 C 2 A 3 C
This matches this Sort transformation configuration.
However, the Sort transformation can do something a bit weird. If the configuration is the following:
You’ll wind up with this:
id myval 1 C 2 A 3 C
And that is probably not what you want. The Sort transformation will run a DISTINCT on an individual column and apply that across the entire result set.
Anyways, that’s really all I have to say on this. It’s just something I missed going through the study process and it’s exactly the kind of thing they might ask.