05 Mar

Discretization And Group By

From Wikipedia,

In mathematics, discretization concerns the process of transferring continuous models and equations into discrete counterparts.

In other words, when someone talks about how a television show did great in the all-important, 25-44 market, they are practicing discretization.  Of course in the BI world, the harder and more complex we make things sound, the better.

Or, to put it another way, discretization = GROUP BY.  Yeah, I know, it’s used in SSAS, and it’s really much more involved than this, but, for the purposes of this article, and a few that follow it, it’s a GROUP BY.

So lets discretize some data (tell me that doesn’t sound cooler than putting data into categories) by date range.

>USE adventureworks2012
SELECT YEAR(orderdate) AS orderDate, 
CAST(SUM(totaldue) AS DECIMAL(18,2)) AS totalDue, 
COUNT(*) AS recordCount,
CAST(AVG(totaldue) AS DECIMAL(18,2)) AS averageSale
FROM sales.salesorderheader
GROUP BY YEAR(orderdate)
ORDER BY orderDate

orderDate	totalDue	recordCount	averageSale
2005	        12693250.63	1379	        9204.68
2006	        34463848.44	3692	        9334.74
2007	        47171489.55	12443	        3791.01
2008	        28888197.51	13951	        2070.69

Much to my surprise, this discretization was useful, because as the number of orders increased the average sales volume decreased, substantially. If you didn’t know that the numbers were completely made up you could reasonably conclude that they sold themselves out of business.

Enough about discretization.  I just wanted to use it.  I’ll go back to speaking plain GROUP BY in my next post.

Leave a Reply

Your email address will not be published. Required fields are marked *