01 Mar

Creating An Aggregate View With A Clustered Index

An aggregate view can be an excellent alternative to summary tables, especially with a clustered index. However, there are a few quirks that show up in SQL Server so I thought I’d create a simple one and plow through a litany of errors.

SELECT column_name, data_type, character_maximum_length
FROM information_schema.columns
WHERE table_name='militaryspending';

column_name	data_type	character_maximum_length
country	        varchar	        50
spending	float	        NULL
gdpperc	        float	        NULL
region	        varchar	        30

Our first try.

CREATE VIEW vw_spending AS
SELECT region, SUM(spending)
FROM militaryspending
GROUP BY region;

Msg 4511, Level 16, State 1, Procedure vw_spending, Line 2
Create View or Function failed because no column name was specified for column 2.

Alright, we need an alias for the aggregate.

IF object_id('vw_spending','V') IS NOT NULL
	DROP VIEW vw_spending;
GO
CREATE VIEW vw_spending AS
SELECT region, SUM(spending) as totSpending
FROM militaryspending
GROUP BY region;
GO
SELECT *
FROM vw_spending

region			totSpending
Africa			120608
Asia			428477.7
Australia		28559.1
Europe			312412.2
North America	        743120.4
South America	        65976.9

OK, that part worked, but we also need to add a clustered index if we want all the benefits of an aggregate view.

CREATE CLUSTERED INDEX idx_vwspending ON vw_spending (region);

Msg 1939, Level 16, State 1, Line 9
Cannot create index on view 'vw_spending' because the view is not schema bound.

Oops, OK, time for some schemabinding and recreating the view.

IF object_id('vw_spending','V') IS NOT NULL
	DROP VIEW vw_spending;
GO
CREATE VIEW vw_spending 
WITH SCHEMABINDING AS
SELECT region, SUM(spending) as totSpending
FROM militaryspending
GROUP BY region;
GO
CREATE CLUSTERED INDEX idx_vwspending ON vw_spending (region);

Msg 4512, Level 16, State 3, Procedure vw_spending, Line 3
Cannot schema bind view 'vw_spending' because name 'militaryspending' is invalid for schema 
binding. Names must be in two-part format and an object cannot reference itself.
Msg 1088, Level 16, State 12, Line 1
Cannot find the object "vw_spending" because it does not exist or you do not have 
permissions.

Two errors!!!! Come on!!!!!

THis is really easy to fix, despite it being two errors, we just need to add the schema to the table name.

IF object_id('vw_spending','V') IS NOT NULL
	DROP VIEW vw_spending;
GO
CREATE VIEW vw_spending 
WITH SCHEMABINDING AS
SELECT region, SUM(spending) as totSpending
FROM dbo.militaryspending
GROUP BY region;
GO
CREATE CLUSTERED INDEX idx_vwspending ON vw_spending (region);

Msg 1941, Level 16, State 1, Line 1
Cannot create nonunique clustered index on view 'vw_spending' because only unique 
clustered indexes are allowed. Consider creating unique clustered index instead.

Alright, alright, I’ll add make the index unique.

IF object_id('vw_spending','V') IS NOT NULL
	DROP VIEW vw_spending;
GO
CREATE VIEW vw_spending 
WITH SCHEMABINDING AS
SELECT region, SUM(spending) as totSpending
FROM dbo.militaryspending
GROUP BY region;
GO
CREATE UNIQUE CLUSTERED INDEX idx_vwspending ON vw_spending (region);

Msg 10138, Level 16, State 1, Line 1
Cannot create index on view 'WingTip.dbo.vw_spending' because its select list 
does not include a proper use of COUNT_BIG. Consider adding COUNT_BIG(*) to select list.

Seriously? We have to add a made up aggregate, to add an aggregate? Yep!

I did a quick Google on this but I couldn’t determine why it’s necessary. The best explanation I found, and it could certainly be true, is that the COUNT_BIG aggregate helps the optimizer keep track of the number of rows in the view. Why not a regular COUNT? I don’t know. The only different between COUNT, besides this example, is that one returns a bigint and the other a regular old int. The msdn for BIG_COUNT doesn’t say a thing about it.

Anyways, one more time, with feeling.

IF object_id('vw_spending','V') IS NOT NULL
	DROP VIEW vw_spending;
GO
CREATE VIEW vw_spending 
WITH SCHEMABINDING AS
SELECT region, SUM(spending) as totSpending, COUNT_BIG(*) AS countBig
FROM dbo.militaryspending
GROUP BY region;
GO
CREATE UNIQUE CLUSTERED INDEX idx_vwspending ON vw_spending (region);

Msg 8662, Level 16, State 0, Line 1
Cannot create the clustered index "idx_vwspending" on view "WingTip.dbo.vw_spending" because 
the view references an unknown value (SUM aggregate of nullable expression). Consider 
referencing only non-nullable values in SUM. ISNULL() may be useful for this.

This can be fixed in a couple of ways. First, you can set the spending column to not accept NULL values. I was lazy bringing it over so I allowed it to accept NULL. Or, you could do what the error says and add ISNULL to the query which is what I’m going to do here.

IF object_id('vw_spending','V') IS NOT NULL
	DROP VIEW vw_spending;
GO
CREATE VIEW vw_spending 
WITH SCHEMABINDING AS
SELECT region, SUM(ISNULL(spending,0)) as totSpending, COUNT_BIG(*) AS countBig
FROM dbo.militaryspending
GROUP BY region;
GO
CREATE UNIQUE CLUSTERED INDEX idx_vwspending ON vw_spending (region);
GO
SELECT *
FROM vw_spending;

region			totSpending	countBig
Africa			120608		39
Asia			428477.7	28
Australia		28559.1		3
Europe			312412.2	36
North America	        743120.4	9
South America	        65976.9		11

And there ya go. Finally, we have an aggregate view with a clustered index off of a casually designed table.

11 Feb

PROC MEANS In T-SQL

SAS has a basic procedure called PROC MEANS.  It essentially takes a column of data and performs some basic statistical analysis on it.  Specifically it looks at the following data,

  • N – number of occurrences
  • Mean
  • Maximum
  • Minimum
  • Range (Maximum – Minimum)
  • Standard Deviation.

You can read more about PROC MEANS at this page: http://support.sas.com/documentation/cdl/en/proc/61895/HTML/default/viewer.htm#a002473539.htm.

Because this site is for my entertainment I figured I’d use the cake data as found in the above example and create that result set. It’s actually fairly easy to do as the above examples map to T-SQL functions quite easily as follows,

  • COUNT(presentscore) as N
  • AVG(presentscore) as Mean
  • MAX(presentscore) as Maximum
  • MIN(presentscore) as Minimum
  • MAX(presentscore) – MIN(presentscore) as Range
  • ROUND(STDEV(presentscore),4) as ‘Std Dev’

So to reproduce the output generated by the PROC MEANS example on that page all I need to do is the following,

SELECT 'PresentScore' as Variable, 
COUNT(presentscore) as N, 
AVG(presentscore) as Mean,
MAX(presentscore) as Maximum,
MIN(presentscore) as Minimum,
MAX(presentscore) - MIN(presentscore) as Range,
ROUND(STDEV(presentscore),4) as 'Std Dev'
FROM cake
UNION ALL
SELECT 'TasteScore', 
COUNT(tastescore), 
AVG(tastescore),
MAX(tastescore),
MIN(tastescore),
MAX(tastescore) - MIN(tastescore),
ROUND(STDEV(tastescore),4)
FROM cake

Variable	N	Mean	Maximum	Minimum	Range	Std Dev
PresentScore	20	76	93	56	37	9.3768
TasteScore	20	81	94	72	22	6.6116

There isn’t much to note here.  I build two queries, alias them, add a UNION ALL and call it a day.  Perhaps the most interesting thing is that it lets you build aggregates against a character value.  Otherwise, it’s all very straight-forward.

However, there is an even easier way to do this.

WITH cte AS
(
SELECT 'PresentScore' as Variable, PresentScore as value
FROM cake
UNION ALL
SELECT 'TasteScore', TasteScore
FROM cake
)
SELECT Variable as Variable,
COUNT(*) as N,
AVG(value) as Mean,
MAX(value) as Maximum,
MIN(value) as Minimum,
MAX(value) - MIN(value) as Range,
ROUND(STDEV(value),4) as 'Std Dev'
FROM cte
GROUP BY Variable;

The change is that we build the CTE with a union then select against it.  It’s much easier to work with.  This can also be written as an inline-view.

SELECT Variable as Variable,
COUNT(*) as N,
AVG(value) as Mean,
MAX(value) as Maximum,
MIN(value) as Minimum,
MAX(value) - MIN(value) as Range,
ROUND(STDEV(value),4) as 'Std Dev'
FROM (
SELECT 'PresentScore' as Variable, PresentScore as value
FROM cake
UNION ALL
SELECT 'TasteScore', TasteScore
FROM cake ) as A
GROUP BY Variable

I’ll be revisiting this in a later post but for right now you have a working simulation of PROC MEANS.