01 Mar

Creating An Aggregate View With A Clustered Index

An aggregate view can be an excellent alternative to summary tables, especially with a clustered index. However, there are a few quirks that show up in SQL Server, so I thought I'd create a simple one and plow through the litany of errors along the way.

SELECT column_name, data_type, character_maximum_length
FROM information_schema.columns
WHERE table_name='militaryspending';

column_name	data_type	character_maximum_length
country	        varchar	        50
spending	float	        NULL
gdpperc	        float	        NULL
region	        varchar	        30

Our first try.

CREATE VIEW vw_spending AS
SELECT region, SUM(spending)
FROM militaryspending
GROUP BY region;

Msg 4511, Level 16, State 1, Procedure vw_spending, Line 2
Create View or Function failed because no column name was specified for column 2.

Alright, we need an alias for the aggregate.

IF object_id('vw_spending','V') IS NOT NULL
	DROP VIEW vw_spending;
GO
CREATE VIEW vw_spending AS
SELECT region, SUM(spending) as totSpending
FROM militaryspending
GROUP BY region;
GO
SELECT *
FROM vw_spending

region			totSpending
Africa			120608
Asia			428477.7
Australia		28559.1
Europe			312412.2
North America	        743120.4
South America	        65976.9

OK, that part worked, but we also need to add a clustered index if we want all the benefits of an aggregate view.

CREATE CLUSTERED INDEX idx_vwspending ON vw_spending (region);

Msg 1939, Level 16, State 1, Line 9
Cannot create index on view 'vw_spending' because the view is not schema bound.

Oops, OK, time for some schemabinding and recreating the view.

IF object_id('vw_spending','V') IS NOT NULL
	DROP VIEW vw_spending;
GO
CREATE VIEW vw_spending 
WITH SCHEMABINDING AS
SELECT region, SUM(spending) as totSpending
FROM militaryspending
GROUP BY region;
GO
CREATE CLUSTERED INDEX idx_vwspending ON vw_spending (region);

Msg 4512, Level 16, State 3, Procedure vw_spending, Line 3
Cannot schema bind view 'vw_spending' because name 'militaryspending' is invalid for schema 
binding. Names must be in two-part format and an object cannot reference itself.
Msg 1088, Level 16, State 12, Line 1
Cannot find the object "vw_spending" because it does not exist or you do not have 
permissions.

Two errors!!!! Come on!!!!!

This is really easy to fix despite there being two errors: we just need to add the schema to the table name.

IF object_id('vw_spending','V') IS NOT NULL
	DROP VIEW vw_spending;
GO
CREATE VIEW vw_spending 
WITH SCHEMABINDING AS
SELECT region, SUM(spending) as totSpending
FROM dbo.militaryspending
GROUP BY region;
GO
CREATE CLUSTERED INDEX idx_vwspending ON vw_spending (region);

Msg 1941, Level 16, State 1, Line 1
Cannot create nonunique clustered index on view 'vw_spending' because only unique 
clustered indexes are allowed. Consider creating unique clustered index instead.

Alright, alright, I'll make the index unique.

IF object_id('vw_spending','V') IS NOT NULL
	DROP VIEW vw_spending;
GO
CREATE VIEW vw_spending 
WITH SCHEMABINDING AS
SELECT region, SUM(spending) as totSpending
FROM dbo.militaryspending
GROUP BY region;
GO
CREATE UNIQUE CLUSTERED INDEX idx_vwspending ON vw_spending (region);

Msg 10138, Level 16, State 1, Line 1
Cannot create index on view 'WingTip.dbo.vw_spending' because its select list 
does not include a proper use of COUNT_BIG. Consider adding COUNT_BIG(*) to select list.

Seriously? We have to add a made-up aggregate to add an aggregate? Yep!

I did a quick Google on this but couldn't determine why it's necessary. The best explanation I found, and it could certainly be true, is that the COUNT_BIG aggregate helps the optimizer keep track of the number of rows in the view. Why not a regular COUNT? I don't know. The only difference between COUNT and COUNT_BIG, besides this example, is that one returns a regular old int and the other a bigint. The MSDN page for COUNT_BIG doesn't say a thing about it.
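If you want to see the type difference for yourself, SQL_VARIANT_PROPERTY will report it. A quick check (any table will do):

-- COUNT returns an int, COUNT_BIG a bigint; SQL_VARIANT_PROPERTY shows the base types.
SELECT SQL_VARIANT_PROPERTY(COUNT(*), 'BaseType')     AS countType,     -- int
       SQL_VARIANT_PROPERTY(COUNT_BIG(*), 'BaseType') AS countBigType   -- bigint
FROM dbo.militaryspending;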

Anyways, one more time, with feeling.

IF object_id('vw_spending','V') IS NOT NULL
	DROP VIEW vw_spending;
GO
CREATE VIEW vw_spending 
WITH SCHEMABINDING AS
SELECT region, SUM(spending) as totSpending, COUNT_BIG(*) AS countBig
FROM dbo.militaryspending
GROUP BY region;
GO
CREATE UNIQUE CLUSTERED INDEX idx_vwspending ON vw_spending (region);

Msg 8662, Level 16, State 0, Line 1
Cannot create the clustered index "idx_vwspending" on view "WingTip.dbo.vw_spending" because 
the view references an unknown value (SUM aggregate of nullable expression). Consider 
referencing only non-nullable values in SUM. ISNULL() may be useful for this.

This can be fixed in a couple of ways. First, you can alter the spending column to not accept NULL values. I was lazy bringing the data over, so I allowed it to accept NULL. Or, you can do what the error says and add ISNULL to the query, which is what I'm going to do here.

IF object_id('vw_spending','V') IS NOT NULL
	DROP VIEW vw_spending;
GO
CREATE VIEW vw_spending 
WITH SCHEMABINDING AS
SELECT region, SUM(ISNULL(spending,0)) as totSpending, COUNT_BIG(*) AS countBig
FROM dbo.militaryspending
GROUP BY region;
GO
CREATE UNIQUE CLUSTERED INDEX idx_vwspending ON vw_spending (region);
GO
SELECT *
FROM vw_spending;

region			totSpending	countBig
Africa			120608		39
Asia			428477.7	28
Australia		28559.1		3
Europe			312412.2	36
North America	        743120.4	9
South America	        65976.9		11

And there ya go. Finally, we have an aggregate view with a clustered index off of a casually designed table.
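One last note: on editions other than Enterprise, the optimizer won't automatically match queries against an indexed view; you have to reference the view with the NOEXPAND hint to make it use the index. A quick sketch:

-- Without NOEXPAND, non-Enterprise editions expand the view back to its base query.
SELECT region, totSpending
FROM vw_spending WITH (NOEXPAND)
WHERE region = 'Europe';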

11 Jan

Golden State Flooring

Note: The site I built and describe here is currently dead. The company has gone into bankruptcy and I have no association with the current group calling themselves Golden State Flooring. Their site is not the site that I worked on for years. The best I can offer as far as viewing it goes is the Internet Archive. Unfortunately, only about a third of the pages work.

If I can get a copy of the original site, I’ll post it here.

Golden State Flooring is a website that served flooring dealers and contractors throughout California. I was responsible for, well, everything on it except for its current design. I did the original designs and, I'll admit, design is not my strong suit.

Specifically, the site provided customer credit information, pricing, promotions and other items for customers. The back end allowed an account to be limited in its access to certain product lines, or, for instance, allowed a credit person to see a customer's aging and purchase information while keeping it hidden from a salesperson. The site and the database running it also tied into the Higgins Hardwoods site. In other words, a user could log into either site and see mostly, but not always, the same information. Both companies used the same ERP, and shared some customers, so it made sense to tie the two together in that way. Today, those sites would be built around Joomla or Drupal, and my life would have been a bit easier in that regard. But I started working on these in 2003, built the back end somewhere around that time, and, well, those two products weren't as well done as they are today.

One question I always get asked is: did you do e-commerce? The answer is a resounding no. And there are a couple of solid reasons for that. First, we didn't sell directly to the public. We sold to flooring dealers and contractors or cabinet shops (Higgins). So giving the appearance that we sold online, especially in the flooring game, was problematic.

The second reason we didn't sell online is that our data was a mess. The ERP did a very poor job with product attributes. Basically, it didn't have any, so we didn't have any. Flooring, which has at least a dozen potential attributes (species, width, length, warranty, color, stain, etc.), had all of that information stored in a SKU description field where Red Oak might be "RO", "Red Ok", "Red Oak", "Rd Oak" and so forth. As such, every SKU would have had to be recreated with a cross-reference outside the ERP, and the will to do such a thing was lacking.
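For the curious, the cross-reference would have looked something like the hypothetical table below (the names are invented; we never built it):

-- Hypothetical cross-reference table mapping each ERP SKU to clean attributes.
CREATE TABLE sku_attributes (
    sku        VARCHAR(20) PRIMARY KEY,  -- the ERP SKU
    species    VARCHAR(30),              -- 'Red Oak', normalized, never 'RO' or 'Rd Oak'
    width_in   DECIMAL(4,2),
    length_ft  DECIMAL(5,2),
    warranty   VARCHAR(30),
    color      VARCHAR(30),
    stain      VARCHAR(30)
);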

We did come close with an Overstock section, which took months to build and, as near as I can tell, is now broken, that allowed a visitor to request more information on a specific SKU.

Finally, the sites are LAMP based and supported around 2,500 registered users. That number, while small by many standards, represented significant market penetration in the flooring dealer and contractor business. I’m amazed at how well we did, most of which is owed to a very determined vice-president who pushed the sales team to sign people up.

11 Jan

Tales From The Field: The Numbers Didn’t Help

A few years ago I was talking to our senior credit dude, he'd like that name, and one of our yards came up in discussion. It seemed likely that the manager was padding his sales in the last week of a period, with sales that weren't really sales, to reach his goals. I won't go into specifics as they don't really matter that much; the real question is why we didn't catch it.

First, we had all kinds of reporting at that point. We had daily sales totals that would clearly have highlighted the spike. We had exception reporting to capture irregularities. Realistically, we should have caught it. Yet, it wasn’t caught until years later by the credit team. Here’s why.

1) The company had a long history of “the last working day of the month is the best day”. For instance, if we were averaging $500K/day we’d regularly double that at the end of the month. Simply put, we had sloppy billing practices that hid the problem. We should have been billing everything on the same day, as they do now, instead of billing it the next day or later in the month.

2) Management measured managers in a way that caused them to overlook the problem. A manager's pay was based strictly on the bottom line of the yard he ran. This had all kinds of benefits and problems. On the one hand it encouraged these guys to sell, but it also encouraged them to do things like not fix gaping forklift holes in the warehouse, or, in this case, to focus on the wrong thing. Because the yard in question was one of our lesser performers, getting it to make a profit was a big deal. When it did, everyone got excited and overlooked a problem that was screaming in their faces.

3) The credit team. We had credit managers who worked at each yard. In other words, they worked for the yard manager. If a manager was playing games it took a credit manager with a fair amount of "you-know-whats" to take it up a level. Some of them were tough enough to do it, but in this specific case that was a no go. Our credit team did eventually catch the problem, but only much later, after the invoices had gone far too long without being paid. Credit managers needed to report to someone at the corporate level.

TLDR version: In the end it's still a people problem, no matter how good the numbers are.

11 Jan

Tales From The Field: It’s There But, Really, It Isn’t

I don't talk about this much, but at one point in my life I was an internal auditor and an inventory accountant. Yup, I used to add up numbers and I can still 10-key. So, I thought it would be interesting to talk about a few things I ran across in that world.

The Cans Go Thump

Back in the day I was brought in as part of a team to count inventory at several yards. I wound up with a small yard that was flooring oriented, so it sold a lot of glue, stains and, well, things that were perishable. I don't remember how it worked out, I think the manager let it slip, but a good part of their inventory had gone bad. You'd hit the can, and instead of a nice pleasant liquid sound you got a thunk. This led me to do a "thump test" whenever I did an audit. My favorite response came from a VP who called me up, quite mad, wanting to know what he was supposed to do with the bad inventory since it cost money to dispose of it. I doubt he really cared about the disposal fees; he was really mad because writing off $30K of inventory, which the manager had to know was bad, meant a big hit on the numbers.

TLDR version: The inventory was bad, we couldn’t sell it and we wouldn’t write it off.

I wanna count X, 4 hours later, I need to cancel that count.

This is kind of old school, but back in the day our system required that an inventory count be set up at the corporate level. Well, I had one yard that would call me up to count a specific section of inventory, and they would always cancel it later in the day.

It's kind of hard to explain, but back then they had to count 100% of their inventory dollars, not their SKUs. Further, there were certain products, because of the nature of the beast, that would always show a gain, and I mean always. Well, what this yard would do was count the material that showed gains 4 or 5 times a year and never count the parts that showed a loss. And because they'd hit 100% of the count, all was good.

For the record, they actually maintained their inventory fairly well. When we did 100% inventory counts on them their total inventory was reasonably good. They were just playing the game in a way that benefited them the most.

Ah, there it is

We used to do an annual audit of certain yards. Now, in the lumber business, if you aren't really in it, like me, for instance, you can't really tell one species of wood from another. So, when the auditor showed up, he had a 0.00% chance of knowing what was what. So, a manager who couldn't find something that he was sure he had (yeah, right) would point to a stack of wood and say, "there it is". They'd count it, tag it, and magically most of the inventory was there.

I had a manager actually confirm to me that he'd done that. He ran a pretty clean inventory, and I guess if he was honest enough to admit that to me….

11 Jan

My Reporting Structure At A Prior Company

For a recent job interview I put together a short 2-page summary of the things I’d done at Higgins. Mainly this is a data flow diagram showing where the data started and ended. I was involved in all of it (ok, not every single piece, I had an IT manager who liked command line programming) but for the most part I did all of the design, programming, testing and implementation for our reporting.

Reporting (PDF)

What follows is an explanation that I sent to a hiring manager.

The main business ERP, Falcon, was developed by a company called Activant (now Epicor). It was a lumber-based ERP (used by Higgins and Golden State) that was developed in and ran on Progress (the database and language are tied together in many ways). I don't have the total database size, but it would have been several hundred gigabytes. This is where the majority of our data was stored. The Pella side of the business was much smaller and did most of their analysis in an Access database. This was merged with our data.

Data was pulled from the Falcon ERP in multiple ways and used in multiple systems as follows:

  • Ad-hoc Reporting using Cyberquery – This was standard reporting (Excel, PDF, command line, automated emails, etc.).
  • Cyberquery hold files – These were flat-file based data extracts that were very fast. We used these extensively.
  • Cyberquery data extracts – These were extracted CSV files that were eventually imported into our intranet (Horde application framework – LAMP based) for reporting, CRM (purchasing used SugarCRM), Sharepoint (KPI’s displayed there) and web-based reporting for our on-line customers.
  • We did occasionally use an ODBC connection to access this data directly from PHP but it was awkward so in general we stuck to exporting it into MySQL.

We moved the data around, for the most part, using UNIX/Linux commands (rsync, cat, mysqlimport). Cyberquery also allowed you to create macros which ran AIX commands.

This data wound up in two main interfaces (company websites and our intranet which ran on the Horde application framework). The following are all LAMP based except for some minor reporting done using SQL Server and ASP.

  • Our company websites (www.goldenstateflooring.com and www.higginshardwoods.com) where customers could see their statements, invoice history, payment history and view product information on our blowout sku’s. Everything on those sites was designed and built by me except the appearance.
  • One-Stop Reporting (see the attachment for more information). The concept behind One-Stop was to take multiple reports from our ERP (which often had very poor indexing) and combine them into a single interface (LAMP based) improving efficiency and speed (in some cases reducing a report from hours to seconds). It was focused on four specific areas: purchasing, sales, credit and customer data. For instance, One-Stop Sales allowed you to select from 15+ parameters (inside rep, outside rep, warehouse, sku) and run 6 different reports based around some of the same factors. So it was possible to see sales by rep by warehouse, sales by rep for a given sku, sales by inside rep for a sku, etc. Data was updated once/day.
  • There was also customized sales reporting for the outside sales team, including automated emails, that allowed a salesperson to view all of his sales, customer and credit information. Management could access it to see a salesperson’s performance.
  • We also did some work with exporting data to Sharepoint and SugarCRM. The latter was used by purchasing and we exported PO data to it and the former was for some simple KPI’s for individual warehouses.

This is a bit long but it does clarify the process we were using.

11 Jan

Pricing In A Lumber Yard

So, I haven’t written anything in a while, which means it’s time to dust off an old concept: pricing.  I’m going to come at this from a really narrow focus, namely the lumber industry, and how hard it can be to set a price for a product.

Several years ago a committee/group was formed to set pricing for our product lines.  The thinking being that a) our salespeople drop their pants to make a sale b) our salespeople take the easy way out and give our customers whatever price they gave them last time or c) we’ll be able to better control our margins if we have a consistent mark up across the board.  Let’s take a quick look at each of those before I hit the meat of just how hard it is to price lumber at the corporate level.

First, salespeople do make deals that we wish they wouldn’t and management does need to police them.  It’s a simple fact and I’ve seen plenty of examples where things happened that shouldn’t have.  So, A is certainly a risk.  It’s also true that sales reps will give a customer the same price as they got the last time they came in.  While this can kill you in an inflationary market, it’s also true that customers expect your prices to follow some small level of consistency.  Holding your prices consistent can help that but it’s still not a compelling enough reason, in my opinion, to allow salespeople to set prices.  The last item on the list, consistent margins, needs to be addressed after we talk about lumber.

Lumber is a unique product in that every single transaction is unique.  Each of the following, in no particular order, can be true for any single lumber sale.

  1. The quality of two 100' piles of wood can be radically different.  Even the wood in each individual pile can be different.  Even the same species, and grade, can be different depending on where the lumber was milled.
  2. Competition is fierce.  In the lumber industry you can face anything from a big distributor, to Home Depot, to a guy selling it out of a truck, to a small business willing to sell it just to get some cash to keep the doors open.
  3. Sometimes it's not the SKU, it's a collection of SKUs.  For instance, you might have a customer who feels he's getting ripped off if he doesn't get a species at X price.  But he'll give you 35% margin on other things he buys, so he's really a 28% margin account.
  4. Some customers pay well and don't quibble.  Others fight you tooth and nail over every detail, cherry pick the good wood out of an order, return the rest, and want a 2% discount for paying you in 90 days.
  5. Different regions will demand different products at different prices.

When I look back at the group I realize that about the best you could ever do was set a guideline for lumber pricing.  It's simply not possible, or reasonably effective, for a corporate entity to try and control pricing in this type of business.  But we tried.  Processes were put in place to track how often a salesperson landed on a break, and of course there were repercussions, and I believe that hurt all participants.

Previously, I mentioned controlling margins as the third argument.  One of the interesting things, at least where I worked, is that salespeople actually did make some effort to price accordingly.  You could see them adjusting prices upwards as costs rose.  Maybe they didn't move as fast as you wanted, or as effectively, but they did adjust prices relative to the market.  They also adjusted prices based on quantity.  The great mystery to me, and management of course "knew" the answer, was what was happening in the middle.  Were we leaving money on the table because we were too slow to adjust prices, relative to what we could gain by controlling the price?  Honestly, I have no idea because I never saw the end of the test.  My best guess is that we lost more than we gained.  I think that because our margins rose as the economy tanked, our sales didn't (hard to correlate that to anything but the economy), and if I'm certain of anything, our competitors' margins did not rise.

Finally, how do I think we should have handled pricing on these products?  I think we should have set up pricing in the system like we did.  I also think management should have reviewed every transaction in the system to look for oddities.  We had tools in place which made that incredibly simple, and they should have been strongly encouraged to review those with the reps.  And never, in a million years, should we have hammered the reps the way we did for going off break.  I think the latter led to shrugs and lost sales.

Overall, it was a very interesting process to watch and it’s a topic that really deserves more nuance than the two pages I just wrote gave it.  Hope someone enjoys it.

11 Jan

Something I posted on the MySQL Forums

Something I put on the MySQL Newbie forums. The original thread is here. You'll need to read it to understand what the original poster was trying to do. Basically, she wanted to calculate a due date based on a duration from a start date, but, and it was a big but, using an exception table that excluded certain dates (such as holidays and weekends).

Normally, due dates are calculated from a simple starting date to whenever the due date is. Something like date + 30. And when I've done exceptions in the past, it was excluding dates within a period, such as calculating the number of days in a month less holidays and weekends, which is basically lastday – firstday – exception days. This was something different.
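In SQL the simple case is a one-liner, which is why the exception table is what makes this interesting. Using the post's table_01, something like:

-- The simple case: a fixed offset, no excluded dates.
SELECT DATE_ADD(start_date, INTERVAL 30 DAY) AS due_date
FROM table_01;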

So here’s the post. It may be amended if someone comes up with something better.

Now it's time for some SQL.

=========================================

I have a streak of madness in me and turned this into a personal project. The code below doesn’t do the UPDATE but it does calculate the ending date.

In case someone else takes this on, the sample results are incorrect, 3 out of 4, in fact, if I understood the rules correctly.

The first thing I tried was a cursor in a stored procedure. It did work, but it felt clunky and messy and sure didn't belong in the newbie section. If it was just for me, I'd have used a cursor, just to get the project done, but I was sure you could do it without one.

So, for whatever it’s worth, here is how I did it.

First, you need a third table.

CREATE TABLE table_03 (alldate date);

Populate it with all the dates you’ll ever need beginning at the first date you’ll use (2011-12-01 in this example).
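Here's one way you might populate it (a sketch, not from the original post; the digits table is just a row generator):

-- Generate 1,000 consecutive dates starting at 2011-12-01.
CREATE TABLE digits (d INT);
INSERT INTO digits VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);

INSERT INTO table_03 (alldate)
SELECT DATE_ADD('2011-12-01', INTERVAL d1.d + d2.d*10 + d3.d*100 DAY)
FROM digits d1, digits d2, digits d3;  -- 10 x 10 x 10 = 1,000 rows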

Here is the SQL,

SELECT table_01_id, duration, startdate, enddate
FROM
(SELECT table_01_id, start_date startdate, t2.skip_date, t3.alldate enddate, t1.duration duration,
@row_rank := IF(@current_row = table_01_id, @row_rank + 1, 1) AS row_rank, -- running count of usable dates per id
@current_row := table_01_id r1
FROM table_01 t1
LEFT JOIN table_03 t3
ON t3.alldate > t1.start_date
LEFT JOIN table_02 t2
ON t3.alldate = t2.skip_date
CROSS JOIN (SELECT @row_rank := 0, @current_row := NULL) init -- initialize the user variables
WHERE t2.skip_date IS NULL) a1
WHERE duration = row_rank;

This produces the following, based on the sample data:

+-------------+----------+------------+------------+
| table_01_id | duration | startdate  | enddate    |
+-------------+----------+------------+------------+
|           1 |        5 | 2011-12-01 | 2011-12-09 |
|           2 |        7 | 2011-12-03 | 2011-12-14 |
|           3 |        2 | 2012-12-01 | 2012-12-03 |
|           4 |        7 | 2012-12-03 | 2012-12-10 |
+-------------+----------+------------+------------+

What it does is produce a running count of usable (non-skipped) dates and then compare that with the duration for each record in the outer WHERE clause. When they match, we're good.

And, credit where credit is due, I stole the counter from here: http://forums.mysql.com/read.php?10,505528,505581#msg-505581

Good luck with it and I want cookies, a bunch of cookies.

11 Jan

SQL Stuff: Period Comparisons

If you are like me, at some point in your career you probably needed to produce sales totals for a comparison period. For instance, January 2004 compared to January 2003. The easy, cheesy way to do that is something like the following.

Note: This is being written in MySQL using the Northwind database and my fields don’t always mean what they do in the database.

SELECT YEAR(orderdate) 'Year', format(sum(totaldue),2) Sales
FROM orderheader
WHERE YEAR(orderdate) in (2003,2004) and MONTH(orderdate) = 1
GROUP BY YEAR(orderdate);

+------+--------------+
| Year | Sales        |
+------+--------------+
| 2003 | 2,233,575.11 |
| 2004 | 3,691,013.22 |
+------+--------------+

Problem solved, right? Except, what happens if it's a long list and you want to compare period to period? Say you want a column for 2003 and a column for 2004, with totals for each. In other words, this won't cut it.

SELECT YEAR(orderdate) 'Year', MONTH(orderdate) 'Month', format(sum(totaldue),2) Sales
FROM orderheader
WHERE YEAR(orderdate) in (2003,2004)
GROUP BY YEAR(orderdate), MONTH(orderdate);

+------+-------+--------------+
| Year | Month | Sales        |
+------+-------+--------------+
| 2003 |     1 | 2,233,575.11 |
| 2003 |     2 | 3,705,635.50 |
| 2003 |     3 | 2,611,621.26 |
| 2003 |     4 | 3,041,865.44 |
| 2003 |     5 | 4,449,886.23 |
| 2003 |     6 | 3,257,517.70 |
| 2003 |     7 | 4,681,520.64 |
| 2003 |     8 | 6,775,857.07 |
| 2003 |     9 | 6,762,753.81 |
| 2003 |    10 | 4,243,366.59 |
| 2003 |    11 | 5,961,182.68 |
| 2003 |    12 | 6,582,833.04 |
| 2004 |     1 | 3,691,013.22 |
| 2004 |     2 | 5,207,182.51 |
| 2004 |     3 | 5,272,786.81 |
| 2004 |     4 | 4,722,890.74 |
| 2004 |     5 | 6,518,825.23 |
| 2004 |     6 | 6,728,034.99 |
| 2004 |     7 | 56,178.92    |
+------+-------+--------------+

You could also add WITH ROLLUP to get totals for each year.

SELECT YEAR(orderdate) 'Year', MONTH(orderdate) 'Month', format(sum(totaldue),2) Sales
FROM orderheader
WHERE YEAR(orderdate) in (2003,2004)
GROUP BY YEAR(orderdate), MONTH(orderdate) WITH ROLLUP;

+------+-------+---------------+
| Year | Month | Sales         |
+------+-------+---------------+
| 2003 |     1 | 2,233,575.11  |
| 2003 |     2 | 3,705,635.50  |
| 2003 |     3 | 2,611,621.26  |
| 2003 |     4 | 3,041,865.44  |
| 2003 |     5 | 4,449,886.23  |
| 2003 |     6 | 3,257,517.70  |
| 2003 |     7 | 4,681,520.64  |
| 2003 |     8 | 6,775,857.07  |
| 2003 |     9 | 6,762,753.81  |
| 2003 |    10 | 4,243,366.59  |
| 2003 |    11 | 5,961,182.68  |
| 2003 |    12 | 6,582,833.04  |
| 2003 |  NULL | 54,307,615.09 |
| 2004 |     1 | 3,691,013.22  |
| 2004 |     2 | 5,207,182.51  |
| 2004 |     3 | 5,272,786.81  |
| 2004 |     4 | 4,722,890.74  |
| 2004 |     5 | 6,518,825.23  |
| 2004 |     6 | 6,728,034.99  |
| 2004 |     7 | 56,178.92     |
| 2004 |  NULL | 32,196,912.42 |
| NULL |  NULL | 86,504,527.50 |
+------+-------+---------------+

The rows with NULL in them represent the totals for each year and the grand total. This is courtesy of the WITH ROLLUP modifier.

And, now, back to our problem. What I did, and what others have certainly done, was write two queries, one for each period, and jam them together in the application. This gives new meaning to the P's in PHP (hint: it's something Mike Rowe often works with). Instead of doing that, here is an alternative, which is very fast.

SELECT MONTH(orderdate) 'Month',
FORMAT(SUM(IF(YEAR(orderdate)=2003,totaldue,0)),2) as '2003 Sales',
FORMAT(SUM(IF(YEAR(orderdate)=2004,totaldue,0)),2) as '2004 Sales'
FROM orderheader
WHERE YEAR(orderdate) IN (2003,2004) -- Proof that I can comment code.
GROUP BY MONTH(orderdate) WITH ROLLUP;

+-------+---------------+---------------+
| Month | 2003 Sales    | 2004 Sales    |
+-------+---------------+---------------+
|     1 | 2,233,575.11  | 3,691,013.22  |
|     2 | 3,705,635.50  | 5,207,182.51  |
|     3 | 2,611,621.26  | 5,272,786.81  |
|     4 | 3,041,865.44  | 4,722,890.74  |
|     5 | 4,449,886.23  | 6,518,825.23  |
|     6 | 3,257,517.70  | 6,728,034.99  |
|     7 | 4,681,520.64  | 56,178.92     |
|     8 | 6,775,857.07  | 0.00          |
|     9 | 6,762,753.81  | 0.00          |
|    10 | 4,243,366.59  | 0.00          |
|    11 | 5,961,182.68  | 0.00          |
|    12 | 6,582,833.04  | 0.00          |
|  NULL | 54,307,615.09 | 32,196,912.42 |
+-------+---------------+---------------+

This does a single select, it’s very fast, and if someone ever reads this, I hope it helps you.
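And if you wanted a period-over-period change column, it's a small extension of the same query (my addition, not in the original post):

SELECT MONTH(orderdate) 'Month',
FORMAT(SUM(IF(YEAR(orderdate)=2003,totaldue,0)),2) as '2003 Sales',
FORMAT(SUM(IF(YEAR(orderdate)=2004,totaldue,0)),2) as '2004 Sales',
FORMAT(SUM(IF(YEAR(orderdate)=2004,totaldue,0))
     - SUM(IF(YEAR(orderdate)=2003,totaldue,0)),2) as 'Change' -- 2004 minus 2003
FROM orderheader
WHERE YEAR(orderdate) IN (2003,2004)
GROUP BY MONTH(orderdate) WITH ROLLUP;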

And, some bonus coverage. This is how you would do the same thing in Cyberquery.

DEFINE var2003 = IF YEAR(orderheader:orderdate) = 2003 THEN totaldue
DEFINE var2004 = IF YEAR(orderheader:orderdate) = 2004 THEN totaldue

WHERE
YEAR(orderheader:orderdate) ONE OF "2003","2004"

SUM
var2003/HEADING="2003 Sales"
var2004/HEADING="2004 Sales"

BY
MONTH(orderheader:orderdate)/HEADING="Month"

10 Jan

My Goal Is To Figure Out Who Built The Goal Table In Your Database

One of the first jobs that rolled my way in 2005 was to set up a method for tracking sales goals and the sales relative to them. They also wanted to present it to users in our Horde portal. I did some digging and found that the ERP had a little application that tracked goals and sales, so I figured that all I would have to do was find the table where they stored the goals, extract the data, and we would be good.

So, I found the table and fired a basic query against it and here’s roughly what came back for a single sales rep.

Year Sales
2005  
2005 300000
2005 300000
2005 300000
2005 375000
2005 375000
2005 375000
2005 425000
2005 425000
2005 425000
2005 375000
2005 375000
2005 375000

Anyone who's done any of this kind of work can spot the problem in half a second: "how do you know which of those records applies to which month?" Well, the answer is that the second record was January, the third February, and so on and so forth. Yes, somewhere along the way in the development process, you'd think someone would have said "wut".

In order to use it, over on the database side, I had to get tricky and make sure the data export sorted correctly (something that shouldn't have mattered) and then put counters in the code so I knew what was what. I had to make sure that the second record was January and all that. I managed to do it and it wound up working well enough.
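For what it's worth, the counter trick on the MySQL side looked roughly like this (a sketch; the table and column names are invented for illustration):

-- Number the rows in export order; record 1 is the mystery extra record,
-- so record 2 comes out as month 1 (January), record 3 as month 2, and so on.
SELECT (@rn := @rn + 1) - 1 AS month_no, goal_amount
FROM (SELECT goal_amount FROM goal_import ORDER BY import_order) g,
     (SELECT @rn := 0) init;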

Honestly, I have no idea why someone would build something like this except for one possible clue. In Progress there is an array data type. I really know nothing about it, except that it exists, and these values were in fact using that data type. In Cyberquery, in order to access January’s goal data you would write something like:

WHERE
slm_goal:year = 2005

LIST
slm_goal:sales[1]

Of course, you’d expect to write the WHERE clause as

WHERE
slm_date = 1.2005

But, I digress.

And there you had it, goal data for January 2005. Of course that's still almost impossible to work with for anything meaningful outside of the system. But Progress is a funky thing. As I understand it, and really I don't, the programming language is also tightly integrated with the database. It may be that somewhere in that mess, someone saw this as an easy way to deal with some problem. It's also possible that this piece of work was never intended to go into the mainstream, as it was well hidden and our main ERP support person didn't even know about it. Still, it was sitting there on our system.

And the final straw: this ERP wasn't something you bought off newegg.com, unless they're selling $500K+ products these days. It was a legitimate ERP used by a decent chunk of the industry it specialized in.

10 Jan

The Statements Don’t Add Up

There's nothing like a bit of craziness to start off a website with, especially when it comes out of a perfectly serious business environment.

A few years ago I was handed a statement for one of our largest customers, along with several bad words, from our flooring VP. I really didn't have the time, as I was supposed to be working on websites, but I decided to take a look at it and see if I could make sense of it. The statement was 10 pages long, very convoluted, and the VP who had handed me the statement was correct, it would be hard for a customer to reconcile. Actually, it would be impossible to reconcile as the detail didn't add up to the totals.

Well, as it turns out, I was apparently the first person to ever add one of these things up (that's not actually true, as I'll explain in a moment, but it's close). No one had any idea of the true depth of the problem, and no one wanted to deal with it, as I learned while trying to bring it to the attention of those who logically should have been concerned that, when it came to our statements, 1+1 might equal anything. So, let's do this with a bit more detail; watch carefully because there are a lot of moving parts.

First, the totals on a customer statement (aging buckets and grand total) were taken not from the detail on the statement but from the customer master file (cust table).

Second, the totals in the customer master file were regenerated every time an aging was run, as of whatever date the aging was run for. So, if an aging was run for a date two years back (likely what happened in the example I looked at), guess what the customer master file buckets were updated to? Yup, two years ago!

By itself, this would be enough to make you think (how much did you pay for that system, anyway?), but it gets better. We had a long history of decentralized statement distribution. In other words, each individual location mailed out their own statements, or didn't. So what would happen is that at month-end we'd run a company-wide aging, which set the buckets, and the credit managers would come in on the 1st of the month, run statements, and mail them out. Had that been all that happened, we'd have been fine, but it was never that clean. Credit managers, prior to sending out statements, often ran an aging for customers during the day (especially on the 1st of the month), which reset the customer master file to the date they ran the aging for. Then, after all of that, they would run the statements and happily mail them off and, well, most of them would add up.
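Had anyone wanted to catch this earlier, a reconciliation query along these lines would have flagged every customer whose detail no longer matched the master file (a sketch; the table and column names are invented):

-- Compare the open-item detail against the totals stored on the customer master.
SELECT c.customer_id,
       c.aging_total                      AS master_total,  -- set by the last aging run
       SUM(d.open_amount)                 AS detail_total,
       c.aging_total - SUM(d.open_amount) AS difference
FROM customer_master c
JOIN open_items d ON d.customer_id = c.customer_id
GROUP BY c.customer_id, c.aging_total
HAVING c.aging_total <> SUM(d.open_amount);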

Really, the only logical response to all of this is “this is insane”. And, I would agree.

Anyways, we eventually fixed all of it and for me it was a blessing as it got me into the BI side of the house. I’ve included a list showing all of the problems, and the solutions we implemented. I, honestly, doubt you will ever run into something this bad but if you do just know that you aren’t alone.

Problems:

  1. Statement totals came from the customer master file, not the statement detail.
  2. An aging wrote totals to the customer master file.
  3. An aging, especially company-wide, took a very long time to run (a full aging could take a couple of hours).
  4. Statements were hard to read regardless.

Solutions

  1. Run an aging during month-end (we'd always done this).
  2. Print statements to document imaging immediately after step 1. This prevented anything else from changing them.
  3. We changed the format of the statements, our vendor had more than one, to one that made more sense for our customers.
  4. Created a Cyberquery report (based on aging data pulled during the night) that gave our credit managers a daily aging total within our ERP. This ran in a few seconds.
  5. Pulled this data into our intranet and built several web-based tools to view it and other data.
  6. Pushed this data to our websites so our customers could view both their credit history and an open item statement. This data included their current balance, historical transaction data and line item detail.