01 Jan

Sticky: The SSIS Section Disclaimer

I wanted to make sure that anyone reading this section understands what it is.  This is my study process for SSIS.  I take a topic, write about it, and learn it through that process.   These posts are written very fast and cover a lot of ground in a very short time frame (about 5 of these a day).  In many ways, this is like writing a book at an incredibly fast pace, without an editor.  I guarantee that there are errors and grammar issues because there simply isn’t enough time to keep up my pace.

Finally, this is test preparation, so I’m not attempting to delve into every detail of a topic.  If it helps you, great, I hope it does, but, if it’s not as deep as you’d like, there are other options available.

Note: Test passed on 03/19.  However, the backlog of content will run through 04/27 which was the last post (SSIS: Package Deployment) that I wrote as test preparation.  I’m now non-disclosed so I won’t be writing posts that say things like “I think this will be on the test” going forward.  Nor, will I say if any of it was, or wasn’t, on the test.

17 Apr

SSIS Toolbox: Row And Percentage Sampling

These are, mostly, two-sides of the same coin (I can sure turn a phrase).  What they do is take a sample of a set of data based on a number or a percentage.

Row Sampling

  • Specify a number of rows from the data set.
  • Rows are randomly selected.  However, a seed can be used which will generate consistent results from the same input.
  • The output from the transformation consists of selected records and unselected records.

Percentage Sampling

  • Specify a percentage of the result set.
  • Rows are randomly selected.  However, a seed can be used which will generate consistent results from the same input.
  • The output from the transformation consists of selected records and unselected records.

As you can see they are pretty close to the same thing.

One interesting suggestion from the MSDN was to use Row Sampling to pick 10 winners from a lottery.

Finally, the 70-463 Training Kit recommends using 30% as a best practice for Percentage Sampling.

16 Apr

SSIS: Parents and Children (Execute Package task)

SSIS allows you to run a package from another package using the Execute Package task.  This can also be thought of as a parent > child relationship.  It’s easiest, for me at least, to do one of these.

Step 1: Create a child package.

Script Task

Script Editor.

Here we go, again.

  1. Create a new project, I named mine ChildProject, and save it somewhere handy.  I used C\testData.
  2. I renamed the package to childPackage.
  3. Create a new variable.
    1. Right-click the Control Flow and select variables.
    2. Click the add button.
    3. Name the variable varChild, set the data type to String and the Value to “It’s not fair!!!!!!”.
  4. Drag a Script task to the Control Flow and do the following:
    1. Double-click it to open it.
    2. ScriptLanguage: “Microsoft Visual Basic 2010”.
    3. ReadOnlyVariables: User::varChild.
    4. Click the “Edit Script” button.
    5. Replace

      ‘ Add your code here

      with
      MsgBox(Dts.Variables(“varChild”).Value)
    6. Save the script and close the editor.
  5. Click OK to close the Script Task Editor.

msgboxIf you run the package at the point you should see a message box like the one on the right.

Now, save the package somewhere handy.

Step 2: Create a parent package.

  1. Create a new project, I named mine ParentProject, and save it somewhere handy.  I used C\testData.
  2. I renamed the package to parentPackage.
  3. Create a new variable.
    1. Right-click the Control Flow and select variables.
    2. Click the add button.
    3. Name the variable varParent, set the data type to String and the Value to “You’ll understand when you are a parent.”.
  4. Drag a Execute Package task to the Control Flow.
  5. Double-click the Execute Package task to open it.
  6. In the package tab do the following:
  7. Create a new variable.
    1. ReferenceType: External Reference.
    2. Location: File System.
    3. Connection: New Connection and point it to the child package.  Mine was at:
      C:\testData\ChildProject\ChildProject\childPackage.dtsx
  8. Click OK to close the Execute Package task
msgbox

Just like in real life, it won’t be used just once.

If you run this package the child package will return the same message in the message box.

Step 3: Configure the child package to use parent variables.

Both packages should now  be open in SSDT.  Do the following in the childPackage.dtsx Control Flow.

  1. Right-click the desktop and select “Package Configurations”.
  2. Check “Enable package configurations”.
  3. Click Add and configure it as follows:
    1. Configuration type: Parent package variable.
    2. Parent variable: varParent.
    3. Click Next
    4. Value Option.

      Value Option.

      Select Value for the varChild variable.  It should be the very last option in the objects column.

    5. Click Next.
    6. Click Finish.
  4. Click Close.

Now save both packages.

You can close the childPackage.dtsx package at this point but SSDT will simply reopen.

And, one other note, don’t do like I just did and spell the varParent variable VarPerent.  The child will just scream “It’s not fair!!!!!!” at you as you look for the error in the package.

understanding

Parenting wisdom.

So, that is a basic package configuration that passes a variable from one package / project to another.

I think this leaves only the Slowly Changing Dimension task for a write-up.  I’ve also covered, albeit lightly, most of the ways a package can be run and how they can interact with other packages.  Is the end finally in sight?

Useful links:

http://msdn.microsoft.com/en-us/library/ms137609.aspx

15 Apr

Data Warehousing Table And Design Basics

Star Schema.

Star Schema.

This is super basic stuff.  I mostly wrote it to make sure it was 100% clear in my head before I take the test.  Anyways, this will consist of the two schema designs used in data warehouses and three tables that are, or were, used in them.

First the tables.  There are three that might come up on the test.

Fact Tables

A fact table is the centralized hub in both a snowflake and star schema.  Fact tables have the following traits:

  • It consists of keys and measures.
  • They rarely contain descriptive data (in regards to testing, the answer is probably, never).
  • The data is additive.
  • Typically has a large number of rows..
  • You would probably, almost certainly, benefit from a columnstore index on your fact tables.
  • Do not connect to other fact tables.

Dimension Tables

Dimension tables connect to fact tables.  Where there is one fact table there is typically several dimension tables connected to it.  While a dimension table can actually be used by more than one fact table it is not, typically, the central point of the design.

  • They contain descriptive information related to fact tables.
  • They are hierarchical.
  • The surrogate key should almost always be an integer with an identity property and the primary key.
  • Slowly changing dimensions are maintained here.
  • A dimension table may connect to more than one fact table.

Aggregate Tables

If you see one of these on the 70-463 test you can probably assume it is probably an incorrect answer.  Microsoft, when describing them is emphatic that cubes have replaced them.

Anyways, an aggregate table is table that summarizes data in a fact table.

There are typically two schema used in data warehouses.

Star Schema

A star schema is simply a Fact table surrounded by dimension tables.  Because of the way it looks, a single central point, with single dimensions around it, it appears in a sense, like a star, hence, its name.  A star schema has the following traits:

  • It is easy to query with a relatively straight-forward joining process.
  • Can be harder to maintain relative to a snowflake.
  • Relatively easy to read and understand unlike a normalized database.
  • Often very fast.
  • Designed for data warehouse use.
  • Is denormalized.
  • It will almost certainly use less space.

Snowflake Schema

A snowflake schema is similar to a star schema in that there is a centralized fact table with dimensions surrounding it.  But, a snowflake’s dimensions are often normalized into several tables.  They have the following traits:

  • Normalized data is often easier to maintain from normalized systems.
  • More joins are necessary to write queries.
  • They are partially normalized.

Other notes

I found a number of references saying that a star schema works better with analytical tools (SSAS), however, Microsoft would seem to disagree with them.  Per Microsoft’s literature in the link below:

Analysis Services reads from a snowflaked dimension as well as, or better than, from a star dimension.

Dates are stored in a separate dimension.  However, if you need information that is more detailed than the date, say an order’s creation time, that should be modeled into the fact table.

Useful links

http://msdn.microsoft.com/en-us/library/aa902672(v=sql.80).aspx#sql_dwdesign_dimension

http://en.wikipedia.org/wiki/Snowflake_schema

http://en.wikipedia.org/wiki/Star_schema

14 Apr

SSIS: Installation

This isn’t a how-to post, instead like some of my other recent posts, it’s an overview of the installation process.  Specifically, things I think I might see on the test.  And, since there is a note in the Skills Measured section that says “Install and maintain SSIS components” I figure I should know at least this much.

Service Accounts

The following is a list of account types used with SSIS.

 Virtual AccountManaged Service Account (MSA)Domain Account
Definition.Is a managed local account, that does not use a password, and accesses resources based on a computer identity.Managed domain accounts that have automatic password word management. Usable on only a single server.Standard user account.
When to use.SSIS installed on a stand-alone server or on a domain controller.If SSIS requires access to external resources.
PlatformsServer 2008 R2, Windows 7 or SSIS is installed on a 2008 R2 failover cluster.Server 2008 R2 or Windows 7.Windows Vista or Server 2008.

http://technet.microsoft.com/en-us/library/dd548356%28v=ws.10%29.aspx
Training Kit (Exam 70-463): Implementing a Data Warehouse with Microsoft® SQL Server® 2012

Installation

  • You can install SSIS on a computer that has no previous instances of SQL Server.  Duh!
  • You can install SSIS 2012, side-by-side with 2005 and 2008.

http://technet.microsoft.com/en-us/library/ms143731.aspx

Upgrading

This isn’t an upgrade in the sense that it installs 2012 side-by-side with earlier versions.  However there are things you need to know.

You cannot do the following;

  • Reconfigure an existing installation.
  • Move from a 32-bit to a 64-bit version of SQL Server or from a 64-bit version to a 32-bit version.
  • Move from one localized version of SQL Server to another localized version.

Upgrading also does not migrate existing packages to the new format.

http://technet.microsoft.com/en-us/library/cc879336.aspx

Upgrading Existing Packages

There are several techniques you can use.  The good news, they are pretty obvious.

UpgradeResults
dtexec utilityAble to run prior SSIS package versions. Does not permanently upgarde.
Open the package in SSDT.Permanent if you save the package.
Add the package to an existing project.Permanent if you save the package.
Use the SSIS Package Upgrade Wizard Permanent.

http://technet.microsoft.com/en-us/library/68dbdf81-032c-4a73-99f6-41420e053980

32-bit vs. 64-bit

Installing on a 64-bit system does the following:

  • When you install SQL Server and select Integration Services for installation, Setup installs all available 64-bit Integration Services features and tools.
  • If you require Integration Services design-time features, you must also install SQL Server Data Tools (SSDT).
  • If you require 32-bit versions of the Integration Services runtime and tools to run certain packages in 32-bit mode, you must also install SQL Server Data Tools (SSDT).

Development tools, however, are 32-bit.  This is the crazy making part and why you have to set the Run64bitRuntime to  false.

Also, according to the MSDN:

64-bit features are installed under the Program Files directory, and 32-bit features are installed separately under the Program Files (x86) directory. (This behavior is not specific to Integration Services or to SQL Server.)

http://technet.microsoft.com/en-us/library/ms143731.aspx

This is short by my terms.  Unfortunately, because it was mostly pasted from source data, I probably won’t remember it.

13 Apr

Master Data Services: Installation

I’m honestly not going to spend a lot of time on MDS but I am going to fire a few posts at it.  My hope, and the chance that I’m going to take, is that I won’t get hit with any hard-core configuration issues.  If the test in some fashion asks me to do something step-by-step I’ll just have to hope I get lucky.  My hope is to get it to the point that I can get some, maybe most, questions on it, right.

So, onto installation.  Be careful here, there are a lot of steps to it.

  1. Run SQL Server Setup.

And that’s that.  However, there are a number of requirements to use it.

  • Microsoft Silverlight 5 is required.  The application will prompt you to install it.
  • It only runs on 64-bit editions of SQL Server 2012 Developer, Business Intelligence Edition and Enterprise.
  • You must install IIS 7, 8 or 9.
    • This requires a handful of things be configured.  You can see the full list here:
      http://msdn.microsoft.com/en-us/library/ee633744.aspx
    • If you are using Windows 7, which I do most of the time, you cannot install Windows Authentication, which means you cannot install MDS.  You will need to either upgrade to  Windows 7 Ultimate, use a virtual server, or find a SQL Server installation on Microsoft Server.
  • .NET 3.51 or greater.

The MDS add-in for Excel is installed separately.  You download it from Microsoft.  There are two versions: 32-bit and 64-bit.  Unfortunately, I have not tested this.  My virtual server doesn’t have Excel and the system that does, my desktop, is only Windows 7.

Finally, you can integrate Data Quality Services with MDS but you have to install it, and the Data Quality Client, on the same instance of SQL Server that you have MDS installed on.

12 Apr

SSISDB Security

SSISDB is that folder over on the left, in SSMS, under “SSIS Integration Catalogs”.  Security here isn’t all that different from other places in SQL Server, in the sense that it has roles, and permissions, but, it is different.  To access permissions you navigate into the SSISDB folder, right-click on a folder or object, select Properties and then the Permissions tab on the left.

The permissions tab allows you to Grant or Deny permissions but you can also Revoke permissions with a stored procedure (gonna hope that doesn’t show up).  There are up to 9 permissions depending on where you are in the hierarchy.

NameWhat I hope it does.
ReadCan read the object's properties but not for other objects contained in the object.
ModifyCan modify the object's properties but not for other objects contained in the object.
Manage PermissionsCan assign permissions to objects.
Create ObjectsCan create objects in the folder. (Folder only)
Modify ObjectsCan modify objects in the folder. (Folder only)
Execute ObjectsCan execute objects in the folder. (Folder only)
Read ObjectsCan read objects in the folder. (Folder only)
Manage Object PermissionsCan manage permissions on all objects in the folder.
(Folder only)
ExecuteExecute all packaged in the project.
The source for this table is the Microsoft 70-463 Training Kit.

One thing to note is that packages inherit permissions from their containing project and are not assigned to them directly.

Any SSISDB user can deploy projects.  And any objects created by a user (including environments and operations, but mainly projects) can read, modify or execute (projects only) that object.

Don't push the Red Button!

Don’t push the Red Button!

There is a single role for the SSISDB catalog: ssis_admin.  It’s your typical admin role in that it can do anything to any project in the catalog.  The default user for this role is the user who created the catalog.

Finally, users must be granted access to the SSISDB catalog before they can use it.  Upon creation no users will have access to it.

My primary source was the Microsoft 70-463 Training Kit.  I have to be fair and admit that I’m developing more respect for that book.  Parts of it are actually quite good and this is a topic that I couldn’t find in the msdn.  I’m going to have to do another review on that book if this keeps up.

11 Apr

SSIS Roles

The right save option.

The right save option.

SSISDB is a different topic.  I’ll do that in the next post.

SSIS roles apply to packages saved in the msdb database and they are mostly the same as roles in SQL Server.  Only because this confused me the first time I looked at it, you save a package to the msdb database by doing the following:

  1. If you aren’t in the package deployment model you need to click the Project > Convert to package deployment model from the menu and run through the wizard.
  2. Go to File > Save Copy of <packagename> as.  This is different from the option right above it.
  3. I won’t go through the options but the one you want is “Package location” and you need to set it to “SQL Server”.

Now, with that done, there are 3 roles:

RolesReadWrite
db_ssisadmin
or
sysadmin
View all packages.
Execute all packages.
Export all packages.
Execute all packages in SQL Agent.
Import packages.
Delete all packages.
Change all package roles.
db_ssisltduserView own packages.
Execute own packages.
Export own packages.
Import packages.
Delete own packages.
Chage own package roles.
db_ssisoperatorView all packages.
Execute all packages.
Export all packages.
Execute all packages in SQL Server Agent.
None

The link below was the basis for most of this article.  Honestly, I think I could have just linked to it and saved the 30 minutes it took to write this.

http://msdn.microsoft.com/en-us/library/ms141053.aspx

10 Apr

SSIS: Package Templates

Templates, to me at least, don’t look like that big of a deal.  You could, simply save packages, and add them to projects, but, it’s exactly the type of thing that might show up on the 70-463, so, once more, into the breach.

Templates are super easy to create and use.  And, to make things easier we’re going to create a super easy one.

Step 1: Create a package.

By now you know how to hit the File > New button or to create a new package when you open SSDT so go ahead and do that.  Drag two Data Flow tasks to the Control Flow and connect them.

Step 2: Save the package.

This is the hardest part because there is some bad information out there.  Click “File > Save Integration Services Project <name> As” and save it to the following directory.

Program Files (x86)\Microsoft Visual Studio 10.0\Common7\IDE\PrivateAssemblies\ProjectItems\DataTransformationProject\DataTransformationItems

Danger!  Danger!

Danger! Danger!

I saved the package as myTestPackage.  You may need to tweak your permissions.  My solution was to run SSDT as an administrator (I’m running this on Windows 7) as the easiest path through the muck was as an administrator.

Close the project.

Step 3: Load the template.

First, do not do what I tried, and that is to use the File option on the menu.  It will lead to a problem, it won’t work.  Do this instead:

  • Project > Add New Item
  • Right click on the project name in the solution explorer and select Add > New Item.

When you do either of those you should see your package.  In my case that was myTestPackage.

myTestPackageIf you are running SQL Server 2005, or 2008, you will need to create a new GUID.  However, happily I’m running 2012 so it’s done automatically.

This link has the save paths for earlier editions.

http://www.mssqltips.com/sqlservertip/2841/creating-ssis-package-templates-for-reusability/

 

09 Apr

SSIS: Package Configuration

I’m hoping this goes quickly.  It’s pretty clean once we get the first package configured.  Essentially, all you do is configure the package, create a configuration file, and roll, baby, roll.

Configuration files come in 5 flavors:

  • XML configuration file: It’s stored in XML.  It can contain multiple configurations.
  • Environment variable: Stored in an environment variable.
  • Registry entry: Stored in a registry entry.
  • Parent package variable: Contained in a variable.  It’s usually used to update child packages.
  • SQL Server table: It can include multiple configurations.

And, for this article, there isn’t a lot more than that going on.  So let’s do one.

Step 1: Create a new package and/or project, if need be.

You should know this drill by now (File > New

Step 2: Create a OLE DB Source.

  1. Drag a Data Flow task to the Control Flow, or just click the Data Flow tab and hit the link.
  2. Drag a OLE DB Source to the Data Flow.  Configure it as follows:
    1. OLE DB Connection: Create one if you need to and point it to AdventureWorks2012.
    2. Data Access Mode: SQL Command.
    3. SQL Command Text:
      SELECT YEAR(orderdate) as yrDate, SUM(totaldue) as totaldue
      FROM sales.salesorderheader
      WHERE YEAR(orderdate) = ?
      GROUP BY YEAR(orderdate)
    4. Click the Parameter box and configure it as follows:
      1. Click New Variable.
      2. Name: varYear.
      3. Type: Int32.
      4. Value: 2007.
      5. Click OK.
  3. Click OK.

Now, we take the easy way out.  Drag a UNION ALL to the Data Flow, connect the OLE DB Source to it, and enable the data viewer (double-click the path, select the Data Viewer tab, and check the “Enable data viewer” box.

If you run the package you should get the following results:

yrDate	totaldue
2007	47171489.546

Step 3: Configure the package.

You may need to convert to package configuration at this point.  To do that go to Project > Convert to package deployment, and follow the wizard.  You won’t need to do anything but Next or OK.

Properties to export.

Properties to export.

Now, we need to create a package.  Do the following:

  1. Click SSIS > Package Configurations.
  2. Check “Enable package configuartions.
  3. Click Add.
  4. Click Next.
  5. Select Configuration Type: XML configuration file.
  6. Configuration file name: Click the ellipse, and put it somewhere you can find again.  I used C:\testData and named the file myconfig.  Click Save.
  7. Click Next.
  8. Navigate to the bottom of the objects list and select the varYear variable.  It should select the entire variables path.  Note:  I tried selecting everything to export and it fried the XML configuration file by converting everything to its HTML equivalent < = <.  I was able to read it in Internet Explorer but unable to edit it.  I do not have a solution for this other than to not select everything.
  9. Click Next.  Click Finish.  Click Close.

If you navigate to the folder you save the config file to you should see your file.  Edit it with a text editor.  I used Notepad++.

The XML file will likely be one long line.  Use the Find option in your editor and search for 2007.  We’re cheating but there will be more than one varYear in the file so this is easier.  Change 2007 to 2005, save the file, and run the package.  You should get the following result:

 

yrDate    totaldue

2005    12693250.6264
</pre>

 

And there you go, a working package configuration.

Obviously you can do a lot more than this with configuration files but I just wanted to run something through the system and make sure I had it nailed down.

This link had more information on package configurations.

http://msdn.microsoft.com/en-us/library/ms141682.aspx

08 Apr

SSIS: Executing A Project With T-SQL

This should be the last of the Execution series.

SSIS: Executing A Project With SQL Agent

SSIS: Deploying A Project And Executing It In SSMS

SSIS: Parameters With Expressions (it got messy)

So, here’s the deal.  You can execute a package with T-SQL.  It’s a bit messy, but doable.  We’ll be using the same set up and package as we had in those earlier links.  For me that means a folder at C:\testData, a folder at C:\testData\backup and a file called junk.txt in C:\testData.

packageinfoSince there isn’t much to do except write some code, I mean, copy some code, so here goes,

DECLARE @execution_id BIGINT;
DECLARE @logging_level INT = 1;

EXEC catalog.create_execution
@folder_name = N'ParameterTest',
@project_name = N'Project Deployment Demo',
@package_name = N'myFileMove.dtsx',
@execution_id = @execution_id OUTPUT;

EXEC catalog.set_execution_parameter_value
@execution_id = @execution_id,
@object_type = 50,
@parameter_name = N'LOGGING_LEVEL',
@parameter_value = @logging_level;

EXEC catalog.start_execution
@execution_id;

I think this is clean enough that I don’t need to explain it.  You can read more at the links below.  One parameter value to keep in mind for create_execution is the @use32bitruntime parameter.  If you set it to 1 the package will run in 32-bit mode, otherwise, it will run in 64bit mode.  It is optional.

I really hope this isn’t something else to memorize for the 70-463.  There are a handful of other parameters, that are optional, and it would be easy to forget them.  Especially since I’m not going to memorize them in the first place.

You can read more at these links.

catalog.create_execution

catalog.set_execution_parameter_value