11 Mar

SSIS Toolbox: Data Flow Task

The Data Flow task is essentially where all the work is done in an ETL process.  It’s in a Data Flow task that you do the extracting, transforming and loading that your heart could desire.

Adding real value to the discussion.

Adding real value to this post.

A Data Flow is a stand alone process in that you can drag one to the control flow, execute the package and it will run.  There are 3 ways to create a data flow task:

  • Drag it to the desktop from the toolbox.
  • Double-click it in the toolbox.
  • Click on the Data Flow tab and then hit the “Click here to add a new Data Flow Task”.

You can have multiple Data Flows in a package. I couldn’t find a limit on how many you can have in a package but I can safely say that you can put at least 100.  I didn’t connect them but I can confirm they ran.

You can rename a task in 3 ways:

  • Clicking on it’s name.
  • Selecting the task and hitting F2.
  • Using the name property in the properties tab.

In a Data Flow task you can right-click and select from a menu of choices.  Most of them are straight-forward, and better served in a separate post.  They are,

  • annotationVariables: Access the variables window.
  • SSIS Toolbox: Access the toolbox.
  • Getting Started: Access the getting started box.
  • Add Annotation: This allows you to put a note in the designer.  Essentially it’s a comment.
  • Execute Task: Executes the current Data Flow task.  Very handy for debugging and testing.
  • Properties: Access the properties section.  Also accessible with the F4 key.

You access  Data Flow Tasks either by double-clicking it to open it or clicking on the drop-down box in the Data Flow window.

There is only a single exit arrow (green) from a Data Flow constraint.  The precedence constraint options for a Data Flow task are:

  • Evaluation Operation: It can be a constraint, expression, “expression and constraint” or “expression or constraint”.
  • Value: Applies to constraint option.  It can be success, failure or completion.
  • Expression: Applies to the Expression option and brings up the expression editor.
  • Multiple Constraints: Specifies how the constraints behave.  The options are ALL (every constraint evaluates to true) and OR (at least one constraint evaluates to true).

There are a whole bunch of properties for the Data Flow task.

  • BLOBTempStoragePath: Location where you store BLOB files.  Use a semicolon for multiple locations.
  • BufferTempStoragePath: Location for temporary storage of buffer data.  Use a semicolon for multiple locations.
  • DefaultBufferMaxRows:  Sets the row buffer size in number of rows.  Default is 10,000.
  • DefaultBufferSize: Sets the buffer size.  Defaults to 10MB.
  • DelayValidation: Delays whether validation is delayed until run time.  It’s useful when you know you design has an error that will prevent validation.  For instance, a resource isn’t available.
  • Description: Description of the package.
  • Disable: Use to disable the package.
  • DisableEventHandler: Disables event handlers for the package.
  • EngineThreads: Number of threads the package can use.  Default is 10.  The minimum is 3.  This is a recommendation, not an absolute.  So it could, in fact, use more than 10 if the setting is 10.
  • ExecValueVariable: Allows you to select a package variable that stores the execution result of the task.
  • Expression: Can be used to assign an expression, and a value, to a property in the task.
  • FailPackageOnFailure: Fails the entire package if the task fails.
  • FailParentOnFailure: Fails the parent if the task fails.
  • ForcedExecutionValue: Forced value that specifies the optional execution value of the package.
  • ForceExecutionValueType: Type of variable for the execution value.
  • ForceExecutionResult: The execution result of the package.  It can be none, success, failure and completion.
  • ForceExecutionValue: Specifies whether forced execution value should contain a value.
  • HasExpressions: True if using expressions.
  • ID:  GUID of the object.
  • IsolationLevel: Specifies the transaction isolation of the container.
  • LocaleID: Defaults to the local of the operating system.
  • LoggingMode: Can set to parent properties, disabled or enabled.
  • MaximumErrorCount: Number of errors to fail the package.  Default is 1.
  • Name: Name of the package.
  • PackagePath: Where the package is stored in the file system.
  • RunInOptimizedMode: Removes unused columns, outputs and components from the data flow.
  • TransactionOption: Enables the transaction level of the task.

That is a lot of property values to write and they’ll be repeated going forward so I’ll be typing this all again at some point.  Fun!

And did I learn something doing this?  Yep, the ForceExecution properties were  new to me.