The Data Flow task is essentially where all the work is done in an ETL process. It’s in a Data Flow task that you do the extracting, transforming and loading that your heart could desire.
A Data Flow is a stand alone process in that you can drag one to the control flow, execute the package and it will run. There are 3 ways to create a data flow task:
- Drag it to the desktop from the toolbox.
- Double-click it in the toolbox.
- Click on the Data Flow tab and then hit the “Click here to add a new Data Flow Task”.
You can have multiple Data Flows in a package. I couldn’t find a limit on how many you can have in a package but I can safely say that you can put at least 100. I didn’t connect them but I can confirm they ran.
You can rename a task in 3 ways:
- Clicking on it’s name.
- Selecting the task and hitting F2.
- Using the name property in the properties tab.
In a Data Flow task you can right-click and select from a menu of choices. Most of them are straight-forward, and better served in a separate post. They are,
- Variables: Access the variables window.
- SSIS Toolbox: Access the toolbox.
- Getting Started: Access the getting started box.
- Add Annotation: This allows you to put a note in the designer. Essentially it’s a comment.
- Execute Task: Executes the current Data Flow task. Very handy for debugging and testing.
- Properties: Access the properties section. Also accessible with the F4 key.
You access Data Flow Tasks either by double-clicking it to open it or clicking on the drop-down box in the Data Flow window.
There is only a single exit arrow (green) from a Data Flow constraint. The precedence constraint options for a Data Flow task are:
- Evaluation Operation: It can be a constraint, expression, “expression and constraint” or “expression or constraint”.
- Value: Applies to constraint option. It can be success, failure or completion.
- Expression: Applies to the Expression option and brings up the expression editor.
- Multiple Constraints: Specifies how the constraints behave. The options are ALL (every constraint evaluates to true) and OR (at least one constraint evaluates to true).
There are a whole bunch of properties for the Data Flow task.
- BLOBTempStoragePath: Location where you store BLOB files. Use a semicolon for multiple locations.
- BufferTempStoragePath: Location for temporary storage of buffer data. Use a semicolon for multiple locations.
- DefaultBufferMaxRows: Sets the row buffer size in number of rows. Default is 10,000.
- DefaultBufferSize: Sets the buffer size. Defaults to 10MB.
- DelayValidation: Delays whether validation is delayed until run time. It’s useful when you know you design has an error that will prevent validation. For instance, a resource isn’t available.
- Description: Description of the package.
- Disable: Use to disable the package.
- DisableEventHandler: Disables event handlers for the package.
- EngineThreads: Number of threads the package can use. Default is 10. The minimum is 3. This is a recommendation, not an absolute. So it could, in fact, use more than 10 if the setting is 10.
- ExecValueVariable: Allows you to select a package variable that stores the execution result of the task.
- Expression: Can be used to assign an expression, and a value, to a property in the task.
- FailPackageOnFailure: Fails the entire package if the task fails.
- FailParentOnFailure: Fails the parent if the task fails.
- ForcedExecutionValue: Forced value that specifies the optional execution value of the package.
- ForceExecutionValueType: Type of variable for the execution value.
- ForceExecutionResult: The execution result of the package. It can be none, success, failure and completion.
- ForceExecutionValue: Specifies whether forced execution value should contain a value.
- HasExpressions: True if using expressions.
- ID: GUID of the object.
- IsolationLevel: Specifies the transaction isolation of the container.
- LocaleID: Defaults to the local of the operating system.
- LoggingMode: Can set to parent properties, disabled or enabled.
- MaximumErrorCount: Number of errors to fail the package. Default is 1.
- Name: Name of the package.
- PackagePath: Where the package is stored in the file system.
- RunInOptimizedMode: Removes unused columns, outputs and components from the data flow.
- TransactionOption: Enables the transaction level of the task.
That is a lot of property values to write and they’ll be repeated going forward so I’ll be typing this all again at some point. Fun!
And did I learn something doing this? Yep, the ForceExecution properties were new to me.