Thursday, June 12, 2014

Creating Excel Worksheets Dynamically in SSIS

Honestly, when I first read the question, I thought, "No, SSIS doesn't work that way."  But I asked a few questions and started tinkering - what follows is my solution to the "problem" / question.

Package Variables -
Connection Managers -
The Control Flow -
Data Flow Task Components -
  • [Adding columns]
  • Click on the "Output Columns" folder. 
  • Now click on the "Add Column" button.  This will create a new column and you'll immediately be in edit mode on its name.  In our case, enter "Name" and press [Enter].  
  • Now in the right-hand panel, locate and change the "DataType" value.  Set it to "Unicode string [DT_WSTR]".  
  • Set the length to 25 ( or up to 255 for Excel cells ).
  • Repeat starting at "[Adding Columns]" for "Job" and "Age" columns. 
' Microsoft SQL Server Integration Services Script Component
' Write scripts using Microsoft Visual Basic 2008.
' ScriptMain is the entry point class of the script.
Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper

<Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute()> _
<CLSCompliant(False)> _
Public Class ScriptMain
    Inherits UserComponent

    ' Add a single row to the default output buffer and populate its
    ' columns from the package variables set by the Foreach Loop Container.
    Public Overrides Sub CreateNewOutputRows()
        With Output0Buffer
            .AddRow()
            .Name = Variables.RowName.ToString
            .Job = Variables.RowJob.ToString
            .Age = Variables.RowAge.ToString
        End With
    End Sub

End Class
Test the package -
Let's start with variable declarations - we'll be using 6 in this example.  I'll start with the "fixed" variables and then go into the dynamic ones.  All except one of these will be of the String data type. 
Variables
rsResults is an Object which holds the result set of the stored procedure call that provides the rows for this task.
RowName, RowJob and RowAge are strings to hold the values that come back from the stored procedure one row at a time.  Yes, it's a "row by row" solution, but I'm assuming the number of rows will be low.
ExcelCreate is a variable evaluated as an expression; it builds the CREATE TABLE statement for each of the Excel tabs as we loop through the rsResults rows.  Its expression is the following:
"CREATE TABLE `" +  @[User::TabName] + "` ( `Name` NVARCHAR( 255 ) , `Job` NVARCHAR( 255 ) , `Age` NVARCHAR( 255 ) )"
TabName is a variable evaluated as an expression that holds only the tab's name ( no formatting ).  We need this when we set up the Excel destination.  Its expression looks like this:
@[User::RowName] + "s_Tab"
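To make the two expressions concrete: if the current row's RowName is "jon" ( one of the sample rows used later in this post ), TabName evaluates to "jons_Tab", and ExcelCreate evaluates to the DDL statement that the Excel connection will execute:

```sql
-- With @[User::RowName] = "jon", @[User::TabName] evaluates to "jons_Tab"
-- and @[User::ExcelCreate] evaluates to:
CREATE TABLE `jons_Tab` ( `Name` NVARCHAR( 255 ) , `Job` NVARCHAR( 255 ) , `Age` NVARCHAR( 255 ) )
```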

We'll also need 2 connection managers - one for the database from which we'll pull data and one for the destination Excel file.
Connection Managers
I named them ECM for Excel Connection Manager and MyDB for .. well, my DB.  
One Quick Note:  The Excel connection manager is tricky.  For this example, I had to create an Excel file in the format that I wanted as my final result - including column names and sample data - in the destination folder.  Otherwise I would not have been able to perform mapping operations.  However, when we're done, this file will not be required, nor will we need any "template" file to copy into the folder before we do the work.  This process will create the Excel file that we want from scratch.  The only caveat that I found was that "first row has column names" had to be checked.  I couldn't get this working without it for some reason.
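For reference, an Excel connection manager from this era of SSIS ends up with a Jet OLE DB connection string along these lines ( the file path below is just a placeholder ); the HDR=YES extended property is what the "first row has column names" checkbox controls:

```
Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\Output\MyResults.xls;
Extended Properties="Excel 8.0;HDR=YES";
```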

Let's get building!   What we will put together in the control flow is pretty simple really.  The "tricks" are all in how it's configured and in our 2 variable expressions.  Here's a quick view of the package:
Pretty simple package layout
Yes, that's really "it" - but there is more to come under the proverbial covers..
Go ahead and drag in your top execute SQL task and set the connection to your DB connection manager.  You'll want to add your stored procedure call in the SQLStatement property as well.  Click down to the "Result Set" tab in the execute SQL task and click "add" to add a new result set line.   It should name it 0 ( zero ) automatically.  Under the drop-down for "Variable Name" select the rsResults variable that was created earlier.  Click OK to go back to the Control Flow.
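The stored procedure itself isn't shown in this post; below is a minimal, hypothetical sketch of what the "SQL_Get_Source_Rows" procedure called here might look like - the source table dbo.People is an assumption, but the three columns must match what the Foreach loop expects:

```sql
-- Hypothetical sketch: the real procedure just needs to return one row
-- per desired Excel tab, with Name, Job and Age columns.
CREATE PROCEDURE dbo.SQL_Get_Source_Rows
AS
BEGIN
    SET NOCOUNT ON;
    SELECT [Name], [Job], [Age]
    FROM dbo.People;   -- assumed source table
END
```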
Next drop in a Foreach Loop Container and connect the Exec SQL task to the Foreach loop container.  Open its editor and set the "Collection" settings as follows:
Foreach COLLECTION settings
Above, we configured the Enumerator to "step through" an ADO result set - in this case, the variable that holds the results from the Execute SQL task in the previous step.  We've set that variable as the "ADO object source variable".  Finally, the enumeration mode tells the control that rsResults has all of the rows that we will step through in its "first table" - which in this case is the only table.
Now click the "Variable Mappings" tab.  You'll need to map the main 3 variables that you created to "catch" each row's "Name", "Job" and "Age" here.  When you're done, the mapping should look like this:
Foreach Variable Mappings
As the Foreach Loop Container loops through the rows stored in rsResults, the values of the columns are assigned to the variables shown above.  This gives us a set of values to work with for each row as we "build" our Excel file.
With that done, drag another execute SQL task into the Foreach Loop Container and edit its properties.  For this one, you'll need to change the "Connection Type" setting to "EXCEL" and the "Connection" property will need to be set to the Excel connection manager that you created earlier.  You'll need to change the "SQLSourceType" to "Variable" and then set the "SourceVariable" value to the ExcelCreate variable that you built earlier.  When you're done, it should look like this:
Execute SQL Excel connection
Ok, for our last Control Flow component, drag a Data Flow Task into the Foreach Loop Container.   Connect the Execute SQL task to it with a precedence constraint.  Then open the data flow task.

IMPORTANT:  I mentioned toward the beginning that this was a little tricky.  What follows is probably the oddest "trick" in this package.  When I first began to write this portion, I dropped in a Derived Column task and tried to convert each variable from code page 1252 ( SQL Server's default ) to Unicode.  I thought this would create "columns" and I could just pass the variables' values set in the foreach loop into an Excel destination.  I was NOT correct in that assumption!
So how do you get the values that are now set inside your variables by the foreach loop INTO the data flow?  Well, as I mentioned, this portion is a bit tricky.  We'll create a Script Component to use as a data source for this process.  Its function will be to add a row in the data flow and populate its columns with the variable values.
Drop a Script Component into the data flow and select "Source" as the component type.  Open the script component's editor and click into the "ReadOnlyVariables" field.  An ellipsis button ( "..." ) should appear.  Click it and you should be presented with a window that allows variable selection.  Locate and "check" the variables "RowAge", "RowJob" and "RowName".  Click OK to return to the editor.  Next, set the "ScriptLanguage" to "Microsoft Visual Basic 2008".  When you're done, the "Script" setting should look as follows:
Script component settings
Click the "Inputs and Outputs" tab of the script component.  Expand "Output 0".  Normally in SSIS, your outputs are automatically added in the data flow as you connect components.  After all, if you drop in a source, SSIS can determine the available columns, and sometimes you have to select the columns to be used at the source.  We will need to do that manually since we're creating our own source. 
* Just a few quick notes - I set my "Age" value to Unicode with a width of 3.  Also, the reason that I start with "Click on the 'Output Columns' folder" before clicking "Add Column" is to ensure that the newly added columns end up at the end of the list.  If you have a different column selected, clicking "Add Column" seems to "insert" the new column rather than adding it at the end of the list.
If all went well, your "Inputs and Outputs" should now look similar to the following:
Script component outputs
Click back on the "Script" tab and click on the "Edit Script..." button.  This should open up the VB Script editor.  A template of code will automatically be added for you.  However in this example, the only procedure that we are interested in is the "CreateNewOutputRows" procedure.  ( You can delete the others if desired ). 
For this example, I'm using the "default" output buffer named "Output0Buffer".  So I created a "With" block and, within it, added a single row and then set the row's values to the variable values.  To save you a little typing, all of the code is included earlier in this post.  Note that I have deleted the portions of the template which we will not use to save space.
That's "all" we need to do with this component.  We've manually created our outputs to "catch" our variable values.  We've added a row in the output buffer in which to set the values and finally we've assigned the "columns" in the output buffer to the values in our variables as set in the foreach loop container.
The Excel Destination -
We're almost to the proverbial finish line!  The last piece of this is to add an Excel Destination, connect everything up, and map it.  Easy, right!?  Well - not exactly.  Excel is extraordinarily picky.
So, BEFORE we work with the Excel destination, we have to work with the Excel file.  Yes, the file needs to already exist before we can configure the destination this way.  So we're going to cheat a little bit and create one to work on.
Go back to the control flow and right click the data flow within the foreach loop container.   Select "Disable" ( on the entire data flow ) and run the package.
Another quick note:  If you're running 64-bit SQL Server and 32-bit Office, you'll need to go to the "Project" menu in the BIDS IDE and select the bottom option on the menu which should be, "{your project name} properties...".   In the properties window's left-hand pane, select "Debugging".  In the resulting right-hand pane, set the "Run64BitRuntime" value to "False".
If all went well, you should see something like this:
Building initial Excel Sheet
You should also have a new Excel file created wherever your Excel Connection Manager was set to point.  Check that directory to ensure you have a file.  It should have headers ( no data yet ) and one sheet per row returned by the "SQL_Get_Source_Rows" stored procedure call.
If you have that file now ( feel free to open it in Excel and check it out - but be sure to exit Excel before continuing ), enable the data flow and go back to the Excel destination on the data flow tab of the package.
On the "Connection Manager" tab of the Excel Destination, the OLE DB connection manager will probably already be set to the Excel connection manager.  Now change the "Data access mode" setting to "Table name or view name variable".  Below that, set the Variable name to our "User::TabName" variable as seen below.
Excel Destination - Connection Manager Settings
Click on the "Mappings" tab.  If you named the script component outputs to be the same as the names you created in the Excel sheet, the mappings will probably be done for you automatically.  If they are not, manually map the source columns to the destinations in the Excel spreadsheet.
Excel Mappings
Click "OK" to return to the Data Flow.
Right-click the Excel Destination and select "Properties".  Locate the property "ValidateExternalMetadata" and set it to "False".  We don't want it to validate the file, because normally the file will not exist until the first part of the package runs and creates it.
Save the SSIS package.

Go to your Excel Connection Manager's folder and remove the output excel file that we just used to set up the Excel Destination.
It's the moment of truth - run the package. 
If all went well, the components should all be "green" now.
Go to your excel folder and open the resulting XLS spreadsheet.
If all went well, your results should look like the following:
Test results
As you can see, the data is in the sheet for "jon", and "anna" has a tab with that user's data as well.
