Tuesday, March 27, 2012

adding a new data column (not derived) midway thru a data flow

I need to know what a table's max row Identity is part way thru a data flow. I can't get it at the beginning of the data flow. I need to either (1) add it to the data buffer part way thru or (2) set it into a package variable and then reference the var in a script component.

I've not found a way to add a database column to the data buffer without doing a lookup for each row (too slow and not appropriate here) or some goofy oledb source and then merge join into the data buffer on a contrived join.

I've read questions about referencing package vars in scripts but I can't get that to work. DTS.Variables("varname").Value isn't recognised when I code it up.

Anyone have an idea or solution for either one of these? If you're gonna explain the script code, please include the entire snipet including the INCLUDEs, etc.

While I am not convinced that the MergeJoin is contrived or the oledb source is goofy you should check to see that the variable varname is in the ReadOnlyVariables property of the script component. If it isn't then add it and then the DTS.Variables("varname").Value should work. If it is then can you provide the error you are receiving.

Thanks,

Matt

|||

Matt,

I wasn't saying that the MergeJoin is contrived or that the OLEDB source is goofy, I was saying that the way I'd be using them to solve my dilema is. I have used both of these controls and am happy with them, but they aren't what I'd like to use to solve this problem.

Regarding the scripting problem, I have put varname in the ReadOnlyVariables property. In the script window when I enter DTS.Variables("varname").Value the interpreter places the squigly underline under DTS and says that "Name 'DTS' is not declared". There is no intelisense when I type the period between DTS & Variables. So I'm thinking that something needs to be declared. Here's my code:

Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper

Public Class ScriptMain
Inherits UserComponent
Dim counter As Integer
Dim FirstTime As Boolean

Public Sub New() 'This method gets called only once per execution
FirstTime = True
End Sub

Public Overrides Sub Input_ProcessInputRow(ByVal Row As InputBuffer)
If FirstTime Then
counter = Row.MaxRowId
FirstTime = False
End If
counter = dts.variables("dataset").value 'dts not recognized
Row.SubCntrctId = counter
End Sub
End Class

|||

Gordon,

The syntax for accessing variables in the script component is different from in the script task. The syntax is:

Me.Variables.<var-name>|||Somebody would want to tell the boys in Wrox about this... their

chapter on scripting is very light on the differences between Task and

Component...

They only mention variables in relation to the script task... so you

(ok, well.. I, at least . ) end up supposing that, since no one

mentioned otherwise, the variable access must be the same in a

script component... sigh...

But, thankfully, we have good ole Jamie to set us straight...

Thanks Jamie,

PJ|||

Gordon, I know what you mean -- I'm facing the same problem in SSIS Designer. Usually I work around it in the data source, doing an explicit join right off the bat to get the extra column with the needed value. But now I have a case where the data source is a flat file but the value needed for the extra column is still determined by a SQL select statement (and still independent of the rows in the primary data source).

So I've got a multi-row, multi-column data stream X and a single-row, single-column data stream Y. There should be a simple way in SSIS Designer to (outer) join them (i.e. with no condition):

select X.x1, X.x2, Y.y1 from X, Y

But all the Merge/Join/Lookup etc. data flow transformations require (as you said) join columns to be identified.

This could be done with the Derived Column transformation, except that its expression cannot be a SQL statement and it doesn't need to be evaluated it for every row of X anyway. Which leads me to a similar conclusion as you: that since a variable could be used in the expression, the result of the query should be saved in a variable. But in SSIS Designer, a variable (in this case, scoped to the data flow task) cannot be specified as an SQL statement.

Help! What is the best practice in this situation?

|||

Kevin,

Your request is basically to be able to do an OUTER JOIN. Is that correct?

I have lamented the lack of an OUTER JOIN component in the past and have requested one for future versions. Unfortunately I did that during beta and that request hasn't made it through to Microsoft Connect. Can I suggest that you go to Microsoft Connect and put in a request for it and then put a link to it up here so that I can vote for it.

If you dont ask then you won't get!

The only way to accomplish what you need in ths instance is to use a script component.

-Jamie

|||

Hello all

There are two ways I could find to tackle this problem.

1) This is very slow and consumes a lot of memory

create a dummy derived column in both tables you want to join and insert a dummy 1(you may need to sort on this column which is the real performance killer). Then use merge join on this field so resulting new table contains the new column coming from the other rowset.

2) Split the DataFlow task into several pieces. Populate a variable (I am assuming the value you want to add as the new column will come from a rowset of count 1) with the first pass. Then use a derived column to insert the new value using this variable.

I am not sure if these are the recommended ways to do it but they worked for me.

Batu

sql

No comments:

Post a Comment