Monday, March 12, 2012

Row delimiter {LF} is ignored?

Hi!

Im trying to import some data from a Flat File Source, where a row delimiter is {LF} and column separator is SPACE.

Data looks like this:
GI$0c2 T08b 1 1 20060508 000655 6 8 8 c0a81f1f 1 1 1 00A 3 24206816 3 24206816 1 1 3 59910000 001 1 3 14730050 0 25 F10 XX 317148-131136 loop TG_MRB 0 M027 1 3 0 20060508 000655 0 3 59910000 SIP

This is the first record that generates around 41 columns and sorts data as it should, but if the second record is smaller the row delimiter {LF} is ignored and put into one of the columns untill all 41 columns from previous record are filled. It seems to me that columns separator has the priority over the row delimiter which is very wrong. :). If there is a {LF} in the file that should mean that it needs to be in a new row as a new record. I try to keep this all in a SQL 2005 package without using any additional software, i know there might be a solution with the scripting component, but would like to see if theres someone with the similar solution before i start writing any scripts. (i dont like parsing strings with scripts from bulky files:))

Thanks!

Sebastijan L.

Is the file delimited by spaces or is it a fixed-column-width format?

When you say the second record is smaller" what do you mean? Does it have some missing columns? If so, which columns?

There may be some fairly easy ways to parse this, but they depend very much on exactly how you expect records to vary from row to row.

Donald Farmer

|||Thank you for a quick response Donald!

Records are delimited by spaces as shown in the data sample above. For example:
If my 1st row contains 41 columns that are space delimited (row ends with {LF}), and the second row is smaller and contains only enough data for 39 columns, it will fill up the rest of the 2 columns with the data that is already a third record, and in that data i can see a row delimiter which is in my case{LF} from the second row!!

What it should do is whenever there is a specified row delimiter in data it should create another record, and in that case it doesnt..

I can import the data within SQL 2000, Access, Excell with no problem, just that im trying to keep everything inside SQL2005 as some features are pretty good, without the need of 3rd software.

Sebastijan L.
|||

Thanks.

SSIS does not know that columns 40 and 41 are missing - perhaps it was columns 1 and 2, or columns 8 and 13. And it does not know that you do not expect the characters used as row delimiters embedded in the middle of a record - there are cases where that can happen.

Other applications do make assumptions about row delimiters - some of them read ahead to the row end, take the whole row from delimiter to delimiter into memory and parse it out column by column from there. That is good for your scenario, but slow, and unreliable where delimiters can be embedded.

We can - and most likely will at some point - build some more "smarts" into the flat file source to handle some of these situations, although users do need to be aware that every new conditional property we add will decrease the performance of the source adapter.

Meanwhile, you can indeed handle this situation using a script or even expressions in the derived column component. The best solution for you will depend on how complex the error handling has to be, for example, if you know what columns are likely to be missing it makes it much easier.

Donald Farmer

No comments:

Post a Comment