[This post is brought to you by Timesnapper. If it was not for TimeSnapper's screen and the OCR text recovery, I would have been able to retrieve this post from the demented and hungry maws of Word 2007. Word is as unstable and dangerous as it was 15 years ago, don't run it without a good screen recorder.]
When setting up a text file connector (that is a file layout for a fixed width file), you may want to default to using ragged right. Aside from any potential performance differences, there is an important refactoring and robustness issue.
If you choose “fixed width” instead of “ragged right” and realize that the file is actually ragged right (or it becomes ragged right), the switch will DELETE all your column descriptions.
This cavalier and arrogant style of programming is an echo of the data pipeline designer in DTS. In that designer, if the meta-data of the table changed too much, you would get the dread “Delete Transformations/Change Destination/Remap Transformations” choice. Since substantial amounts of handwritten code might be behind each transformation, this was a grossly destructive dialog box, something on the scale of “Oops, divide by zero detected, I shall now (literally) delete all your source code, wipe your hard drive (metaphorically) and send your bank account numbers to Nigeria!”
I like to think that applications have personality. Unfortunately, SSIS has a contemptuous sociopath personality.
Ragged right doesn’t work either, if you have a typical cobol source file, that is, one with a varying number of fields. So Ragged right might be able to deal with different null semantics in the last field, for example immediate CR/LF vs spaces + CR/LF, but it can’t deal with an immediate CR/LF to indicate a the next two columns are empty.
Normally to deal with a file with interleaved rows of different layouts, you import the file over and over, once per layout (header, rows of type 1, rows of type 2, etc). Each import discards row that don’t follow the layout. Unfortunately, the SSIS flat file can’t deal with rows that don’t follow the layout– instead it interprets the CR/LF as column data and tries to stuff the start of the row into a column. Sigh. MS-Access could deal with these kind of files.
This leaves importing all rows into a single column, then using a lot of SUBSTRING functions to parse the data. Here is an example of just that. If we have to continually fall back to writing code, why do graphical programming in the first place?
Or worse, preprocessing the text file with a .NET console application to conform the file to something SSIS can deal with.