Raw File Reader Ssis

-->

APPLIES TO: SQL Server, including on Linux Azure SQL Database Azure SQL Data Warehouse Parallel Data Warehouse

The Raw File destination writes raw data to a file. Because the format of the data is native to the destination, the data requires no translation and little parsing. This means that the Raw File destination can write data more quickly than other destinations such as the Flat File and the OLE DB destinations.

In addition to writing raw data to a file, you can also use the Raw File destination to generate an empty raw file that contains only the columns (metadata-only file), without having to run the package. You use the Raw File source to retrieve raw data that was previously written by the destination. You can also point the Raw File source to the metadata-only file.

The raw file format contains sort information. The Raw File Destination saves all the sort information including the comparison flags for string columns. The Raw File source reads and honors the sort information. You do have the option of configuring the Raw File Source to ignore the sort flags in the file, using the Advanced Editor. For more information about comparison flags, see Comparing String Data.

You can configure the Raw File destination in the following ways:

Recently I had an interesting problem in Microsoft SQL Server Integration Services (SSIS). I had to append data to a raw file in a for each loop. A raw file destination component can be configured to have the following behavior: Append, Create always, Create once, Truncate and append.

Specify an access mode which is either the name of the file or a variable that contains the name of the file to which the Raw File destination writes.
Indicate whether the Raw File destination appends data to an existing file that has the same name or creates a new file.

The Raw File destination is frequently used to write intermediary results of partly processed data between package executions. Storing raw data means that the data can be read quickly by a Raw File source and then further transformed before it is loaded into its final destination. For example, a package might run several times, and each time write raw data to files. Later, a different package can use the Raw File source to read from each file, use a Union All transformation to merge the data into one data set, and then apply additional transformations that summarize the data before loading the data into its final destination such as a SQL Server table.

Note

The Raw File destination supports null data but not binary large object (BLOB) data.

Note

The Raw File destination does not use a connection manager.

This source has one regular input. It does not support an error output.

Append and New File Options

Ssis Raw File Viewer

The WriteOption property includes options to append data to an existing file or create a new file.

The following table describes the available options for the WriteOption property.

Option	Description
Append	Appends data to an existing file. The metadata of the appended data must match the file format.
Create always	Always creates a new file.
Create once	Creates a new file. If the file exists, the component fails.
Truncate and append	Truncates an existing file and then writes the data to the file. The metadata of the appended data must match the file format.

The following are important items about appending data:

Appending data to an existing raw file does not re-sort the data.
You need to make certain that the sorted keys remain in the correct order.
Appending data to an existing raw file does not change the file metadata (sort information).

For example, a package reads data sorted on the ProductKey (PK). The package data flow appends the data to an existing raw file. The first time the package runs, three rows are received (PK 1000, 1100, 1200). The raw file now contains the following data.

1000, productA
1100, productB
1200, productC

The second time the package runs, two new rows are received (PK 1001, 1300). The raw file now contains the following data.

1000, productA
1100, productB
1200, productC
1001, productD
1300, productE

The new data is appended to the end of the raw file, and the sorted keys (PK) are out of order. In addition, the append operation didn't change the file metadata (sort information). If you read the file by using the Raw File source, the component indicates that the file is still sorted on PK even though the data in the file is no longer in the correct order.

To keep the sorted keys in the correct order while appending data, you can design the package data flow as follows:

Retrieve new rows by using Source A.
Retrieve existing rows from RawFile1 using Source B.
Combine the inputs from Source A and Source B by using the Union All transformation.
Sort on PK.
Write to RawFile2 by using the Raw File destination.
RawFile1 is locked because it's being read from, in the data flow.
Replace RawFile1 with RawFile2.

Using the Raw File Destination in a Loop

If the data flow that uses the Raw File destination is in a loop, you may want to create the file once and then append data to the file when the loop repeats. To append data to the file, the data that is appended must match the format of the existing file.

To create the file in the first iteration of the loop, and then append rows in the subsequent iterations of the loop, you need to do the following at design time:

Set the WriteOption property to CreateOnce or CreateAlwaysand run one iteration of the loop. The file is created. This ensures that the metadata of appended data and the file matches.
Reset the WriteOption property to Append and set the ValidateExternalMetadata property to False.

If you use the TruncateAppend option instead of the Append option, it will truncate rows that were added in any previous iteration, and then append new rows. Using the TruncateAppend option also requires that the data matches the file format.

Configuration of the Raw File Destination

You can set properties through SSIS Designer or programmatically.

The Advanced Editor dialog box reflects the properties that can be set programmatically. For more information about the properties that you can set in the Advanced Editor dialog box or programmatically, click one of the following topics:

Related Tasks

For information about how to set properties of the component, see Set the Properties of a Data Flow Component.

Raw File Destination Editor (Connection Manager Page)

Use the Raw File Destination Editor to configure the Raw File destination to write raw data to a file.

What do you want to do?

Open the Raw File Destination Editor

Add the Raw File destination to an Integration Services package, in SQL Server Data Tools (SSDT).
Right-click the component and then click Edit.

Set options on the Connection Manager tab

Access mode
Select how the file name is specified. Select File name to enter the file name and path directly, of File name from variable to specify a variable that contains the file name.

File name or Variable name
Enter the name and path of the raw file, or select the variable that contains the file name.

Write option
Select the method used to create and write to the file.

Generate initial raw file
Click the button to generate an empty raw file that contains only the columns (metadata-only file), without having to run the package. The file contains the columns that you selected on the Columns page of the Raw File Destination Editor. You can point the Raw File source to this metadata-only file.

When you click Generate initial raw file, a message box appears. Click OK to proceed with creating the file. Click Cancel to select a different list of columns on the Columns page.

Set options on the Columns tab

Available Input Columns
Select one or more input columns to write to the raw file.

Input Column
An input column is automatically added to this table when you select it under Available Input Columns, or you can select the input column directly in this table.

Output Alias
Specify an alternate name to use for the output column.

Raw File Destination Editor (Columns Page)

Use the Raw File Destination Editor to configure the Raw File destination to write raw data to a file.

What do you want to do?

Open the Raw File Destination Editor

Add the Raw File destination to an Integration Services package, in SQL Server Data Tools (SSDT).
Right-click the component and then click Edit.

Set options on the Connection Manager tab

Access mode
Select how the file name is specified. Select File name to enter the file name and path directly, of File name from variable to specify a variable that contains the file name.

File name or Variable name
Enter the name and path of the raw file, or select the variable that contains the file name.

Write option
Select the method used to create and write to the file.

Generate initial raw file
Click the button to generate an empty raw file that contains only the columns (metadata-only file), without having to run the package. You can point the Raw File source to the metadata-only file.

When you click the button, a list of the columns appears. You can click cancel and modify the columns or click OK to proceed with creating the file.

Set options on the Columns tab

Available Input Columns
Select one or more input columns to write to the raw file.

Input Column
An input column is automatically added to this table when you select it under Available Input Columns, or you can select the input column directly in this table.

Output Alias
Specify an alternate name to use for the output column.

Why raw files?

One of the big benefits of working with raw files is the ease with which you can add them to your packages. The raw file destination requires you to specify only the name of file to use and the columns you want to store. Conversely, if you use a text file destination, then you have to build the structure of the file and ensure it makes your data flow. If you use an OLEDB destination, then you have to have a database and also a table created to store the data.

Their ease of configuration makes raw files ideal for storing errors in your data flows. In the past I’ve been guilty of configuring certain components of my packages to ignore failures, on the assumption that it “can’t error”. Of course, this isn’t ideal and a much better approach is to capture all errors and handle them appropriately. This is where raw files are very useful. All you need to do is direct your error flows to a raw file destination, specify and filename and select the columns you want to store. If errors do occur, then they will be stored in the raw files.

So, you’re storing any errors in a raw file, and now you want to review them – but you can’t because of the proprietary format used. This is the exactly the srot fo scenario for which my raw file reader is designed.

Using the Raw File Reader

The Raw File Reader is designed to make it very simple to read the contents or raw files. A few of the key features are as follows ( a full list can be seen at the above link):

Read any SSIS Raw file
Results displayed in a grid
Columns automatically resized
Order of columns can be changed
Data can be copied from the results grid using CTRL+C
Column headers can be included or excluded from the copied data
Notification (and option to reload) when the currently loaded raw file has changed i.e. package is rerun

Raw File Reader For Windows 7

Opening a raw file

The simplest operation is to open a raw file (File | Open). This process will, by default, read the raw file configuration and then immediately read the data from the raw file. The configuration is displayed in a list at the top of the form and the data is displayed in a grid at the bottom of the form (you can change this default behaviour so that the data is not automatically loaded – see later).

Once the data from your raw file has been loaded you can sort the data and copy it into another application. Once a raw file has successfully been read, the filename will appear in the Recent Files section of the File menu. The last 10 files opened will be displayed here.

Sorting and copying data

Data on the grid can be sorted and copied using standard “excel-style” functionality. To sort data, simply click on the column heading of the column you wish to sort. Clicking on a column heading for the second time will reverse the order of the sorting.

You can select a cell from the grid (or multiple cells, by dragging the cursor and/or using the CTRL key) and copy to the clipboard using CRL+C, from where they can be pasted into other applications. When the data is copied to the clipboard the area copied will be the maximum area required to include all the selected cells, however cells not selected will be blank. In the example below, we copy only the object name and the object type:

When the copied data is pasted into Excel you can see that, by default, column headings are included but the data in the unselected cells are not included (you can change this default behaviour – see the Options section, later).

Opening the raw files associated with a package

Opening every raw file individually may be time consuming. Therefore, the tool provides the option of opening every raw file associated with a package. Simply, open the package configuration, identify the raw file destinations and open the files that are specified for these destinations. At present the filename for the raw file destinations must be configured directly, not via expressions or variables.

When you open the package, all of the raw files for the package will be opened, and can be organized using the options in the Window menu. As with opening raw files, recently opened packages are included in the Recent Projects list on the File menu.

Working with raw files whilst debugging packages

The Raw File Reader will automatically reload a raw file when it detects the file has changed. This is a configurable option and so can be turned off if required. If you turn this option off you can manually reload the file by using the Read File command in the Tools menu.

Reading large files

Raw files are often large. If you are using large files I recommend that you disable the option that enables the raw file data to be loaded as soon as the file is opened (see the Options section). Once the configuration has been read, you choose to load only the required columns, by checking and un-checking the columns in the Raw file columns pane at the top of the window. This can drastically reduce the time it takes to load a file. If you want to select/de-select more than one column at a time, select them using the SHIFT or CTRL keys and then click on one of the check boxes. All the selected items will then be changed to the state of the checkbox selected.

Raw File Reader

Once you have selected the columns you want, the raw file can be read by using the Read File command in the Tools menu.

Options

The configurable options can be accessed by the Options menu in the Tools menu. The copy options configures when headers are included with copied cells.

Option	Description
Always include headers	The column headers, for the cells copied, are always included, irrespective of the number or distribution of cells selected.
Never include headers	Headers are never included
Auto include headers	Column headers are only included if the cells selected are in more than one column. This enables you to copy data from a single column without column headers being added.

Other options are as follows:

Option	Description
Reload file when it changes	This enables the application to reload the data for a raw file automatically when the application detects the raw file has changed
Load file when opened	If selected the data from a raw file is read immediately that a file is opened. If deselected then only the raw file configuration is read when the file is opened.
Raw file extensions	There is no standard naming for raw file extensions, this allows you to configure the extension that the Open File dialog uses to determine a raw file. Multiple extensions can be specified by separating them by semi colons “;”

Future enhancements

I am looking to develop the application further and would appreciate comments on what features you would like to see in future versions. Some of those currently on my list are

Changing the raw file columns pane to display the meta data for each column i.e. data types, sizes etc, this will allow sorting of columns
Row numbers
Extra column headers details
Paging rows
Monitoring a folder for raw files

Rawomg

Raw File Reader Ssis

Append and New File Options

Ssis Raw File Viewer

Using the Raw File Destination in a Loop

Configuration of the Raw File Destination

Related Tasks

Related Content

Raw File Destination Editor (Connection Manager Page)

Open the Raw File Destination Editor

Set options on the Connection Manager tab

Set options on the Columns tab

Raw File Destination Editor (Columns Page)

Open the Raw File Destination Editor

Set options on the Connection Manager tab

Set options on the Columns tab

See Also

Why raw files?

Using the Raw File Reader

Raw File Reader For Windows 7

Opening a raw file

Sorting and copying data

Opening the raw files associated with a package

Working with raw files whilst debugging packages

Reading large files

Raw File Reader

Future enhancements