Azure Blob Load Generator

Overview

The Azure Blob Load Wizard is a tool that helps users load delimited data from public objects held in Azure Blob Storage.

Unlike most components, this load wizard does not appear as a standalone component when added to the Matillion ETL job canvas. Instead, it takes the form of a tool that lets users load and view files on the fly, altering load component properties and observing their effects without the need for a separate Transformation job. The wizard can also guess the schema of a table, relieving the end user of much of the work.

This component requires working Azure credentials with "read" access to the Azure Blob Storage location containing the source data file(s).
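
For readers who want to check this prerequisite outside Matillion, the short Python sketch below uses the azure-storage-blob SDK to read a small slice of a blob. It is only an illustrative check, not part of the wizard itself; the storage account, container, blob name, and SAS token are hypothetical placeholders, and for a genuinely public blob the credential can be omitted.

```python
# Minimal sketch: confirm "read" access to the source blob before using the wizard.
# Account URL, container, blob name, and SAS token are hypothetical placeholders.
from azure.storage.blob import BlobClient

blob = BlobClient(
    account_url="https://examplestorageaccount.blob.core.windows.net",
    container_name="example-container",
    blob_name="us_airports.csv",
    credential="<sas-token-with-read-permission>",  # omit for a public blob
)

# Downloading the first few kilobytes is enough to prove read permission.
sample_bytes = blob.download_blob(offset=0, length=4096).readall()
print(sample_bytes[:200])
```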

Layout

When a user drags the Azure Blob Load Wizard onto the canvas, a three-page setup wizard appears (as shown in the screenshots below).

A file must first be selected from an Azure storage account. This must be a delimited file (such as a .csv file) that the user has permission to access. If the file is compressed, the correct compression method must be selected; the supported compression methods are gzip and bzip2.
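
Continuing the sketch above, the snippet below illustrates (purely as an analogy, not the wizard's implementation) how the chosen compression method determines how sampled bytes are decoded; gzip and bzip2 are both covered by the Python standard library. `sample_bytes` carries on from the previous sketch.

```python
# Illustrative only: decode sampled bytes according to the selected compression method.
import bz2
import gzip

def decompress(raw_bytes: bytes, compression: str) -> bytes:
    if compression == "gzip":
        return gzip.decompress(raw_bytes)
    if compression == "bzip2":
        return bz2.decompress(raw_bytes)
    return raw_bytes  # Compression "None": the file is plain delimited text

text = decompress(sample_bytes, "None").decode("utf-8", errors="replace")
```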

If the selected file is viable, users can select the number of rows they wish to sample and click the Get Sample button. Here, the tool will load the file's data and attempt to guess its schema. Raw data for the file is displayed in the large panel below.
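
As a rough analogy for what Get Sample does, the standard library's csv.Sniffer can guess a delimiter, quote character, and header row from sampled text. This is only an illustration of the idea, not Matillion's actual implementation; `text` carries on from the sketches above, and the row limit of 1000 is an arbitrary example.

```python
# Illustrative guess at the file's dialect from a sample of its rows.
import csv
import io

sample = "\n".join(text.splitlines()[:1000])   # honour the chosen row limit
dialect = csv.Sniffer().sniff(sample)          # guesses delimiter, quoting, etc.
has_header = csv.Sniffer().has_header(sample)  # suggests an Ignore Header Rows value

rows = list(csv.reader(io.StringIO(sample), dialect))
print(dialect.delimiter, dialect.quotechar, has_header, len(rows))
```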

 

On page 2, column data is displayed in the lower-right panel and is available for editing. Schema configuration properties for the table are displayed in the lower-left panel; the available properties are detailed in the table in the "Properties" section below.

Selecting the Guess Schema button instructs the Load Wizard to attempt to guess as many of these properties as possible. However, all of the properties remain available for manual editing by the user, as does the column data in the lower-right panel.
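
The sketch below shows one simple way a column-type guess of this kind could work: try progressively stricter conversions over a column's sampled values and fall back to text. It is an illustrative assumption, not the Load Wizard's actual algorithm, and the generic SQL type names are placeholders.

```python
# Illustrative column-type inference over sampled values (not Matillion's algorithm).
from datetime import datetime

def guess_type(values, date_format="%Y-%m-%d"):
    for cast, type_name in (
        (int, "INTEGER"),
        (float, "FLOAT"),
        (lambda v: datetime.strptime(v, date_format), "DATE"),
    ):
        try:
            for value in values:
                cast(value)   # every sampled value must convert cleanly
            return type_name
        except (ValueError, TypeError):
            continue
    return "VARCHAR"          # fallback: treat the column as text

print(guess_type(["2021-01-01", "2021-02-14"]))  # DATE
print(guess_type(["1", "2", "3"]))               # INTEGER
```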

 

When the user is satisfied with the settings, the resulting table output can be viewed by selecting the Test button at the bottom of page 3 of the setup wizard. Running a test will create the requested table and show a sample for the user to inspect. If the user is unhappy with the output, they can return to the 'Configuration' panel on page 2 to make alterations until they are satisfied with the result.

 

Properties

For more information on the settings in this component, see the Snowflake COPY syntax; a rough mapping of these properties to COPY options is sketched after this table.

Property | Setting | Description
Table Name | Text | The descriptive name for the table.
Field Delimiter | Text | The character that separates fields in the input file. The default is a comma (,). A [TAB] character can be specified as "\t".
Field Optionally Enclosed By | Text | The character used to enclose strings. The value can be NONE, a single quote character ('), or a double quote character ("). When a field contains this character, escape it using the same character.
Ignore Header Rows | Text | The number of rows at the top of the file to ignore. Defaults to 0.
Date Format | Text | Defaults to 'auto'. This can be used to manually specify a date format.
Time Format | Text | Defaults to 'auto'. This can be used to manually specify a time format.
Time Stamp Format | Text | Defaults to 'auto'. Defines the format of timestamp values in the data files to be loaded.
Escape Unenclosed | Text | The single character used as the escape character for unenclosed field values only. The default is a backslash (\).
Escape | Text | The single character used as the escape character for any field values.
Null As | Text | Replaces the specified string with NULL in the output table. Use this if your data has a particular representation of missing data.
Trim Blanks | Checkbox | If enabled, Matillion ETL removes leading and trailing white space from the input data.
Error on Column Count | Checkbox | If enabled, a parsing error is generated when the number of delimited columns (fields) in the input data file does not match the number of columns in the corresponding table.
Empty As Null | Checkbox | If enabled, empty columns in the input file are loaded as NULL.
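
Because these settings mirror Snowflake's CSV file format options, it can help to see them side by side. The mapping below is an informal guide based on the Snowflake COPY documentation referenced above; it is an assumption about how the wizard's properties line up with COPY options, not a specification of what the wizard generates.

```python
# Informal guide (an assumption, not the wizard's exact output): each wizard
# property roughly corresponds to a Snowflake CSV file format option used by COPY.
PROPERTY_TO_COPY_OPTION = {
    "Field Delimiter": "FIELD_DELIMITER",
    "Field Optionally Enclosed By": "FIELD_OPTIONALLY_ENCLOSED_BY",
    "Ignore Header Rows": "SKIP_HEADER",
    "Date Format": "DATE_FORMAT",
    "Time Format": "TIME_FORMAT",
    "Time Stamp Format": "TIMESTAMP_FORMAT",
    "Escape Unenclosed": "ESCAPE_UNENCLOSED_FIELD",
    "Escape": "ESCAPE",
    "Null As": "NULL_IF",
    "Trim Blanks": "TRIM_SPACE",
    "Error on Column Count": "ERROR_ON_COLUMN_COUNT_MISMATCH",
    "Empty As Null": "EMPTY_FIELD_AS_NULL",
}
```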
 

Example

In the following example, the Azure Blob Load Wizard is used to load the contents of a small file into a table. First, we provide the wizard with the path to the file; since the file is not compressed, we leave the Compression setting as "None". After taking a sample of the data, the large panel displays raw data from the file (up to the number of lines specified by the Row Limit). We can see that this dataset is a list of US states, cities, their airports, and other relevant data, with fields delimited by a comma (,).

 

On page 2 of the setup wizard, we find that the wizard has guessed our table's configuration correctly: the lower-left panel shows that the fields are delimited by a comma, and the lower-right panel displays our column data.

 

And finally, once we click the Test button on the last page of the wizard, our table is generated for us to review. Since everything looks to be in order, we can click the Finish button.

 

From this point, the user returns to the job interface, where two linked components have been added to the job canvas: Create/Replace Table and Azure Blob Storage Load. Each component is parameterised by the Azure Blob Load Wizard, and the pair can be run as a job by linking them to a Start component.
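
As a rough mental model, that generated pair behaves like a table definition followed by a Snowflake COPY from the blob. The sketch below is an assumption for illustration only; the table name, column names, file format options, and storage URL are hypothetical placeholders, not the components' literal output.

```python
# Hypothetical sketch of what the two generated components roughly amount to.
# Table name, column names, and the storage URL are illustrative placeholders.
CREATE_TABLE_SQL = """
CREATE OR REPLACE TABLE us_airports (
    state   VARCHAR,
    city    VARCHAR,
    airport VARCHAR
)
"""

COPY_SQL = """
COPY INTO us_airports
FROM 'azure://examplestorageaccount.blob.core.windows.net/example-container/us_airports.csv'
FILE_FORMAT = (
    TYPE = CSV
    FIELD_DELIMITER = ','
    SKIP_HEADER = 1
    FIELD_OPTIONALLY_ENCLOSED_BY = '"'
)
"""
```

In Matillion ETL itself, the equivalent statements are generated and executed by the two components when the job is run from a Start component, so nothing needs to be written by hand.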