Azure Blob Load Generator

    Overview

    The Azure Blob Load Generator component helps users load delimited data from public objects in an Azure Storage Blob.

    The load wizard is a tool that lets users load and view files on the fly, altering load component properties and observing their effects without the need for a separate Transformation Job. The generator can also guess the schema of a table, saving the end user much of the work.

    This component requires working Azure credentials with "read" access to the Azure Blob containing the source data's file(s).

    Important Information

    • Azure Data Lake Storage Gen2 provides a set of analytical capabilities built on Azure Blob Storage.
    • Azure Data Lake Storage Gen2 merges the capabilities of Azure Data Lake Storage Gen1 with Azure Blob Storage. For more information, and to migrate Azure Data Lake Storage from Gen1 to Gen2, read Introduction to Azure Data Lake Storage Gen2.

    The following section explains how to configure the Azure Blob Load Generator in Matillion ETL for Snowflake, Synapse, and Delta Lake.


    Configuring the Azure Blob Load Generator wizard

    1. Create an Orchestration Job in the project.

    2. In the Components tab, search for the Azure Blob Load Generator, and drag it on to the job canvas.

    Please Note

    The three-page Load Generator wizard opens on the Get Sample page as soon as the component is dropped onto the job canvas.


    3. In the Azure Blob Storage field, use the file browser to select a file from an existing Azure storage location.

    Please Note

    Alternatively, you can manually type the Azure storage location URL into the Azure Blob Storage field, by using the following template:

    azure://<account>/<container>/<path>

    The file must be in a delimited format (for example, .csv), and you must have permission to access it.
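
    For example, a file named sales.csv in a container named landing under a storage account named mystorageacct (all hypothetical names) would be entered as:

        azure://mystorageacct/landing/sales.csv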


    4. In the Compression field, use the drop-down menu to select the file's compression method. If the file is compressed, the correct method must be selected; supported methods are gzip and BZip2, and the available options differ depending on the chosen data warehouse.

    5. In the Row Limit field, select the number of rows to be returned from the sample file, then click Get Sample.

    Please Note

    The wizard will load the file's data and attempt to guess the schema. The raw data will be displayed in the sample box beneath the fields described above.


    6. Click Next to progress to the next page of the wizard.

    7. In the Guess Schema page, the input file's data will be automatically displayed in the section at the top of the dialog. You may want to scroll through the list to view the entirety of the data.

    8. Click Guess Schema and the Azure Blob Load Generator will attempt to guess as many of the file's properties as possible. All of the column data in the lower-right panel is available to edit. Click the add button to add as many columns as you require. You will need to specify the Name, Type, Size, and Decimal Places values for each column (an example follows the note below).

    Please Note

    To remove a column, click the remove button beside it.
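
    For example, a column holding monetary amounts might be defined with Name "unit_price", Type "Number", Size 10, and Decimal Places 2, while a free-text column might use Type "Varchar" with a Size large enough for the longest expected value. These names and values are illustrative; choose types supported by your target data warehouse.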


    9. Schema Configuration properties are displayed in the lower-left panel. Modify the fields relevant to the file type. For more information, read Snowflake Properties, Synapse Properties, or Delta Lake Properties below, as appropriate for your data warehouse. Click Next to progress to the final page of the wizard.

    10. On the Validation page, the input file's data will be displayed in the panel at the top of the dialog. To view the resulting table's output, click Test, situated to the lower-left.

    Please Note

    Running a test will create the requested table on the relevant data warehouse cluster and show a sample for you to inspect.


    11. To modify the output data, click Back to return to previous steps within the wizard, and make property and column alterations. Once modifications have been made, click Create & Run to complete the wizard.

    The wizard will close, returning you to the job canvas, and create two linked components: Create/Replace Table and Azure Blob Storage Load. Each component is parameterized by the Azure Blob Load Generator and can be run as a job by linking to a Start component.
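
    To illustrate, the two generated components correspond roughly to a CREATE TABLE statement followed by a bulk load. The following is a minimal Snowflake SQL sketch, assuming a hypothetical table and external stage; Matillion generates the actual statements from the wizard's settings, so this is for orientation only:

        -- Roughly what Create/Replace Table does (hypothetical schema):
        CREATE OR REPLACE TABLE sales (
            id         NUMBER(10,0),
            unit_price NUMBER(10,2),
            sale_date  DATE
        );

        -- Roughly what Azure Blob Storage Load does; my_azure_stage is a
        -- hypothetical external stage pointing at the Azure container:
        COPY INTO sales
        FROM @my_azure_stage/sales.csv
        FILE_FORMAT = (TYPE = 'CSV' FIELD_DELIMITER = ',' SKIP_HEADER = 1);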


    Snowflake Properties

    For more information on all the settings in this component, see the Snowflake COPY syntax documentation.

    • Table Name (String): A human-readable name for the table.
    • Field Delimiter (Character): Specify a delimiter to separate fields. The default is a comma [,]. A [TAB] character can be specified as "\t".
    • Field Optionally Enclosed By (Character): A character used to enclose strings. Can be a single quote ('), a double quote ("), or NONE (default). Note that the chosen character can be escaped by that same character.
    • Ignore Header Rows (Integer): The number of rows at the top of the file to ignore. The default is 0.
    • Date Format (Date): Specify a date format. Defaults to auto.
    • Time Stamp Format (Timestamp): Specify the format of timestamp values in the data files to be loaded. Defaults to auto.
    • Escape Unenclosed (Character): Single character used as the escape character for unenclosed field values only. The default is backslash (\).
    • Escape (String): Single character used as the escape character for any field values.
    • Null As (String): Replaces the specified string with NULL in the output table. Use this if your data has a particular representation of missing data.
    • Trim Blanks (Checkbox): Tick to remove leading and trailing white space from the input data.
    • Error on Column Count (Checkbox): If enabled, a parsing error is generated when the number of delimited columns (fields) in the input data file does not match the number of columns in the corresponding table.
    • Empty Field As Null (Checkbox): If ticked, empty columns in the input file will become NULL.
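
    As a sketch of how these properties map to Snowflake COPY file format options (the table and stage names are hypothetical, and the values are examples only):

        COPY INTO sales
        FROM @my_azure_stage/sales.csv
        FILE_FORMAT = (
            TYPE = 'CSV'
            FIELD_DELIMITER = ','                   -- Field Delimiter
            FIELD_OPTIONALLY_ENCLOSED_BY = '"'      -- Field Optionally Enclosed By
            SKIP_HEADER = 1                         -- Ignore Header Rows
            DATE_FORMAT = AUTO                      -- Date Format
            TIMESTAMP_FORMAT = AUTO                 -- Time Stamp Format
            ESCAPE_UNENCLOSED_FIELD = '\\'          -- Escape Unenclosed
            NULL_IF = ('NULL')                      -- Null As
            TRIM_SPACE = TRUE                       -- Trim Blanks
            ERROR_ON_COLUMN_COUNT_MISMATCH = TRUE   -- Error on Column Count
            EMPTY_FIELD_AS_NULL = TRUE              -- Empty Field As Null
        );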

    Synapse Properties

    • Table Name (String): A human-readable name for the table.
    • Field Delimiter (Character): Specify a delimiter to separate fields. The default is a comma [,]. A [TAB] character can be specified as "\t".
    • Field Optionally Enclosed By (Character): A character used to enclose strings. Can be a single quote ('), a double quote ("), or NONE (default). Note that the chosen character can be escaped by that same character.
    • First Row (Integer): The row at which the data begins. Typically the default is 2, because row 1 may be reserved for column headers.
    • Date Format (Select): Specify a date format. Defaults to auto.
    • Max Errors (Integer): Specify the maximum number of rejected rows allowed in the load before the COPY operation is canceled. Each row that the COPY operation cannot import is ignored and counted as one error. The default value for this property is 0.
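
    As a sketch of how these properties map to the Synapse COPY statement (the account, container, and table names are hypothetical, and the values are examples only):

        COPY INTO sales
        FROM 'https://mystorageacct.blob.core.windows.net/landing/sales.csv'
        WITH (
            FILE_TYPE = 'CSV',
            FIELDTERMINATOR = ',',   -- Field Delimiter
            FIELDQUOTE = '"',        -- Field Optionally Enclosed By
            FIRSTROW = 2,            -- First Row
            DATEFORMAT = 'ymd',      -- Date Format
            MAXERRORS = 0            -- Max Errors
        );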

    Delta Lake Properties

    • Table Name (String): A human-readable name for the table.
    • Field Delimiter (Character): Specify a delimiter to separate fields. The default is a comma [,]. A [TAB] character can be specified as "\t".
    • Date Format (Character): Specify a date format. Defaults to auto.
    • Timestamp Format (Timestamp): Specify the format of timestamp values in the data files to be loaded. Defaults to auto.
    • Null As (String): Replaces the specified string with NULL in the output table. Use this if your data has a particular representation of missing data.
    • Ignore Header Rows (Integer): The number of rows at the top of the file to ignore. The default is 0.
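
    As a sketch of how these properties map to the Databricks COPY INTO statement used with Delta Lake (the paths and names are hypothetical, and the values are examples only):

        COPY INTO sales
        FROM 'abfss://landing@mystorageacct.dfs.core.windows.net/sales.csv'
        FILEFORMAT = CSV
        FORMAT_OPTIONS (
            'delimiter' = ',',                          -- Field Delimiter
            'dateFormat' = 'yyyy-MM-dd',                -- Date Format
            'timestampFormat' = 'yyyy-MM-dd HH:mm:ss',  -- Timestamp Format
            'nullValue' = 'NULL',                       -- Null As
            'skipRows' = '1'                            -- Ignore Header Rows
        );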