Create File Format Component
Create File Format Component
Creates a named file format that can be used for bulk loading data into (and unloading data out of) Snowflake tables. This format can then be used in other components to simplify the component's options and use the custom file format.
|Name||Text||The descriptive name for the component.|
|Create/Replace||Select||Create - the default option, creates a new table. This will
generate an error if a table with the same name already
exists, but will never destroy existing data.
Create if not exists - This will only create a new format if one of the same name does not already exist.
Replace - Creates a new format and replaces any format of the same name, overwriting it.
|File Format Name||Text||The name of the file format being created.|
|Compression||Select||Whether the input file is compressed in GZIP format, BROTLI, BZ2, DEFLATE, RAW_DEFLATE, ZSTD or not compressed at all.|
|Record Delimiter||Text||The delimiter to be used that separates records (rows) in the file. Defaults to newline. \n can also signify a newline. \r can signify a carriage return.|
|Field Delimiter||Text||The delimiter to be used that separates fields (columns) in the file.|
|Skip Header||Text||The number of rows at the top of the file to ignore - defaults to 0.|
|Date Format||Text||Defaults to 'auto' - this can be used to manually specify a date format.|
|Time Format||Text||Defaults to 'auto' - this can be used to manually specify a time format.|
|Timestamp Format||Text||Defaults to 'auto' - this can be used to manually specify a time format.|
|Binary Format||Select||Choose between the expected format of binary data: BASE64, HEX, UTF-8.|
|Escape||Text||Single character string used as the escape character for any field values. Accepts common escape sequences, octal values, or hex values. Also accepts a value of NONE (default).|
|Escape Unenclosed Field||Text||Single character string used as the escape character for unenclosed field values. Accepts common escape sequences, octal values, or hex values. Also accepts a value of NONE (default). If a character is specified in the 'Escape' field, it will override this field.|
|Trim Space||Select||Removes trailing and leading whitespace from the input data (TRUE or FALSE).|
|Field Optionally Enclosed||Text||A character that is used to enclose strings. Can be single quote (') or double quote (") or NONE (default). Note that the character chosen can be escaped by that same character.|
|Null If||Multiple Text||Enter one or more strings that will be replaced with NULL if found in the source. Only applies to columns that are nullable.|
|Error On Column Count Mismatch||Select||Generate an error if the number of delimited columns in the input does not match that of the table. If false, extra columns are not loaded into the table and missing columns are recorded as NULL in the table.|
|Empty Field as Null||Select||If TRUE, cast empty fields as NULL. If FALSE, cast empty fields as the corresponding column type with blank data.|
In this example, S3 Load is used to take a file and load the data into a table. However, a defined format must exist for the data to be transcribed into rows and columns. We use the Create File Format component to create a new format that can be used in S3 Load. Below shows the canvas of a typical job of this type:
To explain briefly; S3 Put Object takes a file from an HTTP server and puts it into an S3 Bucket while Create Table makes a table for the data to be loaded into. After this, Create File Format is used to create a predefined format that S3 Load can use. Below a snippet of the raw data is shown.
From this data we learn that there is a header row we must ignore, which also provides us with column names. Note that these columns must be defined in the Create Table component and then selected in the S3 Load component. The data is field delimited using commas and record delimited only by a carriage return. We choose not to define any time or date formats since our file does not contain any such data. There is also no need to define escape characters.
These realizations are reflected in our choices for parameterization of the Create File Format component, shown below:
We have named the new file format 'CUSTOMEXAMPLE' and so this is the format we choose in the S3 Load Component (shown below). Note that when selecting a format, many properties in S3 Load will disappear, since the predefined format overrides those properties.
This Orchestration job is now ready to be run. A simple Transformation job with a Table Input component can then be used to sample the resulting table, a small snippet is shown below:
We can also see that this file format has been created and stored in our Snowflake account. To view file formats, visit your Snowflake account and browse to Databases → <DatabaseName> → File Formats.