Create External Table (Snowflake)
This component enables users to create an "external" table that references externally stored data. Because the table only references that data, it holds no data itself. External Tables can be queried but are read-only. For in-depth information, please consult the Snowflake External Tables documentation.
Referencing externally held data is valuable when you want to query large datasets without storing that same volume of data in Snowflake.
Data loaded in these tables takes the form of a single column ("VALUE") holding a variant-type value with multiple properties. Each column of target data becomes a property within the variant on its respective row. Thus, the data will usually require some transformation into the desired format. See also: Flatten Variant Component.
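For illustration, a query against such a table extracts properties from the VALUE column and casts them to the desired types. The table name and property names below are hypothetical (Snowflake typically surfaces CSV columns as numbered properties):

```sql
-- Each row exposes a single VALUE variant; individual fields are
-- accessed as properties and cast to the desired type.
-- "my_external_table" and the property names are hypothetical.
SELECT
    value:c1::VARCHAR AS first_column,
    value:c2::NUMBER  AS second_column
FROM my_external_table;
```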
Note: External Tables MUST be refreshed, using the Refresh External Table component, whenever the externally held data changes. We recommend always refreshing an External Table before using it in a job.
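Behind the scenes, the refresh corresponds to a single Snowflake statement (table name hypothetical):

```sql
-- Re-scan the stage location so the external table's metadata
-- reflects any files added, changed, or removed.
ALTER EXTERNAL TABLE my_external_table REFRESH;
```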
Note 2: External Tables require an External Stage to stage data. External Stages can be managed via the Manage Stages menu.
|Name||Text||The descriptive name for the component.|
|Create/Replace||Select||Create: Create the new table with the given name. Will fail if a table of that name already exists.
Create if not exists: Will create the new table with the given name unless one already exists. Will succeed and continue in either case.
Replace: Will create the new table, potentially overwriting any existing table of the same name.|
|Database||Select||Select a Snowflake Database to house the external table.|
|Schema||Select||Select the table schema. The special value [Environment Default] will use the schema defined in the environment. For more information on using multiple schemas, see this article.|
|New Table Name||Text||The name of the External Table to be created or used.|
|Partition Columns||Multiple Expression||A partition is defined by an expression applied to a set of data, resulting in a partition that only contains rows that satisfy the expression. For example, partitioning a data set by the value in the 'year' column.
For more information on setting up partitions with Snowflake's External Tables, please see the Snowflake documentation.
Name: The name of the new Partition Column.
Type: The data type of the Partition Column.
Size: The data size of the new column values.
Precision: The precision of the new column values.
Expression: The expression used to partition the data.|
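As a sketch of the SQL this configuration corresponds to: in Snowflake, a partition column must be derived from the file path (METADATA$FILENAME). All object names below are hypothetical:

```sql
-- Partition on a year value embedded in each file's path,
-- e.g. @my_stage/csv/2023/data.csv
CREATE EXTERNAL TABLE sales_ext (
    year NUMBER AS CAST(SPLIT_PART(metadata$filename, '/', 2) AS NUMBER)
)
PARTITION BY (year)
LOCATION = @my_stage/csv/
FILE_FORMAT = (TYPE = CSV);
```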
|Stage Database||Select||Select a Snowflake Database for the External Stage.|
|Stage Schema||Select||Select a Snowflake Schema for the External Stage.|
|Stage||Select||Select an External Stage for the data. Staging areas can be managed via the Manage Stages menu or created through Snowflake using the CREATE STAGE command.|
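If creating the stage directly in Snowflake rather than via the Manage Stages menu, the statement looks like this sketch (the stage name, bucket URL, and credentials are placeholders):

```sql
-- Create an external stage over an S3 bucket.
CREATE STAGE my_s3_stage
    URL = 's3://my-bucket/data/'
    CREDENTIALS = (AWS_KEY_ID = '<key>' AWS_SECRET_KEY = '<secret>');
```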
|Relative Path||Text||The directory path to follow to the target data. File names cannot be specified. The path is relative to the storage location given in the External Stage setup.|
|Format||Select||Select a premade file format that will automatically set many of the component properties accordingly. These formats can be created through the Create File Format component.|
|File Type||Select||The type of expected data to load. Some may require additional formatting, explained in the Snowflake Documentation.
Available options are: AVRO, CSV, JSON, ORC, PARQUET, and XML.
Component properties will change to reflect the choice made here and give options based on the specific file type.|
|Enable Octal||Select||Enables parsing of octal numbers.|
|Compression||Select||Whether the input file is compressed, and in which format: GZIP, BROTLI, BZ2, DEFLATE, RAW_DEFLATE, ZSTD, or not compressed at all.|
|Record Delimiter||Text||The delimiter that separates records (rows) in the file. Defaults to newline. \n can also signify a newline, and \r a carriage return.|
|Field Delimiter||Text||The delimiter that separates columns. The default is a Comma. A [TAB] character can be specified as "\t".|
|Skip Header||Text||The number of rows at the top of the file to ignore - defaults to 0.|
|Escape||Select||When this option is specified, the backslash character (\) in input data is treated as an escape character.|
|Escape Unenclosed Field||Text||Single character string used as the escape character for unenclosed field values. Accepts common escape sequences, octal values, or hex values. Also accepts a value of NONE (default). If a character is specified in the 'Escape' field, it will override this field.|
|Trim Space||Select||Removes trailing and leading whitespace from the input data.|
|Field Optionally Enclosed||Text||A character that is used to enclose strings. Can be single quote (') or double quote (") or NONE (default). Note that the character chosen can be escaped by that same character.|
|Null If||Text||This option replaces the specified string with null in the output table. Use this if your data has a particular representation of missing data.|
|Error On Column Count Mismatch||Select||Generate an error if the number of delimited columns in the input does not match that of the table. If false, extra columns are not loaded into the table and missing columns are recorded as NULL in the table.|
|Empty Field As Null||Select||If this is set, empty columns in the input file will become NULL.|
|Encoding Type||Select||The type of encoding that has been applied to the data. This defaults to UTF-8.|
|Strip Outer Array||Select||Enables removal of outer square brackets from JSON data.|
|Strip Null Values||Select||Enables removal of null values.|
|Ignore UTF8 Errors||Select||When true, replaces invalid UTF-8 sequences with the unicode replacement character (U+FFFD), instead of throwing an error.|
|Preserve Space||Select||When parsing XML files, leading and trailing spaces in elements are preserved if set to true.|
|Strip Outer Element||Select||When parsing XML files, this will strip the outermost XML element, exposing the 2nd level elements as separate documents.|
|Disable Snowflake Data||Select||When parsing XML files, will disable recognition of Snowflake semi-structured data tags if set to true.|
|Disable Auto Convert||Select||When parsing XML files, disables automatic conversion of numeric and Boolean values from text to their native representations if set to true.|
In this example we use External Tables to load CSV data from S3 and transform it into a more useful form before creating a permanent table from the result. To accomplish this, we use an Orchestration and a Transformation job, shown below, respectively.
First, we have made an External Stage and pointed it to an S3 bucket that contains our CSV data.
Next, we have configured the Create External Table component to use the "csv" directory at that S3 Bucket URL. Be warned: this will pull in all CSV files from that directory. Since the output is a single variant column, differences in CSV format will not prevent the load. However, it is usually best to ensure that all CSV files present share the same format and that all of them are wanted in the data load. We have instructed the component to create an External Table named "Doc_Ext".
After this, the External Table is refreshed using the Refresh External Table Component before a new, regular table is created with the expected columns set up. This job then links into the Transformation job shown previously.
The data can be sampled using a Table Input component pointed at "Doc_Ext", our new external table (providing the Orchestration job has run at least once). The sampled data is shown below.
As can be seen, this data has loaded correctly but is not in the most useful format: external data is loaded as a variant containing the target data. We can unpack this into a more useful form using the Flatten Variant component, the column mapping of which is shown below.
Note that there is only one source column, "VALUE". We identify "col1" and "col2" as Properties within that column and give them an alias describing their true purposes. Their values will then be extracted and placed into columns named using those aliases, the final result of which is shown below.
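The equivalent extraction in SQL would look something like this sketch (the aliases are illustrative; the example aliases "col1" and "col2" but does not name the aliases, and the property types are assumed):

```sql
-- Pull the two properties out of the VALUE variant and
-- give each an alias describing its true purpose.
SELECT
    value:col1::VARCHAR AS alias_one,
    value:col2::VARCHAR AS alias_two
FROM "Doc_Ext";
```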
Linking a Table Output Component to the end of the workflow allows us to now save this transformed data in a permanent, regular table that we created earlier. Meanwhile, our External Table remains intact, ready to be used again should the external data be updated.