Intersect
    • Dark
      Light

    Intersect

    • Dark
      Light

    Article Summary

    This article is specific to the following platforms - Snowflake - Redshift - BigQuery - Synapse.

    Intersect Component

    The Intersect component compares two input flows, and then outputs any rows that are in common (identical) in both tables.

    An entire row is compared, but only overlapping columns are considered, otherwise no rows would look unique. Also, because this component uses the SQL INTERSECT operator, only unique rows are passed to the next component if there are duplicate rows in the source components.


    Properties

    Snowflake Properties

    PropertySettingDescription
    NameStringA human-readable name for the component.
    Cast TypesSelectSelect whether to cast types.
    Yes: If the same-named column from both inputs have differing input types, attempt to cast them to a common type. This is not guaranteed to work, so users should check their data carefully.
    No: If the same-named column from both inputs have differing input types, report an error and does not continue.

    Redshift Properties

    PropertySettingDescription
    NameStringA human-readable name for the component.
    Cast TypesSelectSelect whether to cast types.
    Yes: If the same-named column from both inputs have differing input types, attempt to cast them to a common type. This is not guaranteed to work, so users should check their data carefully.
    No: If the same-named column from both inputs have differing input types, report an error and does not continue.

    BigQuery Properties

    PropertySettingDescription
    NameStringA human-readable name for the component.
    Cast TypesSelectSelect whether to cast types.
    Yes: If the same-named column from both inputs have differing input types, attempt to cast them to a common type. This is not guaranteed to work, so users should check their data carefully.
    No: If the same-named column from both inputs have differing input types, report an error and does not continue.
    Note: Care should be taken when casting float types into string types as this may introduce numerical inaccuracies.

    Synapse Properties

    PropertySettingDescription
    NameStringA human-readable name for the component.
    Cast TypesSelectSelect whether to cast types.
    Yes: If the same-named column from both inputs have differing input types, attempt to cast them to a common type. This is not guaranteed to work, so users should check their data carefully.
    No: If the same-named column from both inputs have differing input types, report an error and does not continue.

    Delta Lake Properties

    PropertySettingDescription
    NameStringA human-readable name for the component.
    Cast TypesSelectSelect whether to cast types.
    Yes: If the same-named column from both inputs have differing input types, Matillion ETL attempts to cast them to a common type. This is not guaranteed to work, so users should check their data carefully.
    No: If the same-named column from both inputs have differing input types, Matillion ETL reports an error and will not continue.

    Strategy

    Generates multiple select queries separated by INTERSECT.

    Example

    This example includes two tables of user data. One table is an updated version of the other. When viewing the table with updated data, it is apparent that some users have changed their details since the details were last recorded. For this example, we wish to find any data relating to users who have not changed their details. To do this, we will use the Intersect component.

    The image below shows the data and its updated counterpart.

    As can be seen, several users have changed their plan. If attempting to analyse this data, especially on a large scale, it might be useful to find only users that have not recently changed their data to ensure a more stable dataset.

    When setting up the component parameters, we choose to cast types because it is preferred to try to reconcile the data than to skip the record; however, it is unlikely to be important since we know these data are of common types.

    Finally, we can take a sample of the data output from Intersect and find the common rows.


    What's Next