Distinct
    • Dark
      Light

    Distinct

    • Dark
      Light

    Article Summary

    Distinct Component

    Pass on only the distinct (or unique) input records to the next component.


    Properties

    Snowflake Properties

    PropertySettingDescription
    NameTextA human-readable name for the component.
    ColumnsSelectionOnly these selected columns are passed to the next component. Duplicate records are removed, leaving only distinct values.

    Redshift Properties

    PropertySettingDescription
    NameTextA human-readable name for the component.
    ColumnsSelectionOnly these selected columns are passed to the next component. Duplicate records are removed, leaving only distinct values.
    Note: On Redshift, since records may be located on many nodes, determining uniqueness can be expensive.

    BigQuery Properties

    PropertySettingDescription
    NameTextA human-readable name for the component.
    ColumnsSelectionOnly these selected columns are passed to the next component. Duplicate records are removed, leaving only distinct values.

    Synapse Properties

    PropertySettingDescription
    NameTextA human-readable name for the component.
    ColumnsSelectionOnly these selected columns are passed to the next component. Duplicate records are removed, leaving only distinct values.
    PropertySettingDescription
    NameStringA human-readable name for the component.
    ColumnsColumn SelectOnly these selected columns are passed to the next component. Duplicate records are removed, leaving only distinct values.

    Strategy

    Generates a select distinct query.

    Example

    In this example, we have two tables of data that contain project tasks from past months. We want to bring these tables together using a Unite component so that we have just 1 complete table of data. Unfortunately, the dates from the past data tables have some overlap and this means we end up with duplicate rows when uniting the two tables. In order to solve this problem, we use a Distinct component to filter out any duplicate rows. The Transformation job is shown below.

    Below we note the number of rows in each input table with the 3rd row count (inside the Unite component) being the sum of these counts. A quick way of checking the success of the Distinct component will be a reduction in row count as duplicates are removed.

    In the Distinct component properties, all columns are added so that they all appear in the output.

    When run, this component will take in these columns from the input, keep the unique rows and output them. The resulting sample and row count is shown below.