Distinct

Dark
Light

Article Summary

Share feedback

Thanks for sharing your feedback!

Distinct Component

Pass on only the distinct (or unique) input records to the next component.

Properties

Snowflake Properties
Property	Setting	Description
Name	Text	A human-readable name for the component.
Columns	Selection	Only these selected columns are passed to the next component. Duplicate records are removed, leaving only distinct values.

Redshift Properties
Property	Setting	Description
Name	Text	A human-readable name for the component.
Columns	Selection	Only these selected columns are passed to the next component. Duplicate records are removed, leaving only distinct values. Note: On Redshift, since records may be located on many nodes, determining uniqueness can be expensive.

BigQuery Properties
Property	Setting	Description
Name	Text	A human-readable name for the component.
Columns	Selection	Only these selected columns are passed to the next component. Duplicate records are removed, leaving only distinct values.

Synapse Properties
Property	Setting	Description
Name	Text	A human-readable name for the component.
Columns	Selection	Only these selected columns are passed to the next component. Duplicate records are removed, leaving only distinct values.

Property	Setting	Description
Name	String	A human-readable name for the component.
Columns	Column Select	Only these selected columns are passed to the next component. Duplicate records are removed, leaving only distinct values.

Strategy

Generates a select distinct query.

Example

In this example, we have two tables of data that contain project tasks from past months. We want to bring these tables together using a Unite component so that we have just 1 complete table of data. Unfortunately, the dates from the past data tables have some overlap and this means we end up with duplicate rows when uniting the two tables. In order to solve this problem, we use a Distinct component to filter out any duplicate rows. The Transformation job is shown below.

Below we note the number of rows in each input table with the 3rd row count (inside the Unite component) being the sum of these counts. A quick way of checking the success of the Distinct component will be a reduction in row count as duplicates are removed.

In the Distinct component properties, all columns are added so that they all appear in the output.

When run, this component will take in these columns from the input, keep the unique rows and output them. The resulting sample and row count is shown below.

What's Next

Extract Nested Data

Table of contents

Distinct Component
Properties
Snowflake Properties
Redshift Properties
BigQuery Properties
Synapse Properties
Strategy
Example

Distinct

Distinct Component

Properties

Snowflake Properties

Redshift Properties

BigQuery Properties

Synapse Properties

Strategy

Example

What's Next