Lead-Lag
    • Dark
      Light

    Lead-Lag

    • Dark
      Light

    Article Summary

    This article is specific to the following platforms - Snowflake - Redshift - BigQuery - Synapse.

    Lead/Lag Component

    Allows the user to determine a value from a preceding or following row at a given offset within a group (or partition) of values.


    Properties

    Snowflake Properties

    PropertySettingDescription
    NameTextA human-readable name for the component.
    Include Input ColumnsYes/NoDefines whether the component passes all input columns into the output.
    Partition DataList of ColumnsDefines how the input data is partitioned in order to perform the rank calculation. The calculation is then performed on each partition.
    Ordering within partitionsInput ColumnThe input column name for sorting within the partitioned data. Note: you can drag to reorder
    OrderingThe order of the sorting: Ascending (Asc) or Descending (Desc)
    FunctionsWindow FunctionLead - Returns the column from <offset> rows earlier in the partition.
    More... Lag - Returns the column from <offset> rows later in the partition.
    More...
    Input ColumnThe name of the input column that the lead/lag function will return.
    OffsetThe number of rows to go forward (lead) or backwards (lag) in the partition.
    Output ColumnThe name of the output column that the window function will create.
    Ignore NullsYes/NoDisregard Null values when determining which row to use. Null values do not count toward reaching the offset.

    Redshift Properties

    PropertySettingDescription
    NameTextA human-readable name for the component.
    Include Input ColumnsYes/NoDefines whether the component passes all input columns into the output.
    Partition DataList of ColumnsDefines how the input data is partitioned in order to perform the rank calculation. The calculation is then performed on each partition.
    Ordering within partitionsInput ColumnThe input column name for sorting within the partitioned data. Note: you can drag to reorder
    OrderingThe order of the sorting: Ascending (Asc) or Descending (Desc)
    FunctionsWindow FunctionLead - Returns the column from <offset> rows earlier in the partition. More...
    Lag - Returns the column from <offset> rows later in the partition. More...
    Input ColumnThe name of the input column that the lead/lag function will return.
    OffsetThe number of rows to go forward (lead) or backwards (lag) in the partition.
    Output ColumnThe name of the output column that the window function will create.
    Ignore NullsYes/NoDisregard Null values when determining which row to use. Null values do not count toward reaching the offset.

    BigQuery Properties

    PropertySettingDescription
    NameTextA human-readable name for the component.
    Include Input ColumnsYes/NoDefines whether the component passes all input columns into the output.
    Partition DataList of ColumnsDefines how the input data is partitioned in order to perform the rank calculation. The calculation is then performed on each partition.
    Ordering within partitionsInput ColumnThe input column name for sorting within the partitioned data. Note: you can drag to reorder
    OrderingThe order of the sorting: Ascending (Asc) or Descending (Desc)
    FunctionsWindow FunctionLead - Returns the column from <offset> rows earlier in the partition.
    Lag - Returns the column from <offset> rows later in the partition.
    Input ColumnThe name of the input column that the lead/lag function will return.
    OffsetThe number of rows to go forward (lead) or backwards (lag) in the partition.
    Output ColumnThe name of the output column that the window function will create.

    Synapse Properties

    PropertySettingDescription
    NameTextA human-readable name for the component.
    Include Input ColumnsYes/NoDefines whether the component passes all input columns into the output.
    Partition DataList of ColumnsDefines how the input data is partitioned in order to perform the rank calculation. The calculation is then performed on each partition.
    Ordering within partitionsInput ColumnThe input column name for sorting within the partitioned data. Note: you can drag to reorder
    OrderingThe order of the sorting: Ascending (Asc) or Descending (Desc)
    FunctionsWindow FunctionLead - Returns the column from <offset> rows earlier in the partition.
    Lag - Returns the column from <offset> rows later in the partition.
    Input ColumnThe name of the input column that the lead/lag function will return.
    OffsetThe number of rows to go forward (lead) or backwards (lag) in the partition.
    Output ColumnThe name of the output column that the window function will create.

    Delta Lake Properties

    PropertySettingDescription
    NameStringA human-readable name for the component.
    Include Input ColumnsSelectDefines whether the component passes all input columns into the output.
    Partition DataColumn SelectDefines how the input data is partitioned in order to perform the rank calculation. The calculation is then performed on each partition.
    Ordering within partitionsInput ColumnThe input column name for sorting within the partitioned data.
    Note: you can drag to reorder.
    OrderingThe order of the sorting: Ascending (Asc) or Descending (Desc)
    FunctionsWindow FunctionLead: returns the column from <offset> rows earlier in the partition.
    Lag: returns the column from <offset> rows later in the partition.
    Input ColumnThe name of the input column that the lead/lag function will return.
    OffsetThe number of rows to go forward (lead) or backwards (lag) in the partition.
    Output ColumnThe name of the output column that the window function will create.

    Strategy

    Generates a select statement with a window function in line using the OVER keyword. More details...
    Generates a select statement with a window function in line using the OVER keyword. More details...
    Generates a select statement with a window function in line using the OVER keyword. More details...

    Example

    This example uses the flight data. It uses an aggregation to calculate the total flight time per day and then uses the lead lag to add the flight time from the prior day and the prior prior day for comparison.

    Note: The aggregation component groups the data by Year, Month and DayOfWeek and totals the airtime per day (see Aggregate Component for more detail)


    The component properties are set up as below. The data is partitioned down to individual months and ordered by day.

    The Functions are set up to use the Lag Window Function. Two new Lag columns are created from sum_airtime called Airtime Prior Day and Airtime Prior Prior Day. The former has an offset of one hence the day before the current row, and the latter an offset of 2, two days before the current row.

    The Sample data shows the new columns added to the result.


    What's Next