Skip to content

rolling_window fails when grouping by a time dimension without date_trunc #9899

@nandresen-stripe

Description

@nandresen-stripe

Describe the bug

When using the SQL API, Cube's rolling window logic is only triggered if the query explicitly applies a date_trunc function to a time dimension in the SELECT clause.

If a time dimension is selected directly in a GROUP BY query without a date_trunc wrapper, its measure's rolling_window definition is applied incorrectly, leading to incorrect results. This happens even if the selected time dimension is already defined with DATE_TRUNC in the data model.

This bug affects both the legacy and Tesseract query engines:

  • The legacy engine misinterprets the rolling_window parameters, applying them to an incorrect cumulative query.
  • The Tesseract engine appears to ignore the rolling_window definition entirely, generating a simple GROUP BY query.

To Reproduce

  1. Define the following Cube schema:

    cubes:
      - name: test_events
        sql: |
          (
            SELECT 1 AS user_id, TIMESTAMP '2024-01-01 10:00:00' AS created_ts
            UNION ALL
            SELECT 2 AS user_id, TIMESTAMP '2024-01-02 14:00:00' AS created_ts
            UNION ALL
            SELECT 1 AS user_id, TIMESTAMP '2024-01-15 08:00:00' AS created_ts
            UNION ALL
            SELECT 3 AS user_id, TIMESTAMP '2024-02-20 11:00:00' AS created_ts
          )
        public: true
    
        dimensions:
          - name: user_id
            sql: "{CUBE}.user_id"
            type: number
            primary_key: true
    
          - name: created_ts
            sql: "{CUBE}.created_ts"
            type: time
            primary_key: true
    
          - name: created_date
            sql: "DATE_TRUNC('day', {CUBE}.created_ts)"
            type: time
    
        measures:
          - name: user_count_rolling_28d
            sql: "{CUBE}.user_id"
            type: count
            rolling_window:
              trailing: 28 day
  2. Via the Cube SQL API, execute the following queries to observe the difference.

    Failing Query (No date_trunc in SELECT)

    This query selects the created_date dimension directly.

    SELECT
      created_date,
      measure(user_count_rolling_28d)
    FROM test_events
    WHERE
      created_date >= '2024-01-01' AND created_date < '2024-03-01'
    GROUP BY 1

    Working Query (With date_trunc in SELECT)

    This query explicitly wraps created_date with date_trunc. This workaround correctly triggers the rolling window logic.

    SELECT
      date_trunc('day', created_date) as created_date_truncated,
      measure(user_count_rolling_28d)
    FROM test_events
    WHERE
      created_date >= '2024-01-01' AND created_date < '2024-03-01'
    GROUP BY 1

Actual Results (Incorrect Behavior)

The first (failing) query produces incorrect SQL. The behavior differs between the legacy and Tesseract engines, but neither performs a rolling window calculation.

Without Tesseract (Legacy Engine)

The engine generates a complex, non-rolling cumulative query, ignoring the rolling_window definition.

-- Incorrect SQL
SELECT
  q_0."created_date",
  "test_events__user_count_rolling_28d" "user_count_rolli"
FROM
  (
    SELECT
      CAST(
        date_add(
          'minute',
          timezone_minute(
            DATE_TRUNC('day', "test_events_user_count_rolling_28d_cumulative__test_events".created_ts) AT TIME ZONE 'UTC'
          ),
          date_add(
            'hour',
            timezone_hour(
              DATE_TRUNC('day', "test_events_user_count_rolling_28d_cumulative__test_events".created_ts) AT TIME ZONE 'UTC'
            ),
            DATE_TRUNC('day', "test_events_user_count_rolling_28d_cumulative__test_events".created_ts)
          )
        ) AS TIMESTAMP
      ) "created_date",
      count("test_events_user_count_rolling_28d_cumulative__test_events".user_id) "test_events__user_count_rolling_28d"
    FROM
      (
        (
          SELECT
            1 AS user_id,
            TIMESTAMP '2024-01-01 10:00:00' AS created_ts
          UNION ALL
          SELECT
            2 AS user_id,
            TIMESTAMP '2024-01-02 14:00:00' AS created_ts
          UNION ALL
          SELECT
            1 AS user_id,
            TIMESTAMP '2024-01-15 08:00:00' AS created_ts
          UNION ALL
          SELECT
            3 AS user_id,
            TIMESTAMP '2024-02-20 11:00:00' AS created_ts
        )
      ) AS "test_events_user_count_rolling_28d_cumulative__test_events"
    WHERE
      (
        CAST(
          date_add(
            'minute',
            timezone_minute(
              DATE_TRUNC('day', "test_events_user_count_rolling_28d_cumulative__test_events".created_ts) AT TIME ZONE 'UTC'
            ),
            date_add(
              'hour',
              timezone_hour(
                DATE_TRUNC('day', "test_events_user_count_rolling_28d_cumulative__test_events".created_ts) AT TIME ZONE 'UTC'
              ),
              DATE_TRUNC('day', "test_events_user_count_rolling_28d_cumulative__test_events".created_ts)
            )
          ) AS TIMESTAMP
        ) > CAST(
          date_add(
            'minute',
            timezone_minute(from_iso8601_timestamp(?) AT TIME ZONE 'UTC'),
            date_add(
              'hour',
              timezone_hour(from_iso8601_timestamp(?) AT TIME ZONE 'UTC'),
              from_iso8601_timestamp(?)
            )
          ) AS TIMESTAMP
        ) - interval '28' day
        AND CAST(
          date_add(
            'minute',
            timezone_minute(
              DATE_TRUNC('day', "test_events_user_count_rolling_28d_cumulative__test_events".created_ts) AT TIME ZONE 'UTC'
            ),
            date_add(
              'hour',
              timezone_hour(
                DATE_TRUNC('day', "test_events_user_count_rolling_28d_cumulative__test_events".created_ts) AT TIME ZONE 'UTC'
              ),
              DATE_TRUNC('day', "test_events_user_count_rolling_28d_cumulative__test_events".created_ts)
            )
          ) AS TIMESTAMP
        ) <= CAST(
          date_add(
            'minute',
            timezone_minute(from_iso8601_timestamp(?) AT TIME ZONE 'UTC'),
            date_add(
              'hour',
              timezone_hour(from_iso8601_timestamp(?) AT TIME ZONE 'UTC'),
              from_iso8601_timestamp(?)
            )
          ) AS TIMESTAMP
        )
      )
    GROUP BY
      1
  ) as q_0

With Tesseract

The engine generates a simple GROUP BY query, completely ignoring the rolling_window definition.

WITH
  cte_0 AS (
    SELECT
      DATE_TRUNC('day', "test_events".created_ts) "user_count_rolling_28d__created_date",
      count("test_events".user_id) "test_events__user_count_rolling_28d"
    FROM
      (
        (
          SELECT
            1 AS user_id,
            TIMESTAMP '2024-01-01 10:00:00' AS created_ts
          UNION ALL
          SELECT
            2 AS user_id,
            TIMESTAMP '2024-01-02 14:00:00' AS created_ts
          UNION ALL
          SELECT
            1 AS user_id,
            TIMESTAMP '2024-01-15 08:00:00' AS created_ts
          UNION ALL
          SELECT
            3 AS user_id,
            TIMESTAMP '2024-02-20 11:00:00' AS created_ts
        )
      ) AS "test_events"
    WHERE
      (
        DATE_TRUNC('day', "test_events".created_ts) >= from_iso8601_timestamp(?)
        AND DATE_TRUNC('day', "test_events".created_ts) <= from_iso8601_timestamp(?)
      )
    GROUP BY
      1
    ORDER BY
      2 DESC
  )
SELECT
  "q_0"."user_count_rolling_28d__created_date" "user_count_rolling_28d__created_date",
  "q_0"."test_events__user_count_rolling_28d" "user_count_rolling_28d__user_count_rolling_28d"
FROM
  cte_0 AS "q_0"

Expected Behavior

The first, intuitive SQL API query should trigger the rolling window logic without needing a redundant date_trunc wrapper. Both engines should generate the standard, correct SQL for a rolling window calculation, which involves a date-series CTE and a proper rolling join, as shown below (these are the actual results from the "Working Query").

With Tesseract:

WITH
  time_series AS (
    SELECT
      d AS date_from,
      date_add('MILLISECOND', -1, d + interval '1' day) AS date_to
    FROM
      UNNEST (
        SEQUENCE(
          CAST(from_iso8601_timestamp('2024-01-01T00:00:00.001') AS TIMESTAMP),
          CAST(from_iso8601_timestamp('2024-02-29T23:59:59.999') AS TIMESTAMP),
          INTERVAL '1' day
        )
      ) AS dates (d)
  ),
  cte_1 AS (
    SELECT
      date_trunc(
        'day',
        CAST(
          date_add(
            'minute',
            timezone_minute(DATE_TRUNC('day', "test_events".created_ts) AT TIME ZONE 'UTC'),
            date_add(
              'hour',
              timezone_hour(DATE_TRUNC('day', "test_events".created_ts) AT TIME ZONE 'UTC'),
              DATE_TRUNC('day', "test_events".created_ts)
            )
          ) AS TIMESTAMP
        )
      ) "user_count_rolling_28d__created_date_day",
      count("test_events".user_id) "test_events__user_count_rolling_28d"
    FROM
      (
        (
          SELECT
            1 AS user_id,
            TIMESTAMP '2024-01-01 10:00:00' AS created_ts
          UNION ALL
          SELECT
            2 AS user_id,
            TIMESTAMP '2024-01-02 14:00:00' AS created_ts
          UNION ALL
          SELECT
            1 AS user_id,
            TIMESTAMP '2024-01-15 08:00:00' AS created_ts
          UNION ALL
          SELECT
            3 AS user_id,
            TIMESTAMP '2024-02-20 11:00:00' AS created_ts
        )
      ) AS "test_events"
    WHERE
      (
        CAST(
          date_add(
            'minute',
            timezone_minute(DATE_TRUNC('day', "test_events".created_ts) AT TIME ZONE 'UTC'),
            date_add(
              'hour',
              timezone_hour(DATE_TRUNC('day', "test_events".created_ts) AT TIME ZONE 'UTC'),
              DATE_TRUNC('day', "test_events".created_ts)
            )
          ) AS TIMESTAMP
        ) >= (
          SELECT
            min("date_from") "value"
          FROM
            time_series
        ) - interval '28' day
        AND CAST(
          date_add(
            'minute',
            timezone_minute(DATE_TRUNC('day', "test_events".created_ts) AT TIME ZONE 'UTC'),
            date_add(
              'hour',
              timezone_hour(DATE_TRUNC('day', "test_events".created_ts) AT TIME ZONE 'UTC'),
              DATE_TRUNC('day', "test_events".created_ts)
            )
          ) AS TIMESTAMP
        ) <= (
          SELECT
            max("date_to") "value"
          FROM
            time_series
        )
      )
    GROUP BY
      1
    ORDER BY
      1 ASC
  ),
  cte_2 AS (
    SELECT
      "time_series"."date_from" "user_count_rolling_28d__created_date_day",
      sum("rolling_source"."test_events__user_count_rolling_28d") "test_events__user_count_rolling_28d"
    FROM
      time_series AS "time_series"
      LEFT JOIN cte_1 AS "rolling_source" ON "rolling_source"."user_count_rolling_28d__created_date_day" > "time_series"."date_to" - interval '28' day
      AND "rolling_source"."user_count_rolling_28d__created_date_day" <= "time_series"."date_to"
    GROUP BY
      1
    ORDER BY
      1 ASC
  )
SELECT
  "q_0"."user_count_rolling_28d__created_date_day" "user_count_rolling_28d__created_date_day",
  "q_0"."test_events__user_count_rolling_28d" "user_count_rolling_28d__user_count_rolling_28d"
FROM
  cte_2 AS "q_0"

Without Tesseract:

SELECT
  q_0."created_date",
  "test_events__user_count_rolling_28d" "user_count_rolli"
FROM
  (
    SELECT
      "user_count_rolling_28d.created_date_series"."date_from" "created_date",
      sum("test_events__user_count_rolling_28d") "test_events__user_count_rolling_28d"
    FROM (select '2024-01-01T00:00:00.000' f, '2024-01-01T23:59:59.999' t UNION ALL select '2024-01-02T00:00:00.000' f, '2024-01-02T23:59:59.999' t UNION ALL select '2024-01-03T00:00:00.000' f, '2024-01-03T23:59:59.999' t UNION ALL select '2024-01-04T00:00:00.000' f, '2024-01-04T23:59:59.999' t UNION ALL select '2024-01-05T00:00:00.000' f, '2024-01-05T23:59:59.999' t UNION ALL select '2024-01-06T00:00:00.000' f, '2024-01-06T23:59:59.999' t UNION ALL select '2024-01-07T00:00:00.000' f, '2024-01-07T23:59:59.999' t UNION ALL select '2024-01-08T00:00:00.000' f, '2024-01-08T23:59:59.999' t UNION ALL select '2024-01-09T00:00:00.000' f, '2024-01-09T23:59:59.999' t UNION ALL select '2024-01-10T00:00:00.000' f, '2024-01-10T23:59:59.999' t UNION ALL select '2024-01-11T00:00:00.000' f, '2024-01-11T23:59:59.999' t UNION ALL select '2024-01-12T00:00:00.000' f, '2024-01-12T23:59:59.999' t UNION ALL select '2024-01-13T00:00:00.000' f, '2024-01-13T23:59:59.999' t UNION ALL select '2024-01-14T00:00:00.000' f, '2024-01-14T23:59:59.999' t UNION ALL select '2024-01-15T00:00:00.000' f, '2024-01-15T23:59:59.999' t UNION ALL select '2024-01-16T00:00:00.000' f, '2024-01-16T23:59:59.999' t UNION ALL select '2024-01-17T00:00:00.000' f, '2024-01-17T23:59:59.999' t UNION ALL select '2024-01-18T00:00:00.000' f, '2024-01-18T23:59:59.999' t UNION ALL select '2024-01-19T00:00:00.000' f, '2024-01-19T23:59:59.999' t UNION ALL select '2024-01-20T00:00:00.000' f, '2024-01-20T23:59:59.999' t UNION ALL select '2024-01-21T00:00:00.000' f, '2024-01-21T23:59:59.999' t UNION ALL select '2024-01-22T00:00:00.000' f, '2024-01-22T23:59:59.999' t UNION ALL select '2024-01-23T00:00:00.000' f, '2024-01-23T23:59:59.999' t UNION ALL select '2024-01-24T00:00:00.000' f, '2024-01-24T23:59:59.999' t UNION ALL select '2024-01-25T00:00:00.000' f, '2024-01-25T23:59:59.999' t UNION ALL select '2024-01-26T00:00:00.000' f, '2024-01-26T23:59:59.999' t UNION ALL select '2024-01-27T00:00:00.000' f, '2024-01-27T23:59:59.999' t UNION ALL select '2024-01-28T00:00:00.000' f, '2024-01-28T23:59:59.999' t UNION ALL select '2024-01-29T00:00:00.000' f, '2024-01-29T23:59:59.999' t UNION ALL select '2024-01-30T00:00:00.000' f, '2024-01-30T23:59:59.999' t UNION ALL select '2024-01-31T00:00:00.000' f, '2024-01-31T23:59:59.999' t UNION ALL select '2024-02-01T00:00:00.000' f, '2024-02-01T23:59:59.999' t UNION ALL select '2024-02-02T00:00:00.000' f, '2024-02-02T23:59:59.999' t UNION ALL select '2024-02-03T00:00:00.000' f, '2024-02-03T23:59:59.999' t UNION ALL select '2024-02-04T00:00:00.000' f, '2024-02-04T23:59:59.999' t UNION ALL select '2024-02-05T00:00:00.000' f, '2024-02-05T23:59:59.999' t UNION ALL select '2024-02-06T00:00:00.000' f, '2024-02-06T23:59:59.999' t UNION ALL select '2024-02-07T00:00:00.000' f, '2024-02-07T23:59:59.999' t UNION ALL select '2024-02-08T00:00:00.000' f, '2024-02-08T23:59:59.999' t UNION ALL select '2024-02-09T00:00:00.000' f, '2024-02-09T23:59:59.999' t UNION ALL select '2024-02-10T00:00:00.000' f, '2024-02-10T23:59:59.999' t UNION ALL select '2024-02-11T00:00:00.000' f, '2024-02-11T23:59:59.999' t UNION ALL select '2024-02-12T00:00:00.000' f, '2024-02-12T23:59:59.999' t UNION ALL select '2024-02-13T00:00:00.000' f, '2024-02-13T23:59:59.999' t UNION ALL select '2024-02-14T00:00:00.000' f, '2024-02-14T23:59:59.999' t UNION ALL select '2024-02-15T00:00:00.000' f, '2024-02-15T23:59:59.999' t UNION ALL select '2024-02-16T00:00:00.000' f, '2024-02-16T23:59:59.999' t UNION ALL select '2024-02-17T00:00:00.000' f, '2024-02-17T23:59:59.999' t UNION ALL select '2024-02-18T00:00:00.000' f, '2024-02-18T23:59:59.999' t UNION ALL select '2024-02-19T00:00:00.000' f, '2024-02-19T23:59:59.999' t UNION ALL select '2024-02-20T00:00:00.000' f, '2024-02-20T23:59:59.999' t UNION ALL select '2024-02-21T00:00:00.000' f, '2024-02-21T23:59:59.999' t UNION ALL select '2024-02-22T00:00:00.000' f, '2024-02-22T23:59:59.999' t UNION ALL select '2024-02-23T00:00:00.000' f, '2024-02-23T23:59:59.999' t UNION ALL select '2024-02-24T00:00:00.000' f, '2024-02-24T23:59:59.999' t UNION ALL select '2024-02-25T00:00:00.000' f, '2024-02-25T23:59:59.999' t UNION ALL select '2024-02-26T00:00:00.000' f, '2024-02-26T23:59:59.999' t UNION ALL select '2024-02-27T00:00:00.000' f, '2024-02-27T23:59:59.999' t UNION ALL select '2024-02-28T00:00:00.000' f, '2024-02-28T23:59:59.999' t UNION ALL select '2024-02-29T00:00:00.000' f, '2024-02-29T23:59:59.999' t) AS dates) AS "user_count_rolling_28d.created_date_series"
      LEFT JOIN (
        SELECT
          date_trunc(
            'day',
            CAST(
              date_add(
                'minute',
                timezone_minute(
                  DATE_TRUNC('day', "test_events_user_count_rolling_28d_cumulative__test_events".created_ts) AT TIME ZONE 'UTC'
                ),
                date_add(
                  'hour',
                  timezone_hour(
                    DATE_TRUNC('day', "test_events_user_count_rolling_28d_cumulative__test_events".created_ts) AT TIME ZONE 'UTC'
                  ),
                  DATE_TRUNC('day', "test_events_user_count_rolling_28d_cumulative__test_events".created_ts)
                )
              ) AS TIMESTAMP
            )
          ) "created_date",
          count("test_events_user_count_rolling_28d_cumulative__test_events".user_id) "test_events__user_count_rolling_28d"
        FROM
          (
            (
              SELECT
                1 AS user_id,
                TIMESTAMP '2024-01-01 10:00:00' AS created_ts
              UNION ALL
              SELECT
                2 AS user_id,
                TIMESTAMP '2024-01-02 14:00:00' AS created_ts
              UNION ALL
              SELECT
                1 AS user_id,
                TIMESTAMP '2024-01-15 08:00:00' AS created_ts
              UNION ALL
              SELECT
                3 AS user_id,
                TIMESTAMP '2024-02-20 11:00:00' AS created_ts
            )
          ) AS "test_events_user_count_rolling_28d_cumulative__test_events"
        WHERE
          (
            CAST(
              date_add(
                'minute',
                timezone_minute(
                  DATE_TRUNC('day', "test_events_user_count_rolling_28d_cumulative__test_events".created_ts) AT TIME ZONE 'UTC'
                ),
                date_add(
                  'hour',
                  timezone_hour(
                    DATE_TRUNC('day', "test_events_user_count_rolling_28d_cumulative__test_events".created_ts) AT TIME ZONE 'UTC'
                  ),
                  DATE_TRUNC('day', "test_events_user_count_rolling_28d_cumulative__test_events".created_ts)
                )
              ) AS TIMESTAMP
            ) > from_iso8601_timestamp(?) - interval '28' day
            AND CAST(
              date_add(
                'minute',
                timezone_minute(
                  DATE_TRUNC('day', "test_events_user_count_rolling_28d_cumulative__test_events".created_ts) AT TIME ZONE 'UTC'
                ),
                date_add(
                  'hour',
                  timezone_hour(
                    DATE_TRUNC('day', "test_events_user_count_rolling_28d_cumulative__test_events".created_ts) AT TIME ZONE 'UTC'
                  ),
                  DATE_TRUNC('day', "test_events_user_count_rolling_28d_cumulative__test_events".created_ts)
                )
              ) AS TIMESTAMP
            ) <= from_iso8601_timestamp(?)
          )
        GROUP BY
          1
      ) AS "test_events_user_count_rolling_28d_cumulative__base" ON "test_events_user_count_rolling_28d_cumulative__base"."created_date" > "user_count_rolling_28d.created_date_series"."date_to" - interval '28' day
      AND "test_events_user_count_rolling_28d_cumulative__base"."created_date" <= "user_count_rolling_28d.created_date_series"."date_to"
    GROUP BY
      1
  ) as q_0

Version

1.3.39

Additional context

The query planner in both the legacy and Tesseract engines seems to have a rigid condition for activating rolling window logic. It appears to be strictly dependent on the presence of a date_trunc() function call in the SQL query's SELECT list. It does not analyze the properties of the selected dimension itself to determine if it already has the correct granularity. Consequently, when the date_trunc() function is absent from the query, the planner takes the wrong path and fails to apply the rolling window.

Metadata

Metadata

Assignees

No one assigned

    Labels

    api:sqlIssues related to SQL APIquestionThe issue is a question. Please use Stack Overflow for questions.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions