-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Description
Describe the bug
When using the SQL API, Cube's rolling window logic is only triggered if the query explicitly applies a date_trunc function to a time dimension in the SELECT clause.
If a time dimension is selected directly in a GROUP BY query without a date_trunc wrapper, its measure's rolling_window definition is applied incorrectly, leading to incorrect results. This happens even if the selected time dimension is already defined with DATE_TRUNC in the data model.
This bug affects both the legacy and Tesseract query engines:
- The legacy engine misinterprets the
rolling_windowparameters, applying them to an incorrect cumulative query. - The Tesseract engine appears to ignore the
rolling_windowdefinition entirely, generating a simpleGROUP BYquery.
To Reproduce
-
Define the following Cube schema:
cubes: - name: test_events sql: | ( SELECT 1 AS user_id, TIMESTAMP '2024-01-01 10:00:00' AS created_ts UNION ALL SELECT 2 AS user_id, TIMESTAMP '2024-01-02 14:00:00' AS created_ts UNION ALL SELECT 1 AS user_id, TIMESTAMP '2024-01-15 08:00:00' AS created_ts UNION ALL SELECT 3 AS user_id, TIMESTAMP '2024-02-20 11:00:00' AS created_ts ) public: true dimensions: - name: user_id sql: "{CUBE}.user_id" type: number primary_key: true - name: created_ts sql: "{CUBE}.created_ts" type: time primary_key: true - name: created_date sql: "DATE_TRUNC('day', {CUBE}.created_ts)" type: time measures: - name: user_count_rolling_28d sql: "{CUBE}.user_id" type: count rolling_window: trailing: 28 day
-
Via the Cube SQL API, execute the following queries to observe the difference.
Failing Query (No
date_truncinSELECT)This query selects the
created_datedimension directly.SELECT created_date, measure(user_count_rolling_28d) FROM test_events WHERE created_date >= '2024-01-01' AND created_date < '2024-03-01' GROUP BY 1
Working Query (With
date_truncinSELECT)This query explicitly wraps
created_datewithdate_trunc. This workaround correctly triggers the rolling window logic.SELECT date_trunc('day', created_date) as created_date_truncated, measure(user_count_rolling_28d) FROM test_events WHERE created_date >= '2024-01-01' AND created_date < '2024-03-01' GROUP BY 1
Actual Results (Incorrect Behavior)
The first (failing) query produces incorrect SQL. The behavior differs between the legacy and Tesseract engines, but neither performs a rolling window calculation.
Without Tesseract (Legacy Engine)
The engine generates a complex, non-rolling cumulative query, ignoring the rolling_window definition.
-- Incorrect SQL
SELECT
q_0."created_date",
"test_events__user_count_rolling_28d" "user_count_rolli"
FROM
(
SELECT
CAST(
date_add(
'minute',
timezone_minute(
DATE_TRUNC('day', "test_events_user_count_rolling_28d_cumulative__test_events".created_ts) AT TIME ZONE 'UTC'
),
date_add(
'hour',
timezone_hour(
DATE_TRUNC('day', "test_events_user_count_rolling_28d_cumulative__test_events".created_ts) AT TIME ZONE 'UTC'
),
DATE_TRUNC('day', "test_events_user_count_rolling_28d_cumulative__test_events".created_ts)
)
) AS TIMESTAMP
) "created_date",
count("test_events_user_count_rolling_28d_cumulative__test_events".user_id) "test_events__user_count_rolling_28d"
FROM
(
(
SELECT
1 AS user_id,
TIMESTAMP '2024-01-01 10:00:00' AS created_ts
UNION ALL
SELECT
2 AS user_id,
TIMESTAMP '2024-01-02 14:00:00' AS created_ts
UNION ALL
SELECT
1 AS user_id,
TIMESTAMP '2024-01-15 08:00:00' AS created_ts
UNION ALL
SELECT
3 AS user_id,
TIMESTAMP '2024-02-20 11:00:00' AS created_ts
)
) AS "test_events_user_count_rolling_28d_cumulative__test_events"
WHERE
(
CAST(
date_add(
'minute',
timezone_minute(
DATE_TRUNC('day', "test_events_user_count_rolling_28d_cumulative__test_events".created_ts) AT TIME ZONE 'UTC'
),
date_add(
'hour',
timezone_hour(
DATE_TRUNC('day', "test_events_user_count_rolling_28d_cumulative__test_events".created_ts) AT TIME ZONE 'UTC'
),
DATE_TRUNC('day', "test_events_user_count_rolling_28d_cumulative__test_events".created_ts)
)
) AS TIMESTAMP
) > CAST(
date_add(
'minute',
timezone_minute(from_iso8601_timestamp(?) AT TIME ZONE 'UTC'),
date_add(
'hour',
timezone_hour(from_iso8601_timestamp(?) AT TIME ZONE 'UTC'),
from_iso8601_timestamp(?)
)
) AS TIMESTAMP
) - interval '28' day
AND CAST(
date_add(
'minute',
timezone_minute(
DATE_TRUNC('day', "test_events_user_count_rolling_28d_cumulative__test_events".created_ts) AT TIME ZONE 'UTC'
),
date_add(
'hour',
timezone_hour(
DATE_TRUNC('day', "test_events_user_count_rolling_28d_cumulative__test_events".created_ts) AT TIME ZONE 'UTC'
),
DATE_TRUNC('day', "test_events_user_count_rolling_28d_cumulative__test_events".created_ts)
)
) AS TIMESTAMP
) <= CAST(
date_add(
'minute',
timezone_minute(from_iso8601_timestamp(?) AT TIME ZONE 'UTC'),
date_add(
'hour',
timezone_hour(from_iso8601_timestamp(?) AT TIME ZONE 'UTC'),
from_iso8601_timestamp(?)
)
) AS TIMESTAMP
)
)
GROUP BY
1
) as q_0With Tesseract
The engine generates a simple GROUP BY query, completely ignoring the rolling_window definition.
WITH
cte_0 AS (
SELECT
DATE_TRUNC('day', "test_events".created_ts) "user_count_rolling_28d__created_date",
count("test_events".user_id) "test_events__user_count_rolling_28d"
FROM
(
(
SELECT
1 AS user_id,
TIMESTAMP '2024-01-01 10:00:00' AS created_ts
UNION ALL
SELECT
2 AS user_id,
TIMESTAMP '2024-01-02 14:00:00' AS created_ts
UNION ALL
SELECT
1 AS user_id,
TIMESTAMP '2024-01-15 08:00:00' AS created_ts
UNION ALL
SELECT
3 AS user_id,
TIMESTAMP '2024-02-20 11:00:00' AS created_ts
)
) AS "test_events"
WHERE
(
DATE_TRUNC('day', "test_events".created_ts) >= from_iso8601_timestamp(?)
AND DATE_TRUNC('day', "test_events".created_ts) <= from_iso8601_timestamp(?)
)
GROUP BY
1
ORDER BY
2 DESC
)
SELECT
"q_0"."user_count_rolling_28d__created_date" "user_count_rolling_28d__created_date",
"q_0"."test_events__user_count_rolling_28d" "user_count_rolling_28d__user_count_rolling_28d"
FROM
cte_0 AS "q_0"Expected Behavior
The first, intuitive SQL API query should trigger the rolling window logic without needing a redundant date_trunc wrapper. Both engines should generate the standard, correct SQL for a rolling window calculation, which involves a date-series CTE and a proper rolling join, as shown below (these are the actual results from the "Working Query").
With Tesseract:
WITH
time_series AS (
SELECT
d AS date_from,
date_add('MILLISECOND', -1, d + interval '1' day) AS date_to
FROM
UNNEST (
SEQUENCE(
CAST(from_iso8601_timestamp('2024-01-01T00:00:00.001') AS TIMESTAMP),
CAST(from_iso8601_timestamp('2024-02-29T23:59:59.999') AS TIMESTAMP),
INTERVAL '1' day
)
) AS dates (d)
),
cte_1 AS (
SELECT
date_trunc(
'day',
CAST(
date_add(
'minute',
timezone_minute(DATE_TRUNC('day', "test_events".created_ts) AT TIME ZONE 'UTC'),
date_add(
'hour',
timezone_hour(DATE_TRUNC('day', "test_events".created_ts) AT TIME ZONE 'UTC'),
DATE_TRUNC('day', "test_events".created_ts)
)
) AS TIMESTAMP
)
) "user_count_rolling_28d__created_date_day",
count("test_events".user_id) "test_events__user_count_rolling_28d"
FROM
(
(
SELECT
1 AS user_id,
TIMESTAMP '2024-01-01 10:00:00' AS created_ts
UNION ALL
SELECT
2 AS user_id,
TIMESTAMP '2024-01-02 14:00:00' AS created_ts
UNION ALL
SELECT
1 AS user_id,
TIMESTAMP '2024-01-15 08:00:00' AS created_ts
UNION ALL
SELECT
3 AS user_id,
TIMESTAMP '2024-02-20 11:00:00' AS created_ts
)
) AS "test_events"
WHERE
(
CAST(
date_add(
'minute',
timezone_minute(DATE_TRUNC('day', "test_events".created_ts) AT TIME ZONE 'UTC'),
date_add(
'hour',
timezone_hour(DATE_TRUNC('day', "test_events".created_ts) AT TIME ZONE 'UTC'),
DATE_TRUNC('day', "test_events".created_ts)
)
) AS TIMESTAMP
) >= (
SELECT
min("date_from") "value"
FROM
time_series
) - interval '28' day
AND CAST(
date_add(
'minute',
timezone_minute(DATE_TRUNC('day', "test_events".created_ts) AT TIME ZONE 'UTC'),
date_add(
'hour',
timezone_hour(DATE_TRUNC('day', "test_events".created_ts) AT TIME ZONE 'UTC'),
DATE_TRUNC('day', "test_events".created_ts)
)
) AS TIMESTAMP
) <= (
SELECT
max("date_to") "value"
FROM
time_series
)
)
GROUP BY
1
ORDER BY
1 ASC
),
cte_2 AS (
SELECT
"time_series"."date_from" "user_count_rolling_28d__created_date_day",
sum("rolling_source"."test_events__user_count_rolling_28d") "test_events__user_count_rolling_28d"
FROM
time_series AS "time_series"
LEFT JOIN cte_1 AS "rolling_source" ON "rolling_source"."user_count_rolling_28d__created_date_day" > "time_series"."date_to" - interval '28' day
AND "rolling_source"."user_count_rolling_28d__created_date_day" <= "time_series"."date_to"
GROUP BY
1
ORDER BY
1 ASC
)
SELECT
"q_0"."user_count_rolling_28d__created_date_day" "user_count_rolling_28d__created_date_day",
"q_0"."test_events__user_count_rolling_28d" "user_count_rolling_28d__user_count_rolling_28d"
FROM
cte_2 AS "q_0"Without Tesseract:
SELECT
q_0."created_date",
"test_events__user_count_rolling_28d" "user_count_rolli"
FROM
(
SELECT
"user_count_rolling_28d.created_date_series"."date_from" "created_date",
sum("test_events__user_count_rolling_28d") "test_events__user_count_rolling_28d"
FROM (select '2024-01-01T00:00:00.000' f, '2024-01-01T23:59:59.999' t UNION ALL select '2024-01-02T00:00:00.000' f, '2024-01-02T23:59:59.999' t UNION ALL select '2024-01-03T00:00:00.000' f, '2024-01-03T23:59:59.999' t UNION ALL select '2024-01-04T00:00:00.000' f, '2024-01-04T23:59:59.999' t UNION ALL select '2024-01-05T00:00:00.000' f, '2024-01-05T23:59:59.999' t UNION ALL select '2024-01-06T00:00:00.000' f, '2024-01-06T23:59:59.999' t UNION ALL select '2024-01-07T00:00:00.000' f, '2024-01-07T23:59:59.999' t UNION ALL select '2024-01-08T00:00:00.000' f, '2024-01-08T23:59:59.999' t UNION ALL select '2024-01-09T00:00:00.000' f, '2024-01-09T23:59:59.999' t UNION ALL select '2024-01-10T00:00:00.000' f, '2024-01-10T23:59:59.999' t UNION ALL select '2024-01-11T00:00:00.000' f, '2024-01-11T23:59:59.999' t UNION ALL select '2024-01-12T00:00:00.000' f, '2024-01-12T23:59:59.999' t UNION ALL select '2024-01-13T00:00:00.000' f, '2024-01-13T23:59:59.999' t UNION ALL select '2024-01-14T00:00:00.000' f, '2024-01-14T23:59:59.999' t UNION ALL select '2024-01-15T00:00:00.000' f, '2024-01-15T23:59:59.999' t UNION ALL select '2024-01-16T00:00:00.000' f, '2024-01-16T23:59:59.999' t UNION ALL select '2024-01-17T00:00:00.000' f, '2024-01-17T23:59:59.999' t UNION ALL select '2024-01-18T00:00:00.000' f, '2024-01-18T23:59:59.999' t UNION ALL select '2024-01-19T00:00:00.000' f, '2024-01-19T23:59:59.999' t UNION ALL select '2024-01-20T00:00:00.000' f, '2024-01-20T23:59:59.999' t UNION ALL select '2024-01-21T00:00:00.000' f, '2024-01-21T23:59:59.999' t UNION ALL select '2024-01-22T00:00:00.000' f, '2024-01-22T23:59:59.999' t UNION ALL select '2024-01-23T00:00:00.000' f, '2024-01-23T23:59:59.999' t UNION ALL select '2024-01-24T00:00:00.000' f, '2024-01-24T23:59:59.999' t UNION ALL select '2024-01-25T00:00:00.000' f, '2024-01-25T23:59:59.999' t UNION ALL select '2024-01-26T00:00:00.000' f, '2024-01-26T23:59:59.999' t UNION ALL select '2024-01-27T00:00:00.000' f, '2024-01-27T23:59:59.999' t UNION ALL select '2024-01-28T00:00:00.000' f, '2024-01-28T23:59:59.999' t UNION ALL select '2024-01-29T00:00:00.000' f, '2024-01-29T23:59:59.999' t UNION ALL select '2024-01-30T00:00:00.000' f, '2024-01-30T23:59:59.999' t UNION ALL select '2024-01-31T00:00:00.000' f, '2024-01-31T23:59:59.999' t UNION ALL select '2024-02-01T00:00:00.000' f, '2024-02-01T23:59:59.999' t UNION ALL select '2024-02-02T00:00:00.000' f, '2024-02-02T23:59:59.999' t UNION ALL select '2024-02-03T00:00:00.000' f, '2024-02-03T23:59:59.999' t UNION ALL select '2024-02-04T00:00:00.000' f, '2024-02-04T23:59:59.999' t UNION ALL select '2024-02-05T00:00:00.000' f, '2024-02-05T23:59:59.999' t UNION ALL select '2024-02-06T00:00:00.000' f, '2024-02-06T23:59:59.999' t UNION ALL select '2024-02-07T00:00:00.000' f, '2024-02-07T23:59:59.999' t UNION ALL select '2024-02-08T00:00:00.000' f, '2024-02-08T23:59:59.999' t UNION ALL select '2024-02-09T00:00:00.000' f, '2024-02-09T23:59:59.999' t UNION ALL select '2024-02-10T00:00:00.000' f, '2024-02-10T23:59:59.999' t UNION ALL select '2024-02-11T00:00:00.000' f, '2024-02-11T23:59:59.999' t UNION ALL select '2024-02-12T00:00:00.000' f, '2024-02-12T23:59:59.999' t UNION ALL select '2024-02-13T00:00:00.000' f, '2024-02-13T23:59:59.999' t UNION ALL select '2024-02-14T00:00:00.000' f, '2024-02-14T23:59:59.999' t UNION ALL select '2024-02-15T00:00:00.000' f, '2024-02-15T23:59:59.999' t UNION ALL select '2024-02-16T00:00:00.000' f, '2024-02-16T23:59:59.999' t UNION ALL select '2024-02-17T00:00:00.000' f, '2024-02-17T23:59:59.999' t UNION ALL select '2024-02-18T00:00:00.000' f, '2024-02-18T23:59:59.999' t UNION ALL select '2024-02-19T00:00:00.000' f, '2024-02-19T23:59:59.999' t UNION ALL select '2024-02-20T00:00:00.000' f, '2024-02-20T23:59:59.999' t UNION ALL select '2024-02-21T00:00:00.000' f, '2024-02-21T23:59:59.999' t UNION ALL select '2024-02-22T00:00:00.000' f, '2024-02-22T23:59:59.999' t UNION ALL select '2024-02-23T00:00:00.000' f, '2024-02-23T23:59:59.999' t UNION ALL select '2024-02-24T00:00:00.000' f, '2024-02-24T23:59:59.999' t UNION ALL select '2024-02-25T00:00:00.000' f, '2024-02-25T23:59:59.999' t UNION ALL select '2024-02-26T00:00:00.000' f, '2024-02-26T23:59:59.999' t UNION ALL select '2024-02-27T00:00:00.000' f, '2024-02-27T23:59:59.999' t UNION ALL select '2024-02-28T00:00:00.000' f, '2024-02-28T23:59:59.999' t UNION ALL select '2024-02-29T00:00:00.000' f, '2024-02-29T23:59:59.999' t) AS dates) AS "user_count_rolling_28d.created_date_series"
LEFT JOIN (
SELECT
date_trunc(
'day',
CAST(
date_add(
'minute',
timezone_minute(
DATE_TRUNC('day', "test_events_user_count_rolling_28d_cumulative__test_events".created_ts) AT TIME ZONE 'UTC'
),
date_add(
'hour',
timezone_hour(
DATE_TRUNC('day', "test_events_user_count_rolling_28d_cumulative__test_events".created_ts) AT TIME ZONE 'UTC'
),
DATE_TRUNC('day', "test_events_user_count_rolling_28d_cumulative__test_events".created_ts)
)
) AS TIMESTAMP
)
) "created_date",
count("test_events_user_count_rolling_28d_cumulative__test_events".user_id) "test_events__user_count_rolling_28d"
FROM
(
(
SELECT
1 AS user_id,
TIMESTAMP '2024-01-01 10:00:00' AS created_ts
UNION ALL
SELECT
2 AS user_id,
TIMESTAMP '2024-01-02 14:00:00' AS created_ts
UNION ALL
SELECT
1 AS user_id,
TIMESTAMP '2024-01-15 08:00:00' AS created_ts
UNION ALL
SELECT
3 AS user_id,
TIMESTAMP '2024-02-20 11:00:00' AS created_ts
)
) AS "test_events_user_count_rolling_28d_cumulative__test_events"
WHERE
(
CAST(
date_add(
'minute',
timezone_minute(
DATE_TRUNC('day', "test_events_user_count_rolling_28d_cumulative__test_events".created_ts) AT TIME ZONE 'UTC'
),
date_add(
'hour',
timezone_hour(
DATE_TRUNC('day', "test_events_user_count_rolling_28d_cumulative__test_events".created_ts) AT TIME ZONE 'UTC'
),
DATE_TRUNC('day', "test_events_user_count_rolling_28d_cumulative__test_events".created_ts)
)
) AS TIMESTAMP
) > from_iso8601_timestamp(?) - interval '28' day
AND CAST(
date_add(
'minute',
timezone_minute(
DATE_TRUNC('day', "test_events_user_count_rolling_28d_cumulative__test_events".created_ts) AT TIME ZONE 'UTC'
),
date_add(
'hour',
timezone_hour(
DATE_TRUNC('day', "test_events_user_count_rolling_28d_cumulative__test_events".created_ts) AT TIME ZONE 'UTC'
),
DATE_TRUNC('day', "test_events_user_count_rolling_28d_cumulative__test_events".created_ts)
)
) AS TIMESTAMP
) <= from_iso8601_timestamp(?)
)
GROUP BY
1
) AS "test_events_user_count_rolling_28d_cumulative__base" ON "test_events_user_count_rolling_28d_cumulative__base"."created_date" > "user_count_rolling_28d.created_date_series"."date_to" - interval '28' day
AND "test_events_user_count_rolling_28d_cumulative__base"."created_date" <= "user_count_rolling_28d.created_date_series"."date_to"
GROUP BY
1
) as q_0Version
1.3.39
Additional context
The query planner in both the legacy and Tesseract engines seems to have a rigid condition for activating rolling window logic. It appears to be strictly dependent on the presence of a date_trunc() function call in the SQL query's SELECT list. It does not analyze the properties of the selected dimension itself to determine if it already has the correct granularity. Consequently, when the date_trunc() function is absent from the query, the planner takes the wrong path and fails to apply the rolling window.