Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unexpected result when using nth_value #24758

Open
MikeWallis42 opened this issue Jan 21, 2025 · 3 comments
Open

Unexpected result when using nth_value #24758

MikeWallis42 opened this issue Jan 21, 2025 · 3 comments

Comments

@MikeWallis42
Copy link

The documentation states:-

nth_value(x, offset) → [same as input][#](https://trino.io/docs/current/functions/window.html#nth_value)

    Returns the value at the specified offset from the beginning of the window. Offsets start at 1. The offset can be any scalar expression. If the offset is null or greater than the number of values in the window, null is returned. It is an error for the offset to be zero or negative.

With a MRE

SELECT
  id,
  ARBITRARY(is_deleted) OVER w1 AS is_deleted,
  NTH_VALUE(is_deleted, 2) OVER w1 AS was_deleted,
  order_by
FROM (
  VALUES
  (1, FALSE, 1),
  (1, FALSE, 2),
  (1, FALSE, 3),
  (1, TRUE, 4)
)AS t1 (
  id, is_deleted, order_by
)
WINDOW w1 AS(PARTITION BY id ORDER BY order_by DESC)

This returns

id is_deleted was_deleted order_by
1 TRUE NULL 4
1 TRUE FALSE 3
1 TRUE FALSE 2
1 TRUE FALSE 1

But I would expect

id is_deleted was_deleted order_by
1 TRUE FALSE 4
1 TRUE FALSE 3
1 TRUE FALSE 2
1 TRUE FALSE 1

When the NTH_VALUE offset is increased we end up with more NULL values, like the window above the offset is being ignored.

@wendigo
Copy link
Contributor

wendigo commented Jan 21, 2025

@kasiafi can you take a look?

@SongChujun
Copy link
Member

The default window frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, source: https://trino.io/docs/current/sql/select.html, when we process the first row in the partition, we only have the first row in the current frame, so NTH_VALUE functioncall on offset 2 will return NULL.

In order to avoid this, you need to specify frame as RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING, like the following

SELECT
  id,
  ARBITRARY(is_deleted) OVER w1 AS is_deleted,
  NTH_VALUE(is_deleted, 2) OVER w1 AS was_deleted,
  order_by
FROM (
  VALUES
  (1, FALSE, 1),
  (1, FALSE, 2),
  (1, FALSE, 3),
  (1, TRUE, 4)
)AS t1 (
  id, is_deleted, order_by
)
WINDOW w1 AS(PARTITION BY id ORDER BY order_by DESC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)

@MikeWallis42
Copy link
Author

In the absence of ORDER BY, all rows are considered peers, so RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW is equivalent to BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING.

I find it quite odd that there are different behaviours depending on whether you apply ordering or not but it seems consistent with other engines... perhaps an example rather than a description in the documentation could help clear things up?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

3 participants