Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

'array_repeat' if the repeat count value is 0, return NULL instead of empty array #14046

Closed

Conversation

jatin510
Copy link
Contributor

@jatin510 jatin510 commented Jan 8, 2025

Which issue does this PR close?

Closes #13872.

Rationale for this change

We noticed that,
Array of null: []
Empty array: []

> select array_repeat(null, 0);
+-----------------------------+
| array_repeat(NULL,Int64(0)) |
+-----------------------------+
| []                          |
+-----------------------------+
1 row(s) fetched. 
Elapsed 0.006 seconds.

> select array_repeat(null, 1);
+-----------------------------+
| array_repeat(NULL,Int64(1)) |
+-----------------------------+
| []                          |
+-----------------------------+

Appears same

To distinguish both,
Updated the empty array to return null instead
Array of null: []
Empty array will return: null
Now,

> select array_repeat(null, 0);
+-----------------------------+
| array_repeat(NULL,Int64(0)) |
+-----------------------------+
|                             |
+-----------------------------+
1 row(s) fetched. 
Elapsed 0.025 seconds.

> select array_repeat(null, 1);
+-----------------------------+
| array_repeat(NULL,Int64(1)) |
+-----------------------------+
| []                          |
+-----------------------------+

What changes are included in this PR?

Are these changes tested?

Yes,

Updated the array.slt file

Are there any user-facing changes?

Yes

It has breaking change. Please add the label api change

@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Jan 8, 2025
@jonahgao
Copy link
Member

jonahgao commented Jan 9, 2025

I think we should fix it on the display/formatting side. For example, we still cannot distinguish:

DataFusion CLI v44.0.0

> select array[], array[null];
+--------------+------------------+
| make_array() | make_array(NULL) |
+--------------+------------------+
| []           | []               |
+--------------+------------------+
1 row(s) fetched.
Elapsed 0.007 seconds.

See #13872 (comment)

@jatin510
Copy link
Contributor Author

jatin510 commented Jan 9, 2025

I think we should fix it on the display/formatting side. For example, we still cannot distinguish:

DataFusion CLI v44.0.0

> select array[], array[null];
+--------------+------------------+
| make_array() | make_array(NULL) |
+--------------+------------------+
| []           | []               |
+--------------+------------------+
1 row(s) fetched.
Elapsed 0.007 seconds.

See #13872 (comment)

makes sense,
should we start showing NULL for the null values ?
It will make things easy for us. Even the postgres, apache spark and others shows, NULL for null values.

> select array[], array[null];
+--------------+------------------+
| make_array() | make_array(NULL) |
+--------------+------------------+
| []           | [NULL]           |
+--------------+------------------+
1 row(s) fetched.
Elapsed 0.007 seconds.

This will make changes everywhere.
I think we should start showing NULL for null values.

@jayzhan211 @jonahgao

@jayzhan211
Copy link
Contributor

jayzhan211 commented Jan 9, 2025

I think we should fix it on the display/formatting side. For example, we still cannot distinguish:

DataFusion CLI v44.0.0

> select array[], array[null];
+--------------+------------------+
| make_array() | make_array(NULL) |
+--------------+------------------+
| []           | []               |
+--------------+------------------+
1 row(s) fetched.
Elapsed 0.007 seconds.

See #13872 (comment)

makes sense, should we start showing NULL for the null values ? It will make things easy for us. Even the postgres, apache spark and others shows, NULL for null values.

> select array[], array[null];
+--------------+------------------+
| make_array() | make_array(NULL) |
+--------------+------------------+
| []           | [NULL]           |
+--------------+------------------+
1 row(s) fetched.
Elapsed 0.007 seconds.

This will make changes everywhere. I think we should start showing NULL for null values.

@jayzhan211 @jonahgao

Display nulls looks fine to me

@jatin510
Copy link
Contributor Author

jatin510 commented Jan 9, 2025

Why haven’t we been displaying null values as NULL so far?
What was the original reasoning or intention behind this decision?

@jonahgao
Copy link
Member

jonahgao commented Jan 9, 2025

Why haven’t we been displaying null values as NULL so far? What was the original reasoning or intention behind this decision?

I guess it's to follow PostgreSQL CLI. PostgreSQL CLI does not display null values as well, but it always displays null elements of arrays as NULL.

$ psql
psql (16.6 (Ubuntu 16.6-0ubuntu0.24.04.1))
Type "help" for help.

psql=> select null, array[null, 1, null];
 ?column? |     array
----------+---------------
          | {NULL,1,NULL}
(1 row)

psql=> \pset null mynull
Null display is "mynull".
psql=> select null, array[null, 1, null];
 ?column? |     array
----------+---------------
 mynull   | {NULL,1,NULL}
(1 row)

I think we can follow PostgreSQL more thoroughly, or always display null, regardless of whether it is a list element.
The latter can be achieved by setting FormatOptions.

The fix should not involve the DataFusion core; it should be related to DataFusion CLI and sqllogictests.
Some displays of DataFusion CLI are not user-friendly, for example

DataFusion CLI v44.0.0

> select array[null, null, null,1,null];
+------------------------------------------+
| make_array(NULL,NULL,NULL,Int64(1),NULL) |
+------------------------------------------+
| [, , , 1, ]                              |
+------------------------------------------+
1 row(s) fetched.
Elapsed 0.006 seconds.

@jatin510
Copy link
Contributor Author

Created a different PR as this is related to specific array udf.
Introducing default cli formatter for sqllogic test and df-cli : #14052

Thanks, @jonahgao @jayzhan211 for your valuable feedback.

@jatin510 jatin510 closed this Jan 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Functionality of array_repeat udf
3 participants