Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

When the to_timestamp_millis function is used and the output format is csv, it will panic #4947

Closed
ZuoTiJia opened this issue Jan 17, 2023 · 7 comments
Labels
bug Something isn't working

Comments

@ZuoTiJia
Copy link
Contributor

ZuoTiJia commented Jan 17, 2023

To Reproduce

datafusion-cli --format csv
DataFusion CLI v16.0.0
❯ SELECT to_timestamp_millis(1926632005177685347);
thread 'main' panicked at 'called `Option::unwrap()` on a `None` value', arrow-csv-29.0.0/src/writer.rs:253:28
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

The bug of csv writer occurs when unwrap() is called during conversion, which seems to be a bug of arrow-csv.
Expected behavior
Throw an error, no panic

@ZuoTiJia ZuoTiJia added the bug Something isn't working label Jan 17, 2023
@comphead
Copy link
Contributor

comphead commented Jan 17, 2023

@ZuoTiJia
It looks like None returns because the real error is supressed, if I run it without CSV format

DataFusion CLI v16.0.0
❯ select to_timestamp_millis(1926632005177685347);
+-----------------------------------------------+
| totimestampmillis(Int64(1926632005177685347)) |
+-----------------------------------------------+
| ERROR CONVERTING DATE                         |
+-----------------------------------------------+

@ZuoTiJia
Copy link
Contributor Author

@ZuoTiJia It looks like None returns because the real error is supressed, if I run it without CSV format

DataFusion CLI v16.0.0
❯ select to_timestamp_millis(1926632005177685347);
+-----------------------------------------------+
| totimestampmillis(Int64(1926632005177685347)) |
+-----------------------------------------------+
| ERROR CONVERTING DATE                         |
+-----------------------------------------------+

Mixing correct and incorrect results is not good, it would be nice to be able to throw an error.

DataFusion CLI v16.0.0
❯ CREATE TABLE test(time BIGINT) AS VALUES (1), (1926632005177685347);
0 rows in set. Query took 0.006 seconds.
❯ select * from test;
+---------------------+
| time                |
+---------------------+
| 1                   |
| 1926632005177685347 |
+---------------------+
2 rows in set. Query took 0.003 seconds.
❯ select to_timestamp_millis(time) from test;
+------------------------------+
| totimestampmillis(test.time) |
+------------------------------+
| 1970-01-01T00:00:00.001      |
| ERROR CONVERTING DATE        |
+------------------------------+
2 rows in set. Query took 0.003 seconds.
❯

@comphead
Copy link
Contributor

Agree, in old spark versions, if timestamp cast cannot be done then null returned that was really confusing, we should consider a runtime error if such invalid cast happened. I'll fix it in apache/arrow-rs#3547 soon

@comphead
Copy link
Contributor

comphead commented Jan 23, 2023

Waiting apache/arrow-rs#3584 and apache/arrow-rs#3514

@comphead
Copy link
Contributor

comphead commented Mar 9, 2023

@ZuoTiJia The initial problem is solved.

./target/debug/datafusion-cli --format csv

DataFusion CLI v19.0.0
❯ select to_timestamp_millis(1926632005177685347);
Arrow error: Csv error: Error formatting row 1 and column 1: Cast error: Failed to convert 1926632005177685347 to datetime for Timestamp(Millisecond, None)
❯ 

Another problem still in place is to remove mixed values, lets do that in separate issue if needed

@alamb
Copy link
Contributor

alamb commented Nov 2, 2024

(venv-310) andrewlamb@Andrews-MacBook-Pro-2:~/Software/arrow-rs$ datafusion-cli --format csv
DataFusion CLI v42.1.0
> SELECT to_timestamp_millis(1926632005177685347);
to_timestamp_millis(Int64(1926632005177685347))
Arrow error: Csv error: Error processing row 1, col 1: Cast error: Failed to convert 1926632005177685347 to datetime for Timestamp(Millisecond, None)
>

Seems like the panic is fixed and the code is working as expected

@alamb
Copy link
Contributor

alamb commented Nov 2, 2024

Thanks (again) to @drauschenbach for pointing this out

@alamb alamb closed this as completed Nov 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants