Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[VL] Fallback happens to vanilla spark if there is a cast from varchar to timestamp #8401

Open
ayushi-agarwal opened this issue Jan 2, 2025 · 6 comments
Labels
enhancement New feature or request

Comments

@ayushi-agarwal
Copy link
Contributor

Description

Currently there is only a support of conversion from date to timestamp -

if (toType->kind() == TypeKind::TIMESTAMP && !input->type()->isDate()) {

An example expression for which fallback happens: cast(from_utc_timestamp(from_unixtime((cast(col as double) / 1000.0), 'yyyy-MM-dd HH:mm:ss'), 'PST') as date)

Is there a plan to support other data types also? @dcoliversun

@ayushi-agarwal ayushi-agarwal added the enhancement New feature or request label Jan 2, 2025
@ayushi-agarwal
Copy link
Contributor Author

ayushi-agarwal commented Jan 3, 2025

I modified the code so that varchar to timestamp cast is also gets pushed to velox. For PST it fails with below error but works fine if I give America/Los_Angeles
Error Source: USER
Error Code: INVALID_ARGUMENT
Reason: Unknown time zone: 'PST'
Retriable: False
Expression: toTimeZone != nullptr
Context: Top-level Expression: from_utc_timestamp(try_cast((from_unixtime(try_cast((divide(try_cast((n0_0) as DOUBLE), 1000:DOUBLE)) as BIGINT), yyyy-MM-dd HH:mm:ss.SSS :VARCHAR)) as TIMESTAMP), PST:VARCHAR)
Function: call
File: /home/ayushi/workspace/incubator-gluten/ep/build-velox/build/velox_ep/./velox/functions/sparksql/DateTimeFunctions.h
Line: 374
This error might be coming because PST is not present in https://github.com/facebookincubator/velox/blob/main/velox/type/tz/TimeZoneDatabase.cpp
@PHILO-HE Is this the reason we have this validation check in gluten to not offload it to velox?

Also at present timestamp to date cast is also not offloaded but when I tried doing that it gave correct result, shall we enable it?

@FelixYBW
Copy link
Contributor

FelixYBW commented Jan 3, 2025

can #8357 fix it?

@ayushi-agarwal
Copy link
Contributor Author

ayushi-agarwal commented Jan 5, 2025

@FelixYBW I did the same change in my local and it resolves the cast from varchar to timestamp offload issue but then I got the above error for from_utc_timestamp expression when I gave PST as parameter. Any suggestions for this as this would need change in velox?

@FelixYBW
Copy link
Contributor

FelixYBW commented Jan 6, 2025

@FelixYBW I did the same change in my local and it resolves the cast from varchar to timestamp offload issue but then I got the above error for from_utc_timestamp expression when I gave PST as parameter. Any suggestions for this as this would need change in velox?

I see. I don't think we can fix this in Gluten itself.

@PHILO-HE
Copy link
Contributor

PHILO-HE commented Jan 7, 2025

@ayushi-agarwal, I think abbreviated timezone name like "PST" is not supported in Velox, because it's ambiguous. And it is not recommended by Spark (see link).

@ayushi-agarwal
Copy link
Contributor Author

@ayushi-agarwal, I think abbreviated timezone name like "PST" is not supported in Velox, because it's ambiguous. And it is not recommended by Spark (see link).

That makes sense now. Thank you for directing me to this code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants