-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-43967: [C++] Enhance error message for URI parsing #43938
GH-43967: [C++] Enhance error message for URI parsing #43938
Conversation
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for working on this fix @CrystalZhou0529 !
From a quick look the change in cpp/src/arrow/util/uri.cc LGTM. I do have some questions/suggestions for the tests.
First: could we also add a test on the Python side to check it works correctly there also?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a neat improvement, thanks @CrystalZhou0529 .
I think the problem is that PyArrow swallows the URI error and falls back to the local filesystem (which is often legitimate, but probably not if the user really meant to pass a S3 URI :-)). |
Yeah, the content in the issue should have given me this explanation already. Thanks again ;) The Java failure is something I noticed on other PRs also if I am not mistaken? |
Yes, it's quite recurring. I'm not sure there's an issue open for it? |
I think this might be the open issue connected to the Java failures: apache/arrow-java#63 Will merge this PR as the CI is green. |
@AlenkaF yes this is already tracked and I checked the CIs too. And not relevant to this PR. |
|
Given that this PR is not solving the original issue (I do think we can actually fix that by being more strict in when we swallow that error, and actually bubble it up to the user more often), I opened a separate issue focused on improving the C++ error message and linked this PR to that: #43967 |
|
Thanks @CrystalZhou0529! |
After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 9336bbe. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 22 possible false positives for unstable benchmarks that are known to sometimes produce them. |
…3938) ### Rationale for this change We want to enhance error message for URI parsing error to provide more information for the syntax error scenario. When error message is generated from `uriParseSingleUriExA`, the return value might indicate a `URI_ERROR_SYNTAX` error, and `error_pos` would be set to the position causing syntax error. ([uriparser/Uri.h](https://github.com/apache/arrow/blob/c455d6b8c4ae2cb22baceb4c27e1325b973d39e1/cpp/src/arrow/vendored/uriparser/Uri.h#L288)) In the new error message, it includes the character causing syntax error and its position, so users can have a better idea why the error happens. ### What changes are included in this PR? - Error message change in URI parsing function. ### Are these changes tested? PR includes unit tests. ### Are there any user-facing changes? Yes, but only for error message. * GitHub Issue: apache#41365 * GitHub Issue: apache#43967 Authored-by: Crystal Zhou <[email protected]> Signed-off-by: Joris Van den Bossche <[email protected]>
…3938) ### Rationale for this change We want to enhance error message for URI parsing error to provide more information for the syntax error scenario. When error message is generated from `uriParseSingleUriExA`, the return value might indicate a `URI_ERROR_SYNTAX` error, and `error_pos` would be set to the position causing syntax error. ([uriparser/Uri.h](https://github.com/apache/arrow/blob/c455d6b8c4ae2cb22baceb4c27e1325b973d39e1/cpp/src/arrow/vendored/uriparser/Uri.h#L288)) In the new error message, it includes the character causing syntax error and its position, so users can have a better idea why the error happens. ### What changes are included in this PR? - Error message change in URI parsing function. ### Are these changes tested? PR includes unit tests. ### Are there any user-facing changes? Yes, but only for error message. * GitHub Issue: apache#41365 * GitHub Issue: apache#43967 Authored-by: Crystal Zhou <[email protected]> Signed-off-by: Joris Van den Bossche <[email protected]>
|
Rationale for this change
We want to enhance error message for URI parsing error to provide more information for the syntax error scenario.
When error message is generated from
uriParseSingleUriExA
, the return value might indicate aURI_ERROR_SYNTAX
error, anderror_pos
would be set to the position causing syntax error. (uriparser/Uri.h)In the new error message, it includes the character causing syntax error and its position, so users can have a better idea why the error happens.
What changes are included in this PR?
Are these changes tested?
PR includes unit tests.
Are there any user-facing changes?
Yes, but only for error message.