Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-43967: [C++] Enhance error message for URI parsing #43938

Merged
merged 2 commits into from
Sep 5, 2024

Conversation

CrystalZhou0529
Copy link
Contributor

@CrystalZhou0529 CrystalZhou0529 commented Sep 3, 2024

Rationale for this change

We want to enhance error message for URI parsing error to provide more information for the syntax error scenario.

When error message is generated from uriParseSingleUriExA, the return value might indicate a URI_ERROR_SYNTAX error, and error_pos would be set to the position causing syntax error. (uriparser/Uri.h)

In the new error message, it includes the character causing syntax error and its position, so users can have a better idea why the error happens.

What changes are included in this PR?

  • Error message change in URI parsing function.

Are these changes tested?

PR includes unit tests.

Are there any user-facing changes?

Yes, but only for error message.

Copy link

github-actions bot commented Sep 3, 2024

⚠️ GitHub issue #41365 has been automatically assigned in GitHub to PR creator.

Copy link
Member

@AlenkaF AlenkaF left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for working on this fix @CrystalZhou0529 !

From a quick look the change in cpp/src/arrow/util/uri.cc LGTM. I do have some questions/suggestions for the tests.

First: could we also add a test on the Python side to check it works correctly there also?

cpp/src/arrow/filesystem/filesystem_test.cc Show resolved Hide resolved
@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Sep 4, 2024
Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a neat improvement, thanks @CrystalZhou0529 .

@pitrou
Copy link
Member

pitrou commented Sep 4, 2024

@AlenkaF

First: could we also add a test on the Python side to check it works correctly there also?

I think the problem is that PyArrow swallows the URI error and falls back to the local filesystem (which is often legitimate, but probably not if the user really meant to pass a S3 URI :-)).

@AlenkaF
Copy link
Member

AlenkaF commented Sep 4, 2024

I think the problem is that PyArrow swallows the URI error and falls back to the local filesystem (which is often legitimate, but probably not if the user really meant to pass a S3 URI :-)).

Yeah, the content in the issue should have given me this explanation already. Thanks again ;)

The Java failure is something I noticed on other PRs also if I am not mistaken?

@pitrou
Copy link
Member

pitrou commented Sep 4, 2024

The Java failure is something I noticed on other PRs also if I am not mistaken?

Yes, it's quite recurring. I'm not sure there's an issue open for it?

@AlenkaF
Copy link
Member

AlenkaF commented Sep 5, 2024

I think this might be the open issue connected to the Java failures: apache/arrow-java#63

Will merge this PR as the CI is green.

@vibhatha
Copy link
Collaborator

vibhatha commented Sep 5, 2024

@AlenkaF yes this is already tracked and I checked the CIs too. And not relevant to this PR.

@jorisvandenbossche jorisvandenbossche changed the title GH-41365: [C++] Enhance error message for URI parsing GH-43967: [C++] Enhance error message for URI parsing Sep 5, 2024
Copy link

github-actions bot commented Sep 5, 2024

⚠️ GitHub issue #43967 has been automatically assigned in GitHub to PR creator.

@jorisvandenbossche
Copy link
Member

First: could we also add a test on the Python side to check it works correctly there also?

I think the problem is that PyArrow swallows the URI error and falls back to the local filesystem (which is often legitimate, but probably not if the user really meant to pass a S3 URI :-)).

Given that this PR is not solving the original issue (I do think we can actually fix that by being more strict in when we swallow that error, and actually bubble it up to the user more often), I opened a separate issue focused on improving the C++ error message and linked this PR to that: #43967

Copy link

github-actions bot commented Sep 5, 2024

⚠️ GitHub issue #43967 has been automatically assigned in GitHub to PR creator.

@jorisvandenbossche jorisvandenbossche merged commit 9336bbe into apache:main Sep 5, 2024
46 of 47 checks passed
@jorisvandenbossche jorisvandenbossche removed the awaiting committer review Awaiting committer review label Sep 5, 2024
@jorisvandenbossche
Copy link
Member

Thanks @CrystalZhou0529!

Copy link

After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 9336bbe.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 22 possible false positives for unstable benchmarks that are known to sometimes produce them.

zanmato1984 pushed a commit to zanmato1984/arrow that referenced this pull request Sep 6, 2024
…3938)

### Rationale for this change

We want to enhance error message for URI parsing error to provide more information for the syntax error scenario.

When error message is generated from `uriParseSingleUriExA`, the return value might indicate a `URI_ERROR_SYNTAX` error, and `error_pos` would be set to the position causing syntax error. ([uriparser/Uri.h](https://github.com/apache/arrow/blob/c455d6b8c4ae2cb22baceb4c27e1325b973d39e1/cpp/src/arrow/vendored/uriparser/Uri.h#L288))

In the new error message, it includes the character causing syntax error and its position, so users can have a better idea why the error happens.

### What changes are included in this PR?

- Error message change in URI parsing function.

### Are these changes tested?

PR includes unit tests.

### Are there any user-facing changes?

Yes, but only for error message.

* GitHub Issue: apache#41365
* GitHub Issue: apache#43967

Authored-by: Crystal Zhou <[email protected]>
Signed-off-by: Joris Van den Bossche <[email protected]>
khwilson pushed a commit to khwilson/arrow that referenced this pull request Sep 14, 2024
…3938)

### Rationale for this change

We want to enhance error message for URI parsing error to provide more information for the syntax error scenario.

When error message is generated from `uriParseSingleUriExA`, the return value might indicate a `URI_ERROR_SYNTAX` error, and `error_pos` would be set to the position causing syntax error. ([uriparser/Uri.h](https://github.com/apache/arrow/blob/c455d6b8c4ae2cb22baceb4c27e1325b973d39e1/cpp/src/arrow/vendored/uriparser/Uri.h#L288))

In the new error message, it includes the character causing syntax error and its position, so users can have a better idea why the error happens.

### What changes are included in this PR?

- Error message change in URI parsing function.

### Are these changes tested?

PR includes unit tests.

### Are there any user-facing changes?

Yes, but only for error message.

* GitHub Issue: apache#41365
* GitHub Issue: apache#43967

Authored-by: Crystal Zhou <[email protected]>
Signed-off-by: Joris Van den Bossche <[email protected]>
Copy link

⚠️ GitHub issue #43967 has been automatically assigned in GitHub to PR creator.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants