GH-43860: [Go][Parquet] Handle the error correctly #43861
Signed-off-by: bigsheeper <[email protected]>
Would you mind adding a test for this? I guess parquet-testing/bad-data might contain some bad files. |
@mapleFU Sorry, could you clarify what you mean by "parquet-testing/bad-file"? Can you provide a link or more context? |
You can try following [1]. parquet-testing uses this repo [2]. It includes some bad data files in "bad_data"; would you mind checking whether this can be tested using these files? [1] arrow/go/parquet/file/file_reader_test.go Lines 397 to 401 in 6b268f6
[2] https://github.com/apache/parquet-testing |
@mapleFU It seems that the |
Sorry, would you mind trying something like the below?
std::string get_bad_data_dir() {
  // PARQUET_TEST_DATA should point to ARROW_HOME/cpp/submodules/parquet-testing/data
  // so need to reach one folder up to access the "bad_data" folder.
  std::string data_dir(get_data_dir());
  std::stringstream ss;
  ss << data_dir << "/../bad_data";
  return ss.str();
}
If the data cannot be accessed, could we write a test ourselves in Go? |
I agree. The root cause of this issue is that the error was not returned. Maybe we can just use a mock error in a unit test? If that's OK, I'll add a unit test for this change. |
Both are OK for me. I'd be glad to see a Go unit test; if bad_data can reproduce it, that would be convenient. |
@mapleFU I attempted to read a bad file, but no error was returned. In my case, the io.reader reported an error (specifically, a timeout connecting to S3: Get "http://x.x.x.x:9000/a-bucket//tmp/test_2292796098596731149.parquet": dial tcp x.x.x.x:9000: connect: connection refused), but the Parquet reader did not propagate this error. I've simulated this scenario in the unit test. Please help review. :) |
Dialing an S3 address in a Parquet unit test would be a disaster and unstable; I don't suggest that. Mocking might be OK here. |
I didn’t dial an S3 address in Parquet ut; I only mock an error when |
I see. A lot of modules under Parquet-go might need to handle |
If you're interested in the issue we've encountered, see: milvus-io/milvus#35662 (comment) :) |
@mapleFU Several areas of the Go parquet lib rely on using panics with a top-level recover to return errors. I agree we should definitely try to find any spots where we're missing proper handling. Go has support for fuzz testing built in https://go.dev/doc/security/fuzz/ and I've been meaning to get around to adding that to the parquet lib but haven't had the time. It would be great for us to add that |
LGTM thanks!
If there are any other areas we can identify that have incorrect error handling we should try to find them, test, and update them.
LGTM |
@bigsheeper FYI, #43607 is a similar problem. |
Thanks for the fix!
After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 9d40a6a. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 9 possible false positives for unstable benchmarks that are known to sometimes produce them. |
apacheGH-43860: [Go][Parquet] Handle the error correctly (apache#43861)
### Rationale for this change Fixes: apache#43860 ### What changes are included in this PR? Return error correctly ### Are these changes tested? Yes ### Are there any user-facing changes? Nope * GitHub Issue: apache#43860 Authored-by: bigsheeper <[email protected]> Signed-off-by: Matt Topol <[email protected]>
Rationale for this change
Fixes: #43860
What changes are included in this PR?
Return error correctly
Are these changes tested?
Yes
Are there any user-facing changes?
Nope