[Improve][Zeta] Disable hdfs filesystem cache of checkpoint #6718
…cache function is disabled by default.
@@ -186,3 +186,40 @@ seatunnel:

### Enable cache

When storage:type is hdfs, cache is disabled by default. If you want to enable it, set `disable.cache: false`.
Could you share what risks will be caused if cache is turned on?
I think that in SeaTunnel's scenario, it should never be turned on.
You can take a look at bug #6678, which I reported against the S3 connector: using the cache causes occasional task failures. You can also refer to two related issues, https://issues.apache.org/jira/browse/HADOOP-15819 (hadoop-aws) and aws/aws-sdk-java#2337 (aws-sdk-java), which describe in detail the problems caused by turning on the cache in a multithreaded environment.
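To make the failure mode concrete, here is a minimal, self-contained Java sketch. It does not use Hadoop or the AWS SDK: `FakeFs` and `FakeFsFactory` are hypothetical stand-ins that only mimic a factory handing out one shared client per URI, and the `disableCache` flag is an assumed analogue of the `disable.cache` option discussed here, not SeaTunnel's actual implementation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class SharedCacheSketch {

    /** Hypothetical stand-in for a file system client; not a Hadoop class. */
    static final class FakeFs implements AutoCloseable {
        private volatile boolean closed = false;

        String read(String path) {
            if (closed) {
                throw new IllegalStateException("Filesystem closed");
            }
            return "data:" + path;
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Hypothetical factory: one shared instance per URI unless caching is disabled. */
    static final class FakeFsFactory {
        private final Map<String, FakeFs> cache = new ConcurrentHashMap<>();
        private final boolean disableCache;

        FakeFsFactory(boolean disableCache) {
            this.disableCache = disableCache;
        }

        FakeFs get(String uri) {
            if (disableCache) {
                return new FakeFs(); // every caller owns its own instance
            }
            return cache.computeIfAbsent(uri, u -> new FakeFs()); // shared instance
        }
    }

    /** Returns true if one caller can still read after another caller closes its handle. */
    static boolean readerSurvivesClose(boolean disableCache) {
        FakeFsFactory factory = new FakeFsFactory(disableCache);
        FakeFs reader = factory.get("hdfs://namenode");
        FakeFs other = factory.get("hdfs://namenode");
        other.close(); // with caching on, this also closes `reader`
        try {
            reader.read("/checkpoint/1");
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("cache enabled, reader survives: " + readerSurvivesClose(false));
        System.out.println("cache disabled, reader survives: " + readerSurvivesClose(true));
    }
}
```

With the cache enabled, both callers hold the same object, so one caller's `close()` breaks the other. With the cache disabled, each caller gets a private instance and is unaffected, at the cost of creating more clients.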
Makes sense to me.
Could you add a test case to cover this bug?
Currently there is no known connection-closure bug with the HDFS cache, so this change is a precaution.
LGTM
Purpose of this pull request
Does this PR introduce any user-facing change?
When using `hadoop-aws-3.1.4.jar` and `aws-java-sdk-bundle-1.11.271.jar` to connect to HDFS or S3 file systems, caching is enabled by default. In multithreaded scenarios, FileSystem objects are often closed, which also closes the connection pool. If such an object is then taken from the cache, unexpected exceptions can occur.
How was this patch tested?
no