-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SUPPORT]Downstream Fails to Read Hudi MOR Table and Write to Kudu Table When Using Bucketed Table, with Warning: reader has fallen behind too much from the writer #12585
Comments
This log indicates the streaming reader has lag too much, probably there could be two reasons:
|
Thank you for your response! We understand the potential reasons for the streaming read task lag: Slow Kudu Write Speed: Full Table Scan Due to Early Read Start Commit: Further Questions Historical Data Reading: Kudu Write Performance: Summary How to optimize read performance for bucketed tables. How to read historical data without a full table scan. Are there other configurations or tools to help troubleshoot? Thank you for your help! |
For bucketed table are you referring to the bucket index of MOR table? One fact to know is that the writer would write pure avro logs at first so the streaming reader would also read these logs. For streaming read we have an option value named "earliest" for the It looks like the waning log is normal because of the explicit specified read start commit, this log shows there when the commit to read has already been archived. |
Thank you for your help! We followed your suggestion and used the following configuration: options.put(FlinkOptions.READ_START_COMMIT.key(), "earliest"); This configuration indeed resolved the read task lag issue, and we were able to read the Hudi table and write to the Kudu table normally. However, the task stopped after running for a while and threw the following error: org.apache.flink.runtime.executiongraph.ExecutionGraph [] - split_reader -> Sink:Unnamed(1/1) switched from INITIALIZING to FAILED on container_e30_xxx org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Job switched from state RUNNING to FAILED org.apache.flink.runtime.JobException: Recovery is suppressed by FixedDelayRestartBackoffTimeStrategy(maxNumberRestartAttempts=3, backoffTimeMs=60000) Caused by: org.apache.hudi.exception.HoodieException: Get reader error for path: hdfs://nameservice1:xxx.parquet We tried to skip files by using the following configurations: options.put("read.streaming.skip_clustering", "true"); Clean Policy: options.put("hoodie.clean.automatic", "true"); But the issue persists. |
@danny0405 Is there any way to skip the Parquet files that are currently being merged? Looking forward to your response. |
All the data files being written could never be seen by the reader, this is the Snapshot Isolation contract of a table format.
Do you have more detailed exceptions, I can not get the cues from these errors, they all look like Flink related errors which may be caused by resource starvation. |
@danny0405 The error is as follows: org.apache.flink.runtime.executiongraph.ExecutionGraph [] - split_reader -> Sink:Unnamed(1/1) switched from INITIALIZING to FAILED on container_e30_xxx org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Job switched from state RUNNING to FAILED org.apache.flink.runtime.JobException: Recovery is suppressed by FixedDelayRestartBackoffTimeStrategy(maxNumberRestartAttempts=3, backoffTimeMs=60000) |
@danny0405 options.put("read.streaming.skip_clustering", "true"); |
I still can not get the clues with the error you pasted, why you thought the skipping does not take effect then, if you read from the earliest, the reader would trigger a full table scan first which would include the parquet files. I see you set up 8 slots in one TM, can you set up 1 slot for 1 TM and add more resource to the reader then? |
@danny0405
Problem Analysis These files may not have completed writing or merging, resulting in a file size of 0 MB or invalid, causing read failures. We would like to know:
Looking forward to your reply! |
We already have a valid check for the incremental read files: Line 132 in ef9ab24
|
@danny0405 options.put("read.streaming.skip_clustering", "true"); After commenting these out, the process runs normally for a period of time. However, if the upstream completely overwrites the parquet files, it still throws an error saying that the parquet files cannot be found. Here are my current configuration parameters: Map<String, String> options = new HashMap<>(); The specific error message is as follows: I look forward to your reply.Let me know if you need further adjustments! |
@pengxianzi Are you referring to the INSERT_OVERWRITE operation? |
@danny0405 Upstream I write using options.put (" write. operation ", " upsert "); this pattern |
All the regular write parquet files should be read if they are committed successfully. |
We are using Apache Hudi to build a data lake and writing data to a Kudu table downstream. The following two scenarios exhibit different behaviors:
Scenario 1: The upstream writes data using a regular MOR (Merge On Read) Hudi table, and the downstream reads the Hudi table and writes to the Kudu table without any issues.
Scenario 2: The upstream writes data using a bucketed table, and when the downstream reads the Hudi table and attempts to write to the Kudu table, the task fails with the following warning:
caution: the reader has fallen behind too much from the writer, tweak 'read.tasks' option to add parallelism of read tasks
We have tried setting the read.tasks parameter to 10, but the issue persists. Below are our configuration and environment details:
Hudi version : 0.14.0
Spark version : 2.4.7
Hive version : 3.1.3
Hadoop version : 3.1.1
Storage Format: HDFS
Downstream Storage: Apache Kudu
Bucketed Table Configuration: Number of buckets is 10
Configuration Information
Below is our Hudi table configuration:
Steps to Reproduce
Attempted Solutions
Expected Behavior
The downstream should be able to read the Hudi MOR table written by the bucketed table and write the data to the Kudu table normally.
Actual Behavior
The downstream read task fails with the warning: reader has fallen behind too much from the writer.
Log Snippet
Below is a snippet of the log when the task fails:
Caused by:org.apache.flink.runtime.resourcemanager.exceptions.UnknownTaskExecutorException:No TaskExecutor registered under container e38 1734494154374 0718 01 000002
caused by :org.apache.flink.util.FlinkRuntimeException:Exceeded checkpoint tolerable failure threshould
Caused by: java.util.concurrent.TimeoutException
caution :the reader has fall behind too much from the write , tweak 'read.tasks' option to add parallelism of read tasks
ERROR org.apache.flink.runtime.taseManagerRunner [] - Fatal error occurred while executing the TaskManager. Shutting it down ...
org.apache.flink.util.FlinkException: the TaskExecutor's registration at the ResourceManager akka.tcp://node1:7791/user/rpc/resourcemanager_0 has been rejected :Rejected TaskExecutor registration at the ResourceManager because:The ResourceManager does not recognize this TaskExecutor
Summary of Questions
We would like to know:
The text was updated successfully, but these errors were encountered: