[BugFix] fix ingestion hang because of alter job timeout #55207

Open
wants to merge 4 commits into main

luohaha
Contributor

@luohaha luohaha commented Jan 17, 2025

Why I'm doing:

In a shared-data cluster, an alter job increases the partition's next version and then changes its state to FINISHED_REWRITING. If the alter job has timed out by then, it is skipped instead of being executed, so it can never finish. The version it reserved is never published, which leaves a version gap and causes errors like:

2025-01-13 05:13:26.728Z ERROR (lake-publish-task-160840|200302) [PublishVersionDaemon.publishPartitionBatch():488] publish partition batch partition.getVisibleVersion() + 1 != version.get(0) 4473684 37071 37073
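
To illustrate the check behind this error, here is a minimal sketch. It is not the code in PublishVersionDaemon: the class and method names are hypothetical, and reading the last two numbers in the log as the visible version (37071) and the first version in the batch (37073) is an assumption.

```java
import java.util.List;

// Hypothetical sketch of the contiguity check; not the StarRocks implementation.
final class VersionGapSketch {
    // A batch can only be published if its first version directly follows
    // the partition's current visible version.
    static boolean canPublish(long visibleVersion, List<Long> versions) {
        return !versions.isEmpty() && visibleVersion + 1 == versions.get(0);
    }
}
```

Under that reading, `canPublish(37071, List.of(37073L))` is false because version 37072 was taken by the alter job and never published, so every later publish on the partition keeps failing and ingestion hangs.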

What I'm doing:

This PR makes two changes:

  1. Only skip a timed-out alter job if it can be cancelled.
  2. Fix the timeout setting used when creating a LakeTableAlterMetaJob.
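
The first change can be sketched roughly as follows. This is a simplified model, not the actual AlterJobV2 code: the class, its fields, and the rule that a job in FINISHED_REWRITING refuses cancellation are all assumptions made for illustration.

```java
// Hypothetical model of an alter job; names do not mirror AlterJobV2.
final class AlterJobSketch {
    enum State { PENDING, RUNNING, FINISHED_REWRITING, FINISHED, CANCELLED }

    State state = State.PENDING;
    final long createTimeMs;
    final long timeoutMs;

    AlterJobSketch(long createTimeMs, long timeoutMs) {
        this.createTimeMs = createTimeMs;
        this.timeoutMs = timeoutMs;
    }

    boolean isTimeout(long nowMs) {
        return nowMs - createTimeMs > timeoutMs;
    }

    // Assumption: once the job has reserved a partition version
    // (FINISHED_REWRITING), it can no longer be cancelled.
    boolean cancel() {
        if (state == State.FINISHED_REWRITING || state == State.FINISHED
                || state == State.CANCELLED) {
            return false;
        }
        state = State.CANCELLED;
        return true;
    }

    // Returns true if the job made progress in this round.
    boolean run(long nowMs) {
        if (state == State.FINISHED || state == State.CANCELLED) {
            return false;
        }
        // Skip a timed-out job only when the cancel actually succeeds.
        if (isTimeout(nowMs) && cancel()) {
            return false;
        }
        // Otherwise keep executing, so a reserved version still gets published.
        switch (state) {
            case PENDING: state = State.RUNNING; break;
            case RUNNING: state = State.FINISHED_REWRITING; break;
            case FINISHED_REWRITING: state = State.FINISHED; break;
            default: break;
        }
        return true;
    }
}
```

In this model, a job that times out while PENDING is cancelled and skipped, while a job that times out in FINISHED_REWRITING keeps running until FINISHED, so no version gap is left behind.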

This pull request includes changes to improve the handling of job cancellation and timeouts in the AlterJobV2 class, as well as a minor adjustment to the timeout parameter in the SchemaChangeHandler class. The most important changes are summarized below:

Improvements to job cancellation handling:

Adjustments to timeout parameter:

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function
  • This is a backport pr

Bugfix cherry-pick branch check:

  • I have checked the version labels which the pr will be auto-backported to the target branch
    • 3.4
    • 3.3
    • 3.2
    • 3.1
    • 3.0

@luohaha luohaha requested a review from a team as a code owner January 17, 2025 08:18
Signed-off-by: luohaha <[email protected]>
wyb
wyb previously approved these changes Jan 17, 2025
Signed-off-by: luohaha <[email protected]>

[FE Incremental Coverage Report]

fail : 4 / 6 (66.67%)

file detail

| path | covered_line | new_line | coverage | not_covered_line_detail |
|---|---|---|---|---|
| 🔵 com/starrocks/alter/RollupJobV2.java | 0 | 1 | 00.00% | [773] |
| 🔵 com/starrocks/alter/AlterJobV2.java | 4 | 5 | 80.00% | [272] |

Signed-off-by: luohaha <[email protected]>

[Java-Extensions Incremental Coverage Report]

pass : 0 / 0 (0%)


[BE Incremental Coverage Report]

pass : 0 / 0 (0%)


4 participants