Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[controller] AdminConsumptionTask fix around timed out or cancelled AdminExecutionTasks #1444

Merged
merged 1 commit into from
Jan 16, 2025

Conversation

xunyin8
Copy link
Contributor

@xunyin8 xunyin8 commented Jan 16, 2025

[controller] AdminConsumptionTask fix around timed out or cancelled AdminExecutionTasks

This is based on Felix's commit with some additional changes to add an unit test for timed out or cancelled AdminExecutionTask to verify the fix. The original commit message is below:

The main bug fix in this commit is that in the AdminConsumptionTask's executeMessagesAndCollectResults function, tasks for many stores could be scheduled, but if some of them never began running, and the whole set of tasks timed out (after 30 minutes), then the tasks that didn't start at all would leave a lingering reference to the corresponding store in the storeToScheduledTask map, which is a class property. This in turn would cause the next execution of this function to skip the affected store, therefore never letting it run any task again. Since it is not actually needed to hang on this map at the class level, the fix is to change this to a local variable, which therefore does not carry any consequences across executions of the function.

In addition, several NPEs (or potential ones, anyway), are also fixed.

Finally, there are some cosmetic changes such as deleting unused code, and making some fields final.

How was this PR tested?

Existing and new unit test

Does this PR introduce any user-facing changes?

  • No. You can skip the rest of this section.
  • Yes. Make sure to explain your proposed changes and call out the behavior change.

…dminExecutionTasks

This is based on Felix's commit with some additional changes to add an unit test for
timed out or cancelled AdminExecutionTask to verify the fix. The original commit message
is below:

The main bug fix in this commit is that in the AdminConsumptionTask's
executeMessagesAndCollectResults function, tasks for many stores could
be scheduled, but if some of them never began running, and the whole
set of tasks timed out (after 30 minutes), then the tasks that didn't
start at all would leave a lingering reference to the corresponding
store in the storeToScheduledTask map, which is a class property. This
in turn would cause the next execution of this function to skip the
affected store, therefore never letting it run any task again. Since
it is not actually needed to hang on this map at the class level, the
fix is to change this to a local variable, which therefore does not
carry any consequences across executions of the function.

In addition, several NPEs (or potential ones, anyway), are also fixed.

Finally, there are some cosmetic changes such as deleting unused code,
and making some fields final.
Copy link
Contributor

@FelixGV FelixGV left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot Xun!

@xunyin8 xunyin8 merged commit 1261bb9 into linkedin:main Jan 16, 2025
56 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants