[controller] AdminConsumptionTask fix around timed out or cancelled AdminExecutionTasks #1444
This is based on Felix's commit, with some additional changes that add a unit test for a timed-out or cancelled AdminExecutionTask to verify the fix. The original commit message follows:
The main bug fix in this commit is in AdminConsumptionTask's executeMessagesAndCollectResults function. Tasks for many stores could be scheduled, but if some of them never began running and the whole set of tasks timed out (after 30 minutes), the tasks that never started would leave a lingering reference to the corresponding store in the storeToScheduledTask map, which is a class field. That stale entry caused the next execution of the function to skip the affected store, so that store could never run a task again. Since the map does not need to be held at the class level, the fix changes it to a local variable, so no state carries over from one execution of the function to the next.
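The failure mode can be reproduced in a small standalone Java model. This is a hypothetical sketch, not the Venice code: the class `StoreSchedulingSketch`, its methods, the single-thread executor, and the 200 ms timeout (standing in for the 30-minute one) are all assumptions made for illustration. Each task removes its own map entry in a `finally` block, so a task that `ExecutorService.invokeAll` cancels before it ever starts never cleans up after itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical, simplified model of the bug; not the actual AdminConsumptionTask code. */
public class StoreSchedulingSketch {
  // Buggy pattern: a class-level map whose entries survive across invocations.
  private final Map<String, Callable<Void>> storeToScheduledTask = new ConcurrentHashMap<>();

  /** Uses the class-level map, so stale entries from an earlier call affect this one. */
  public List<String> executeWithFieldMap(List<String> stores, Map<String, Long> sleepMs, long timeoutMs) {
    return execute(stores, sleepMs, timeoutMs, storeToScheduledTask);
  }

  /** Same logic with a fresh local map per call, mirroring the fix described above. */
  public List<String> executeWithLocalMap(List<String> stores, Map<String, Long> sleepMs, long timeoutMs) {
    return execute(stores, sleepMs, timeoutMs, new ConcurrentHashMap<>());
  }

  private static List<String> execute(
      List<String> stores, Map<String, Long> sleepMs, long timeoutMs, Map<String, Callable<Void>> map) {
    List<String> scheduled = new ArrayList<>();
    List<Callable<Void>> tasks = new ArrayList<>();
    for (String store : stores) {
      if (map.containsKey(store)) {
        continue; // Assumed to still have a task in flight, so the store is skipped.
      }
      Callable<Void> task = () -> {
        try {
          Thread.sleep(sleepMs.getOrDefault(store, 0L));
          return null;
        } finally {
          // Only reached if the task actually started running.
          map.remove(store);
        }
      };
      map.put(store, task);
      tasks.add(task);
      scheduled.add(store);
    }
    // One worker thread, so tasks queued behind a slow one cannot start before the timeout.
    ExecutorService pool = Executors.newSingleThreadExecutor();
    try {
      // On timeout, invokeAll cancels every unfinished task; queued ones never run at all.
      pool.invokeAll(tasks, timeoutMs, TimeUnit.MILLISECONDS);
      pool.shutdown();
      pool.awaitTermination(10, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } finally {
      pool.shutdownNow();
    }
    return scheduled;
  }

  public static void main(String[] args) {
    List<String> stores = Arrays.asList("slow", "a", "b");
    Map<String, Long> firstRound = Collections.singletonMap("slow", 2_000L);
    Map<String, Long> secondRound = Collections.emptyMap();

    StoreSchedulingSketch buggy = new StoreSchedulingSketch();
    buggy.executeWithFieldMap(stores, firstRound, 200); // Times out; "a" and "b" never start.
    System.out.println("field map, round 2: " + buggy.executeWithFieldMap(stores, secondRound, 200));

    StoreSchedulingSketch fixed = new StoreSchedulingSketch();
    fixed.executeWithLocalMap(stores, firstRound, 200);
    System.out.println("local map, round 2: " + fixed.executeWithLocalMap(stores, secondRound, 200));
  }
}
```

Under these assumptions, the field-map variant schedules only `slow` in the second round, because the entries for `a` and `b` were never removed, while the local-map variant schedules all three stores again.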
In addition, several NullPointerExceptions (actual or potential) are fixed.
Finally, there are some cosmetic changes, such as deleting unused code and making some fields final.
How was this PR tested?
Existing and new unit tests.
Does this PR introduce any user-facing changes?