fix: reuse batch response max timestamp immediately #905
Summary
This change makes the lambda aggregation use the batch response's max timestamp immediately to exclude streaming rows that fall before that timestamp, without waiting for the GroupByServingInfo to be refreshed.
Only the uncached flow is updated for now, because the cached batchIr objects currently don't include the timestamp; changing that is out of scope for this PR.
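A minimal sketch of the intended behavior, not the actual implementation. All names here (`StreamingRow`, `effective_batch_end_ts`, `rows_to_aggregate`) are hypothetical, and the sketch assumes that the cutoff is the later of the serving info's end ts and the batch response's max timestamp, and that a row exactly at the cutoff is kept:

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class StreamingRow:
    """Hypothetical streaming row: an event timestamp plus a value to aggregate."""
    ts_millis: int
    value: int


def effective_batch_end_ts(serving_info_end_ts: int,
                           batch_response_max_ts: Optional[int]) -> int:
    """Pick the cutoff for streaming rows.

    Assumption: when the batch response carries a max timestamp, the later of
    the two values wins; otherwise fall back to the serving info's end ts.
    """
    if batch_response_max_ts is None:
        return serving_info_end_ts
    return max(serving_info_end_ts, batch_response_max_ts)


def rows_to_aggregate(rows: Iterable[StreamingRow],
                      cutoff_ts: int) -> List[StreamingRow]:
    """Drop streaming rows before the cutoff (assumed already covered by batch data)."""
    return [row for row in rows if row.ts_millis >= cutoff_ts]


# Illustrative data: the serving info is stale (end ts 150) while the batch
# response already covers events up to 250.
rows = [StreamingRow(100, 1), StreamingRow(200, 2), StreamingRow(300, 4)]

stale_rows = rows_to_aggregate(rows, effective_batch_end_ts(150, None))
fresh_rows = rows_to_aggregate(rows, effective_batch_end_ts(150, 250))

stale_sum = sum(row.value for row in stale_rows)  # includes the row at ts 200
fresh_sum = sum(row.value for row in fresh_rows)  # excludes the row at ts 200
```

With the stale cutoff, the row at ts 200 is counted on the streaming side even though the batch response already covers it. Using the batch response's timestamp directly removes that overlap without waiting for a refresh.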
Why / Goal
When the batch response's latest timestamp is later than GroupByServingInfo's end ts (which indicates that GroupByServingInfo is stale), the TTL cache is updated asynchronously, and that update can occasionally fail. In the meantime, the initial request keeps using the stale end ts, so the aggregation results are wrong: streaming rows that should have been excluded are included.
Test Plan
WIP
Checklist
Reviewers
@pengyu-hou @yuli-han