[Core] Prefill Only Tokens Without KV Cache in Batch Requests (Disagg Prefill) #12285
+223
−43
This ticket is part of [RFC]: Disaggregated Prefilling and KV Cache Transfer Roadmap #10818, with the following issue: "Adaptivity and Fault Tolerance: [Perf] If not all KV caches in the batch are received, only perform prefilling on those tokens without KV cache."
When there are multiple requests in a batch, the decode node may only receive the KV cache for some of them from the prefill node. Previously, in such cases, the decode node would perform prefilling for all requests in the batch, even if it had already received the KV cache for some requests.
This PR rebuilds the model input when the KV cache for some requests in the batch is missing, so that prefilling runs only on the requests whose KV cache was not received and is skipped for the requests whose KV cache is already available.
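As a rough illustration of the idea (not the code in this PR), the sketch below filters a flattened batch down to the requests whose KV cache did not arrive. The names `BatchInput`, `rebuild_for_missing_kv`, and `kv_received` are hypothetical and chosen only for this example; the actual model-input structures in vLLM carry more fields (positions, attention metadata, slot mappings) that would also need to be rebuilt.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class BatchInput:
    """Hypothetical flattened batch: all request tokens concatenated."""
    input_tokens: List[int]   # tokens of every request, back to back
    seq_lens: List[int]       # number of tokens per request
    request_ids: List[str]    # one id per request


def rebuild_for_missing_kv(
    batch: BatchInput, kv_received: List[bool]
) -> Tuple[BatchInput, List[int]]:
    """Keep only the requests whose KV cache was NOT received.

    Returns the reduced batch plus the original indices of the kept
    requests, so results can be mapped back to the full batch.
    """
    if len(kv_received) != len(batch.seq_lens):
        raise ValueError("kv_received must have one flag per request")

    tokens: List[int] = []
    seq_lens: List[int] = []
    request_ids: List[str] = []
    kept_indices: List[int] = []

    start = 0
    for i, (seq_len, received) in enumerate(zip(batch.seq_lens, kv_received)):
        end = start + seq_len
        if not received:
            # This request still needs prefill: copy its token slice.
            tokens.extend(batch.input_tokens[start:end])
            seq_lens.append(seq_len)
            request_ids.append(batch.request_ids[i])
            kept_indices.append(i)
        start = end

    return BatchInput(tokens, seq_lens, request_ids), kept_indices


# Example: three requests, KV cache arrived for the first and third only.
batch = BatchInput(
    input_tokens=[1, 2, 3, 4, 5, 6],
    seq_lens=[2, 3, 1],
    request_ids=["a", "b", "c"],
)
rebuilt, kept = rebuild_for_missing_kv(batch, kv_received=[True, False, True])
print(rebuilt.input_tokens, rebuilt.seq_lens, rebuilt.request_ids, kept)
```

In this sketch only request `"b"` remains in the rebuilt batch, so the prefill forward pass would touch three tokens instead of six. Returning the kept indices is one possible way to scatter the prefill outputs back into their positions in the original batch.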