[Core] Prefill Only Tokens Without KV Cache in Batch Requests (Disagg Prefill) #12285

Shaoting-Feng · 2025-01-21T23:20:02Z

This PR is part of [RFC]: Disaggregated Prefilling and KV Cache Transfer Roadmap #10818 and addresses the following item: "Adaptivity and Fault Tolerance: [Perf] If not all KV caches in the batch are received, only perform prefilling on those tokens without KV cache."

When there are multiple requests in a batch, the decode node may only receive the KV cache for some of them from the prefill node. Previously, in such cases, the decode node would perform prefilling for all requests in the batch, even if it had already received the KV cache for some requests.

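When the KV cache for some requests in the batch is missing, this PR rebuilds the model input so that the decode node prefills only the requests whose KV cache was not received, and skips prefilling for the requests whose KV cache already arrived from the prefill node.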
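To illustrate the idea, here is a minimal sketch of rebuilding a batched input so that only requests without a received KV cache are prefilled. It assumes a simplified layout in which the tokens of all requests are concatenated into one flat list, with per-request lengths recorded in `seq_lens`. `BatchInput` and `rebuild_input_without_cached` are hypothetical names for illustration only; they are not vLLM's actual model-input classes or the code in this PR.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BatchInput:
    """Hypothetical, simplified stand-in for a batched prefill input.

    The tokens of all requests are concatenated into one flat list;
    seq_lens records how many tokens each request contributes.
    """
    request_ids: List[str]
    input_tokens: List[int]     # flattened tokens of all requests
    input_positions: List[int]  # position of each token within its request
    seq_lens: List[int]         # number of tokens per request


def rebuild_input_without_cached(batch: BatchInput,
                                 kv_received: List[bool]) -> BatchInput:
    """Keep only the requests whose KV cache was NOT received.

    The decode node can then run prefill on this smaller batch and reuse
    the transferred KV cache for the remaining requests.
    """
    if len(kv_received) != len(batch.request_ids):
        raise ValueError("kv_received needs exactly one flag per request")

    new_ids: List[str] = []
    new_tokens: List[int] = []
    new_positions: List[int] = []
    new_lens: List[int] = []

    start = 0  # offset of the current request in the flattened token list
    for req_id, seq_len, received in zip(batch.request_ids,
                                         batch.seq_lens, kv_received):
        end = start + seq_len
        if not received:
            # KV cache missing: this request still has to be prefilled.
            new_ids.append(req_id)
            new_tokens.extend(batch.input_tokens[start:end])
            new_positions.extend(batch.input_positions[start:end])
            new_lens.append(seq_len)
        start = end

    return BatchInput(new_ids, new_tokens, new_positions, new_lens)


# Usage: KV cache arrived for "a" and "c", but not for "b".
batch = BatchInput(
    request_ids=["a", "b", "c"],
    input_tokens=[11, 12, 21, 22, 23, 31],
    input_positions=[0, 1, 0, 1, 2, 0],
    seq_lens=[2, 3, 1],
)
partial = rebuild_input_without_cached(batch, kv_received=[True, False, True])
print(partial.request_ids, partial.input_tokens)  # ['b'] [21, 22, 23]
```

In this sketch, slicing by per-request offsets keeps each remaining request's tokens and positions contiguous, so the rebuilt batch has the same layout as the original, just with fewer requests. If every flag is `True`, the result is an empty batch and the prefill pass could be skipped entirely.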

Signed-off-by: Shaoting Feng <[email protected]>

github-actions · 2025-01-21T23:20:13Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run the fastcheck CI, which runs a small, essential subset of CI tests to catch errors quickly. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

Add the ready label to the PR.
Enable auto-merge.

🚀

Only perform prefilling on requests without KV cache in a batch

2b78a04

Signed-off-by: Shaoting Feng <[email protected]>

Shaoting-Feng changed the title ~~[Core] Perform Prefilling Only on Decode Node for Requests Without KV Cache from Prefill Node in a Batch~~ [Core] Prefill Only Tokens Without KV Cache in Batch Requests (Disagg Prefill) Jan 21, 2025
