[Core] Support reset_prefix_cache
#12284
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can do one of these:
Signed-off-by: Cody Yu <[email protected]>
@@ -519,6 +519,17 @@ async def create_score_v1(request: ScoreRequest, raw_request: Request):
     }


 @router.post("/reset_prefix_cache")
It's better to have a flag so that this endpoint is opt-in; users should not send this request. We can add an env var VLLM_DEV_MODE and only expose this endpoint when VLLM_DEV_MODE is set.
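A minimal sketch of the opt-in idea suggested here, using only the standard library rather than the actual API server code. The route table, `build_routes`, and the handler are hypothetical stand-ins, and treating only "1" or "true" as enabled is an assumption; the comment itself only says the variable should be "set".

```python
import os
from typing import Callable, Dict, Mapping


def reset_prefix_cache_handler() -> int:
    """Hypothetical handler; a real server would call into the engine here."""
    return 200


def build_routes(env: Mapping[str, str] = os.environ) -> Dict[str, Callable[[], int]]:
    """Return the route table, adding the reset route only in dev mode."""
    routes: Dict[str, Callable[[], int]] = {"/health": lambda: 200}
    # Assumption: only an explicit truthy value enables the route.
    if env.get("VLLM_DEV_MODE", "0").lower() in ("1", "true"):
        routes["/reset_prefix_cache"] = reset_prefix_cache_handler
    return routes
```

With this shape, a default deployment never registers the route at all, so a request to /reset_prefix_cache would simply not match anything.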
Yeah, we can just add an argument when launching an endpoint to allow this route. I suppose most production API servers don't want this to be enabled.
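As a rough illustration of the launch-argument variant, the sketch below uses argparse. The flag name `--enable-reset-prefix-cache` and the `enabled_routes` helper are hypothetical and not taken from the PR.

```python
import argparse
from typing import List, Sequence


def make_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="toy API server launcher")
    # Hypothetical flag name; off by default so production servers stay unaffected.
    parser.add_argument(
        "--enable-reset-prefix-cache",
        action="store_true",
        help="expose the /reset_prefix_cache route",
    )
    return parser


def enabled_routes(argv: Sequence[str]) -> List[str]:
    """Return the routes that would be registered for the given arguments."""
    args = make_parser().parse_args(argv)
    routes = ["/health"]
    if args.enable_reset_prefix_cache:
        routes.append("/reset_prefix_cache")
    return routes
```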
Thanks for the great work! I didn't imagine this would be so involved (it touches quite a lot of files). I think we should do something to refactor the way we create a new endpoint, cc @robertgshaw2-redhat
This PR adds the capability of resetting prefix cache for both v0 and v1. This feature could be useful for the following cases:
The prefix cache can only be reset when all blocks are free. However, although this PR also provides /reset_prefix_cache in the API server, we currently always return 200 whether the prefix cache is reset successfully or not. This is mainly because it's not trivial for the engine client and server to communicate such information (there is no fundamental blocker; it just needs some work). cc @youkaichao @robertgshaw2-redhat
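To make the "all blocks are free" condition concrete, here is a toy sketch. `ToyBlockPool` and its methods are hypothetical and much simpler than vLLM's real block managers; the only behavior taken from the description is that a reset is refused while any block is still in use.

```python
from typing import Dict, List


class ToyBlockPool:
    """Toy stand-in for a KV-cache block pool with prefix caching."""

    def __init__(self, num_blocks: int) -> None:
        self.ref_counts: List[int] = [0] * num_blocks
        self.cached: Dict[str, int] = {}  # prefix hash -> block id

    def allocate(self, prefix_hash: str) -> int:
        """Reuse the cached block for this prefix, or claim an unused one."""
        block_id = self.cached.get(prefix_hash)
        if block_id is None:
            taken = set(self.cached.values())
            candidates = [
                i for i, rc in enumerate(self.ref_counts)
                if rc == 0 and i not in taken
            ]
            if not candidates:
                raise RuntimeError("no free blocks")
            block_id = candidates[0]
            self.cached[prefix_hash] = block_id
        self.ref_counts[block_id] += 1
        return block_id

    def free(self, block_id: int) -> None:
        """Drop one reference; the block stays cached for later reuse."""
        self.ref_counts[block_id] -= 1

    def reset_prefix_cache(self) -> bool:
        """Clear cached prefixes only if no block is referenced."""
        if any(rc > 0 for rc in self.ref_counts):
            return False
        self.cached.clear()
        return True
```

In this sketch the boolean result is what a server could, in principle, translate into a non-200 response; the PR as described does not do that yet.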