[Feature] Add load generation config from model #11164
Conversation
Signed-off-by: liuyanyi <[email protected]>
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can do one of these:
Signed-off-by: liuyanyi <[email protected]>
Signed-off-by: liuyanyi <[email protected]>
if any(p in generation_config_fields for p in available_params):
    overwrite_config = {
        p: generation_config_fields.get(p, None)
        for p in available_params
    }
    logger.info("Overwriting generation config with: %s",
                overwrite_config)
    # Modify the ChatCompletionRequest to include the generation config
    for k, v in overwrite_config.items():
        if v is not None:
            ChatCompletionRequest.model_fields[k].default = v
            CompletionRequest.model_fields[k].default = v

    # Rebuild the models to include the new fields
    ChatCompletionRequest.model_rebuild(force=True)
    CompletionRequest.model_rebuild(force=True)
This feels a bit hacky; can we move the default sampling params generation to the serving engine class?
This trick is needed for FastAPI: ChatCompletionRequest and CompletionRequest have to be modified before add_api_route is called. Without the model rebuild, the server would still use the old params.
I mean that we should move the default value generation out of the Pydantic model and into the serving engine class.
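For context, a minimal sketch of what this suggestion could look like (hypothetical class and method names, not the actual vLLM implementation): resolve the model's defaults in the serving layer at request time instead of mutating the Pydantic request models at import time.

from typing import Any, Dict, Optional

class OpenAIServingSketch:
    def __init__(self, default_sampling_params: Optional[Dict[str, Any]] = None):
        # Loaded once, e.g. from the model's generation_config.json.
        self.default_sampling_params = default_sampling_params or {}

    def resolve(self, name: str, request_value: Optional[Any], api_default: Any) -> Any:
        # Prefer the client's value, then the model default, then the API default.
        if request_value is not None:
            return request_value
        return self.default_sampling_params.get(name, api_default)

serving = OpenAIServingSketch({"temperature": 0.6, "top_p": 0.9})
assert serving.resolve("temperature", None, 1.0) == 0.6  # model default wins when unset
assert serving.resolve("temperature", 0.0, 1.0) == 0.0   # explicit 0.0 is preserved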
Signed-off-by: Yanyi Liu <[email protected]>
Signed-off-by: Yanyi Liu <[email protected]>
Signed-off-by: Yanyi Liu <[email protected]>
Signed-off-by: Yanyi Liu <[email protected]>
force-pushed from 6b02f5d to ef436b7 (compare)
Signed-off-by: Yanyi Liu <[email protected]>
Signed-off-by: Yanyi Liu <[email protected]>
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: Yanyi Liu <[email protected]>
Signed-off-by: Yanyi Liu <[email protected]>
Signed-off-by: liuyanyi <[email protected]>
@DarkLight1337 Hi, I've revised the code.
vllm/entrypoints/openai/protocol.py (outdated)
temperature = self.temperature or default_sampling_params.get(
    "temperature", 0.0)
The default here should be 1.0, see #11219
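Two things are worth noting about the line above: using "or" treats an explicit temperature of 0.0 from the client as missing (0.0 is falsy), and the API-level fallback should be the OpenAI-compatible 1.0 rather than 0.0. A corrected sketch, assuming default_sampling_params holds the values loaded from the model's generation config:

# Sketch only: check for None rather than relying on truthiness, so an explicit
# 0.0 from the client is preserved, and fall back to the OpenAI default of 1.0.
if self.temperature is not None:
    temperature = self.temperature
else:
    temperature = default_sampling_params.get("temperature", 1.0)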
Signed-off-by: liuyanyi <[email protected]>
Signed-off-by: liuyanyi <[email protected]>
LGTM, thanks for implementing this!
Signed-off-by: liuyanyi <[email protected]>
Head branch was pushed to by a user without write access
https://buildkite.com/vllm/ci/builds/10739#0193cef2-4fc6-4ad9-82cc-1b4bbc85d544/6-10322 This test seems to be failing ever since the default temperature was set to 1.0. Can you set the temperature to 0.0 in this test?
available_params = [
    "repetition_penalty",
    "temperature",
    "top_k",
    "top_p",
    "min_p",
]
How were these params chosen? I also see token IDs specified in some generation configs:
https://huggingface.co/microsoft/Phi-3.5-mini-instruct/blob/main/generation_config.json
https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct/blob/main/generation_config.json
https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct/blob/main/generation_config.json
vllm/vllm/engine/llm_engine.py, line 850 in 0708124:
sampling_params.update_from_generation_config(
I think the token ids are already handled in llm_engine via update_from_generation_config.
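To illustrate the split being discussed (field names taken from the generation_config.json files linked above; the values are examples, not authoritative): the sampling-related entries are what available_params picks up, while token-id entries are left for the engine to merge into the stop criteria.

# Illustrative only: a subset of fields a model's generation_config.json may contain.
generation_config_fields = {
    "temperature": 0.6,
    "top_p": 0.9,
    "eos_token_id": [128001, 128008, 128009],
}

available_params = ["repetition_penalty", "temperature", "top_k", "top_p", "min_p"]

# Defaults this PR applies to the OpenAI request models.
sampling_defaults = {k: v for k, v in generation_config_fields.items()
                     if k in available_params}

# Token-id fields are handled separately when building SamplingParams in the engine.
token_id_fields = {k: v for k, v in generation_config_fields.items()
                   if k.endswith("token_id")}

print(sampling_defaults)  # {'temperature': 0.6, 'top_p': 0.9}
print(token_id_fields)    # {'eos_token_id': [128001, 128008, 128009]}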
vllm/entrypoints/openai/protocol.py (outdated)
- temperature = self.temperature if self.temperature is not None else 0.0
+ if (temperature := self.temperature) is None:
+     temperature = default_sampling_params.get("temperature", 1.0)
I am wary of defining default values for parameters in multiple places. Already here and below you set temperature = 1.0 in four separate places. At the very least we should have a global static default_sampling_params that can be referenced for each parameter name, so if we change the default it only needs to change in one place.
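A minimal sketch of that suggestion (the constant name is hypothetical, not the actual vLLM code): keep the API-level fallbacks in a single module-level mapping so a change such as 0.0 to 1.0 only has to be made once.

# Hypothetical module-level constant for the OpenAI-compatible fallbacks.
DEFAULT_SAMPLING_PARAMS = {
    "temperature": 1.0,
    "top_p": 1.0,
    "top_k": -1,
    "min_p": 0.0,
    "repetition_penalty": 1.0,
}

def resolve(name, request_value, model_defaults):
    # Client value first, then the model's generation-config value,
    # then the single shared API default.
    if request_value is not None:
        return request_value
    return model_defaults.get(name, DEFAULT_SAMPLING_PARAMS[name])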
- temperature: Optional[float] = 1.0
- top_p: Optional[float] = 1.0
+ temperature: Optional[float] = None
+ top_p: Optional[float] = None
I understand why you want these to be None by default, but I think it is clearest to have the default value here. I won't block based on this.
If a default value is set, FastAPI will use the default value directly. I guess a comment is enough?
We can't set a default value here because it depends on the generation config. The best we can do is what you suggested in the previous comment.
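A small illustration of why the field default has to be None (a standalone Pydantic v2 sketch with a made-up model name, not the vLLM request classes): with a concrete default such as 1.0, an omitted field is indistinguishable from an explicitly supplied one, so the generation-config value could never take effect.

from typing import Optional
from pydantic import BaseModel

class Req(BaseModel):
    temperature: Optional[float] = None  # None means "not set by the client"

req = Req.model_validate({})                      # client omitted the field
assert req.temperature is None                    # server can apply the model default
req2 = Req.model_validate({"temperature": 0.0})   # explicit value
assert req2.temperature == 0.0                    # preserved, not treated as unset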
… requests Signed-off-by: liuyanyi <[email protected]>
Head branch was pushed to by a user without write access
I think the default temperature of 1.0 led to this failure; PR #11219 failed too. I've set the temperature to 0.0 in this test.
Signed-off-by: DarkLight1337 <[email protected]>
Signed-off-by: liuyanyi <[email protected]> Signed-off-by: Yanyi Liu <[email protected]> Signed-off-by: DarkLight1337 <[email protected]> Co-authored-by: Cyrus Leung <[email protected]>
Desc
Add generation_config to engine_args, allowing the default sampling params to be loaded from the model and overwritten. To avoid breaking changes, this flag defaults to None, preserving the existing behavior unless explicitly configured.

Here are the three modes of generation_config:
- None: keep the same behavior as in previous versions
- "auto": load the model's default generation_config into the sampling params
- a folder path: load the generation config via GenerationConfig.from_pretrained(path)

Key Changes:
- Move _load_generation_config_dict from llm_engine into ModelConfig
- Add a generation_config arg to engine_args and vllm_config
- Add a get_default_sampling_params interface to the LLM class.

FIX #10758 (link existing issues this PR will resolve)