
[Feature] Add load generation config from model #11164

Merged: 20 commits merged into vllm-project:main on Dec 19, 2024

Conversation

liuyanyi (Contributor) commented Dec 13, 2024

Desc

Add a generation_config option to the engine args, allowing the default sampling params to be loaded from the model and overwritten.

To avoid a breaking change, this flag defaults to None, preserving the existing behavior unless explicitly configured.

Here are the three modes of generation_config (see the sketch after this list):

  1. None: keep the same behavior as in previous versions.
  2. auto: load the model's default generation_config into the sampling params.
  3. A folder containing a generation_config: load the generation config from a custom folder, like transformers' GenerationConfig.from_pretrained(path).
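
For illustration, here is a minimal sketch of the three modes as they could be passed through the new engine argument (the model name and folder path are placeholders, not taken from this PR):

from vllm import LLM

# 1. None (the default): behave exactly as before this change.
llm = LLM(model="facebook/opt-125m", generation_config=None)

# 2. "auto": load defaults from the model's own generation_config.json.
llm = LLM(model="facebook/opt-125m", generation_config="auto")

# 3. A folder path: load a custom generation config, analogous to
#    transformers' GenerationConfig.from_pretrained(path).
llm = LLM(model="facebook/opt-125m", generation_config="/path/to/generation_config_dir")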

Key Changes:

  • Merge _load_generation_config_dict from llm_engine into ModelConfig.
  • Add a generation_config arg to engine_args and vllm_config.
  • Add a get_default_sampling_params interface to the LLM class (see the sketch after this list).
  • Support the OpenAI server, with the Swagger docs automatically reflecting the changes.
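
A hedged sketch of how the new get_default_sampling_params interface on the LLM class might be used (the model name and the tweaked field are placeholders):

from vllm import LLM

# Seed the defaults from the model's generation_config.json.
llm = LLM(model="facebook/opt-125m", generation_config="auto")

# SamplingParams pre-filled with the generation-config defaults (temperature, top_p, ...).
params = llm.get_default_sampling_params()
params.max_tokens = 64  # adjust whatever the request needs on top of the defaults

outputs = llm.generate(["Hello, my name is"], params)
print(outputs[0].outputs[0].text)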

FIX #10758

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run the fastcheck CI, which covers a small and essential subset of tests to quickly catch errors. You can run additional CI tests on top of that by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add ready label to the PR
  • Enable auto-merge.

🚀

mergify bot added the frontend label Dec 13, 2024
Signed-off-by: liuyanyi <[email protected]>
Signed-off-by: liuyanyi <[email protected]>
vllm/engine/arg_utils.py: outdated review thread (resolved)
vllm/entrypoints/llm.py: outdated review thread (resolved)
Comment on lines 583 to 598
if any(p in generation_config_fields for p in available_params):
    overwrite_config = {
        p: generation_config_fields.get(p, None)
        for p in available_params
    }
    logger.info("Overwriting generation config with: %s",
                overwrite_config)
    # Modify the ChatCompletionRequest to include the generation config
    for k, v in overwrite_config.items():
        if v is not None:
            ChatCompletionRequest.model_fields[k].default = v
            CompletionRequest.model_fields[k].default = v

    # Rebuild the models to include the new fields
    ChatCompletionRequest.model_rebuild(force=True)
    CompletionRequest.model_rebuild(force=True)
Member

This feels a bit hacky, can we move the default sampling params generation to the serving engine class?

Contributor Author

This trick is needed for FastAPI: ChatCompletionRequest and CompletionRequest must be modified before add_api_route is called. Without the model rebuild, the server will still use the old defaults.
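
For context, a standalone sketch of the pydantic behavior being relied on here (assuming pydantic v2; the Req model and values are made up): mutating model_fields alone does not change validation results until the schema is rebuilt.

from pydantic import BaseModel


class Req(BaseModel):
    temperature: float = 0.7


# Mutate the declared default after class creation.
Req.model_fields["temperature"].default = 0.2

# The validation schema was built with the old default, so a fresh
# instance may still come out with 0.7 here.
print(Req().temperature)

# Force-rebuilding regenerates the core schema from model_fields,
# so the new default takes effect.
Req.model_rebuild(force=True)
print(Req().temperature)  # 0.2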

Member

I mean that we should move the default value generation outside of the pydantic model, into the serving engine class.

Signed-off-by: Yanyi Liu <[email protected]>
Signed-off-by: Yanyi Liu <[email protected]>
liuyanyi force-pushed the add_generation_config branch from 6b02f5d to ef436b7 on December 13, 2024 17:10
Signed-off-by: Yanyi Liu <[email protected]>
liuyanyi requested a review from simon-mo as a code owner December 14, 2024 03:21
mergify bot commented Dec 15, 2024

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @liuyanyi.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify bot added the needs-rebase label Dec 15, 2024
mergify bot removed the needs-rebase label Dec 15, 2024
liuyanyi (Contributor, Author)

@DarkLight1337 Hi, I've revised the code.
Now entrypoints can read default sampling params from ModelConfig.get_default_sampling_params. For the OpenAI server, the processing logic has moved into ServingChat and ServingCompletion, so there is no more patching of the pydantic models.
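
A rough sketch of that flow (the class and method below are simplified placeholders, not the exact vLLM code; only ModelConfig.get_default_sampling_params is from this PR):

from typing import Optional


class ServingChatSketch:
    """Simplified stand-in for the OpenAI serving layer."""

    def __init__(self, model_config):
        # Defaults loaded once from the model's generation_config (may be an empty dict).
        self.default_sampling_params = model_config.get_default_sampling_params()

    def resolve_temperature(self, request_temperature: Optional[float]) -> float:
        # The request value wins; otherwise fall back to the generation config,
        # then to a hard-coded default.
        if request_temperature is not None:
            return request_temperature
        return self.default_sampling_params.get("temperature", 1.0)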

Comment on lines 364 to 365
temperature = self.temperature or default_sampling_params.get(
    "temperature", 0.0)
DarkLight1337 (Member) commented Dec 16, 2024

The default here should be 1.0, see #11219

Signed-off-by: liuyanyi <[email protected]>
DarkLight1337 (Member) left a comment

LGTM, thanks for implementing this!

DarkLight1337 enabled auto-merge (squash) December 16, 2024 03:18
github-actions bot added the ready label (ONLY add when PR is ready to merge/full CI is needed) Dec 16, 2024
Signed-off-by: liuyanyi <[email protected]>
auto-merge was automatically disabled December 16, 2024 03:23

Head branch was pushed to by a user without write access

DarkLight1337 enabled auto-merge (squash) December 16, 2024 03:25
DarkLight1337 (Member) commented

https://buildkite.com/vllm/ci/builds/10739#0193cef2-4fc6-4ad9-82cc-1b4bbc85d544/6-10322

This test seems to be failing ever since the default temperature was set to 1.0. Can you set temperature = 0 for this?

Comment on lines +829 to +835
available_params = [
    "repetition_penalty",
    "temperature",
    "top_k",
    "top_p",
    "min_p",
]
Contributor Author

sampling_params.update_from_generation_config(

I think token_ids has already been used in llm_engine.

temperature = self.temperature if self.temperature is not None else 0.0

if (temperature := self.temperature) is None:
    temperature = default_sampling_params.get("temperature", 1.0)
Member

I am wary of defining default values for parameters in multiple places; here and below you already set temperature = 1.0 in four separate places. At the very least we should have a global static default_sampling_params that can be referenced for each parameter name, so that changing a default only has to happen in one place.
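
A hypothetical sketch of that suggestion (the names and fallback values are illustrative, not from the PR): keep one module-level table of hard-coded fallbacks and look it up wherever a request parameter is unset.

from typing import Any, Optional

# Single source of truth for the hard-coded fallbacks.
GLOBAL_DEFAULT_SAMPLING_PARAMS: dict[str, Any] = {
    "repetition_penalty": 1.0,
    "temperature": 1.0,
    "top_k": -1,
    "top_p": 1.0,
    "min_p": 0.0,
}


def resolve_param(name: str, request_value: Optional[Any],
                  model_defaults: dict[str, Any]) -> Any:
    """Request value wins, then the model's generation config, then the global fallback."""
    if request_value is not None:
        return request_value
    return model_defaults.get(name, GLOBAL_DEFAULT_SAMPLING_PARAMS[name])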

Comment on lines -214 to +215
- temperature: Optional[float] = 1.0
- top_p: Optional[float] = 1.0
+ temperature: Optional[float] = None
+ top_p: Optional[float] = None
Member

I understand why you want these to be None by default, but I think it is clearest to have the default value here. I won't block on this.

Contributor Author

If a default value is set, FastAPI will use that default value directly. I guess a comment is enough?

DarkLight1337 (Member) commented Dec 17, 2024

We can't set a default value here because it depends on generation config. The best we can do is what you suggested in the previous comment.
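
A minimal sketch of the compromise settled on here (the field set and descriptions are illustrative): keep None as the pydantic default and document where the effective value comes from.

from typing import Optional

from pydantic import BaseModel, Field


class ChatCompletionRequestSketch(BaseModel):
    # None means "not provided"; the serving layer fills in the value from the
    # model's generation_config, then falls back to a hard-coded default.
    temperature: Optional[float] = Field(
        default=None,
        description="Defaults to the generation_config value if set, else 1.0.")
    top_p: Optional[float] = Field(
        default=None,
        description="Defaults to the generation_config value if set, else 1.0.")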

auto-merge was automatically disabled December 17, 2024 03:23

Head branch was pushed to by a user without write access

liuyanyi (Contributor, Author)

https://buildkite.com/vllm/ci/builds/10739#0193cef2-4fc6-4ad9-82cc-1b4bbc85d544/6-10322

This test seems to be failing ever since the default temperature was set to 1.0. Can you set temperature = 0 for this?

I think the default temperature of 1.0 led to this failure; PR #11219 failed too. I've set the temperature to 0.0 in this test.

DarkLight1337 enabled auto-merge (squash) December 19, 2024 07:45
DarkLight1337 merged commit 5aef498 into vllm-project:main Dec 19, 2024
56 checks passed
BKitor pushed a commit to BKitor/vllm that referenced this pull request Dec 30, 2024
Signed-off-by: liuyanyi <[email protected]>
Signed-off-by: Yanyi Liu <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]>
Co-authored-by: Cyrus Leung <[email protected]>
joennlae pushed a commit to 44ai-labs/vllm that referenced this pull request Jan 19, 2025
Signed-off-by: liuyanyi <[email protected]>
Signed-off-by: Yanyi Liu <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]>
Co-authored-by: Cyrus Leung <[email protected]>
Labels: frontend, ready (ONLY add when PR is ready to merge/full CI is needed)

Successfully merging this pull request may close these issues.

[Feature]: ChatCompletionRequest get default value from generation_config.json
3 participants