
[Feature] Add load generation config from model #11164

Merged: 20 commits merged into vllm-project:main on Dec 19, 2024

Conversation

liuyanyi (Contributor) commented Dec 13, 2024

Desc

Add a generation_config option to the engine args, allowing the default sampling params to be loaded from the model and overwritten.

To avoid a breaking change, this flag defaults to None, preserving the existing behavior unless explicitly configured.

Here are the three modes of generation_config (see the sketch after this list):

  1. None: keep the same behavior as in previous versions.
  2. auto: load the model's default generation_config into the sampling params.
  3. A folder containing a generation_config: load the generation config from a custom folder, like transformers' GenerationConfig.from_pretrained(path).
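
For illustration, here is a minimal sketch of the three modes as they could be passed through the new engine argument (the model name and folder path are placeholders, not taken from this PR):

from vllm import LLM

# 1. None (the default): behave exactly as before this change.
llm = LLM(model="facebook/opt-125m", generation_config=None)

# 2. "auto": load defaults from the model's own generation_config.json.
llm = LLM(model="facebook/opt-125m", generation_config="auto")

# 3. A folder path: load a custom generation config, analogous to
#    transformers' GenerationConfig.from_pretrained(path).
llm = LLM(model="facebook/opt-125m", generation_config="/path/to/generation_config_dir")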

Key Changes:

  • Merge _load_generation_config_dict from llm_engine into ModelConfig.
  • Add a generation_config arg to engine_args and vllm_config.
  • Add a get_default_sampling_params interface to the LLM class (see the sketch after this list).
  • Support the OpenAI server, with the Swagger docs automatically reflecting the changes.
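
A hedged sketch of how the new get_default_sampling_params interface on the LLM class might be used (the model name and the tweaked field are placeholders):

from vllm import LLM

# Seed the defaults from the model's generation_config.json.
llm = LLM(model="facebook/opt-125m", generation_config="auto")

# SamplingParams pre-filled with the generation-config defaults (temperature, top_p, ...).
params = llm.get_default_sampling_params()
params.max_tokens = 64  # adjust whatever the request needs on top of the defaults

outputs = llm.generate(["Hello, my name is"], params)
print(outputs[0].outputs[0].text)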

FIX #10758

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run the fastcheck CI, which covers a small and essential subset of tests to quickly catch errors. You can run additional CI tests on top of that by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add ready label to the PR
  • Enable auto-merge.

🚀

mergify bot added the frontend label Dec 13, 2024
Signed-off-by: liuyanyi <[email protected]>
Signed-off-by: liuyanyi <[email protected]>
vllm/engine/arg_utils.py: outdated review thread (resolved)
vllm/entrypoints/llm.py: outdated review thread (resolved)
Comment on lines 583 to 598
if any(p in generation_config_fields for p in available_params):
    overwrite_config = {
        p: generation_config_fields.get(p, None)
        for p in available_params
    }
    logger.info("Overwriting generation config with: %s",
                overwrite_config)
    # Modify the ChatCompletionRequest to include the generation config
    for k, v in overwrite_config.items():
        if v is not None:
            ChatCompletionRequest.model_fields[k].default = v
            CompletionRequest.model_fields[k].default = v

    # Rebuild the models to include the new fields
    ChatCompletionRequest.model_rebuild(force=True)
    CompletionRequest.model_rebuild(force=True)
Member

This feels a bit hacky, can we move the default sampling params generation to the serving engine class?

Contributor Author

This trick is needed for FastAPI: ChatCompletionRequest and CompletionRequest must be modified before add_api_route is called. Without the model rebuild, the server will still use the old defaults.
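
For context, a standalone sketch of the pydantic behavior being relied on here (assuming pydantic v2; the Req model and values are made up): mutating model_fields alone does not change validation results until the schema is rebuilt.

from pydantic import BaseModel


class Req(BaseModel):
    temperature: float = 0.7


# Mutate the declared default after class creation.
Req.model_fields["temperature"].default = 0.2

# The validation schema was built with the old default, so a fresh
# instance may still come out with 0.7 here.
print(Req().temperature)

# Force-rebuilding regenerates the core schema from model_fields,
# so the new default takes effect.
Req.model_rebuild(force=True)
print(Req().temperature)  # 0.2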

Member

I mean that we should move the default value generation outside of the pydantic model, into the serving engine class.

Signed-off-by: Yanyi Liu <[email protected]>
Signed-off-by: Yanyi Liu <[email protected]>
liuyanyi force-pushed the add_generation_config branch from 6b02f5d to ef436b7 on December 13, 2024 17:10
Signed-off-by: Yanyi Liu <[email protected]>
liuyanyi requested a review from simon-mo as a code owner December 14, 2024 03:21
mergify bot commented Dec 15, 2024

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @liuyanyi.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify bot added the needs-rebase label Dec 15, 2024
mergify bot removed the needs-rebase label Dec 15, 2024
liuyanyi (Contributor, Author)

@DarkLight1337 Hi, I've revised the code.
Now entrypoints can read default sampling params from ModelConfig.get_default_sampling_params. For the OpenAI server, the processing logic has moved into ServingChat and ServingCompletion, so there is no more patching of the pydantic models.
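
A rough sketch of that flow (the class and method below are simplified placeholders, not the exact vLLM code; only ModelConfig.get_default_sampling_params is from this PR):

from typing import Optional


class ServingChatSketch:
    """Simplified stand-in for the OpenAI serving layer."""

    def __init__(self, model_config):
        # Defaults loaded once from the model's generation_config (may be an empty dict).
        self.default_sampling_params = model_config.get_default_sampling_params()

    def resolve_temperature(self, request_temperature: Optional[float]) -> float:
        # The request value wins; otherwise fall back to the generation config,
        # then to a hard-coded default.
        if request_temperature is not None:
            return request_temperature
        return self.default_sampling_params.get("temperature", 1.0)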

Comment on lines 364 to 365
temperature = self.temperature or default_sampling_params.get(
    "temperature", 0.0)
DarkLight1337 (Member) commented Dec 16, 2024

The default here should be 1.0, see #11219

Signed-off-by: liuyanyi <[email protected]>
DarkLight1337 (Member) left a comment

LGTM, thanks for implementing this!

DarkLight1337 enabled auto-merge (squash) December 16, 2024 03:18
github-actions bot added the ready label (ONLY add when PR is ready to merge/full CI is needed) Dec 16, 2024
Signed-off-by: liuyanyi <[email protected]>
auto-merge was automatically disabled December 16, 2024 03:23

Head branch was pushed to by a user without write access

DarkLight1337 enabled auto-merge (squash) December 16, 2024 03:25
DarkLight1337 (Member) commented

https://buildkite.com/vllm/ci/builds/10739#0193cef2-4fc6-4ad9-82cc-1b4bbc85d544/6-10322

This test seems to be failing ever since the default temperature was set to 1.0. Can you set temperature = 0 for this?

Comment on lines +829 to +835
available_params = [
    "repetition_penalty",
    "temperature",
    "top_k",
    "top_p",
    "min_p",
]
Contributor Author

sampling_params.update_from_generation_config(

I think token_ids has already been used in llm_engine.

temperature = self.temperature if self.temperature is not None else 0.0

if (temperature := self.temperature) is None:
    temperature = default_sampling_params.get("temperature", 1.0)
Member

I am wary of defining default values for parameters in multiple places; here and below you already set temperature = 1.0 in four separate places. At the very least we should have a global static default_sampling_params that can be referenced for each parameter name, so that changing a default only has to happen in one place.
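
A hypothetical sketch of that suggestion (the names and fallback values are illustrative, not from the PR): keep one module-level table of hard-coded fallbacks and look it up wherever a request parameter is unset.

from typing import Any, Optional

# Single source of truth for the hard-coded fallbacks.
GLOBAL_DEFAULT_SAMPLING_PARAMS: dict[str, Any] = {
    "repetition_penalty": 1.0,
    "temperature": 1.0,
    "top_k": -1,
    "top_p": 1.0,
    "min_p": 0.0,
}


def resolve_param(name: str, request_value: Optional[Any],
                  model_defaults: dict[str, Any]) -> Any:
    """Request value wins, then the model's generation config, then the global fallback."""
    if request_value is not None:
        return request_value
    return model_defaults.get(name, GLOBAL_DEFAULT_SAMPLING_PARAMS[name])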

Comment on lines -214 to +215
- temperature: Optional[float] = 1.0
- top_p: Optional[float] = 1.0
+ temperature: Optional[float] = None
+ top_p: Optional[float] = None
Member

I understand why you want these to be None by default, but I think it is clearest to have the default value here. I won't block on this.

Contributor Author

If a default value is set, FastAPI will use that default value directly. I guess a comment is enough?

DarkLight1337 (Member) commented Dec 17, 2024

We can't set a default value here because it depends on generation config. The best we can do is what you suggested in the previous comment.
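
A minimal sketch of the compromise settled on here (the field set and descriptions are illustrative): keep None as the pydantic default and document where the effective value comes from.

from typing import Optional

from pydantic import BaseModel, Field


class ChatCompletionRequestSketch(BaseModel):
    # None means "not provided"; the serving layer fills in the value from the
    # model's generation_config, then falls back to a hard-coded default.
    temperature: Optional[float] = Field(
        default=None,
        description="Defaults to the generation_config value if set, else 1.0.")
    top_p: Optional[float] = Field(
        default=None,
        description="Defaults to the generation_config value if set, else 1.0.")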

auto-merge was automatically disabled December 17, 2024 03:23

Head branch was pushed to by a user without write access

liuyanyi (Contributor, Author)

https://buildkite.com/vllm/ci/builds/10739#0193cef2-4fc6-4ad9-82cc-1b4bbc85d544/6-10322

This test seems to be failing ever since the default temperature was set to 1.0. Can you set temperature = 0 for this?

I think the default temperature of 1.0 led to this failure; PR #11219 failed too. I've set the temperature to 0.0 in this test.

DarkLight1337 enabled auto-merge (squash) December 19, 2024 07:45
DarkLight1337 merged commit 5aef498 into vllm-project:main Dec 19, 2024
56 checks passed
BKitor pushed a commit to BKitor/vllm that referenced this pull request Dec 30, 2024
Signed-off-by: liuyanyi <[email protected]>
Signed-off-by: Yanyi Liu <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]>
Co-authored-by: Cyrus Leung <[email protected]>
joennlae pushed a commit to 44ai-labs/vllm that referenced this pull request Jan 19, 2025
Signed-off-by: liuyanyi <[email protected]>
Signed-off-by: Yanyi Liu <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]>
Co-authored-by: Cyrus Leung <[email protected]>
Labels: frontend, ready (ONLY add when PR is ready to merge/full CI is needed)

Successfully merging this pull request may close these issues.

[Feature]: ChatCompletionRequest get default value from generation_config.json
3 participants