
[Frontend][Core] Override HF config.json via CLI #5836

Merged: 14 commits merged into vllm-project:main on Nov 9, 2024

Conversation

@KrishnaM251 (Contributor) commented Jun 25, 2024

FIX #1542
FIX #2547
FIX #5205

PR Title and Classification

[Frontend] [Core]

Notes

  • While I believe the implementation is almost complete, the test I wrote only checks whether the new Optional[dict] parameter hf_kwargs is successfully set in ModelConfig (see the sketch after this list).
  • I would like to write a test that detects a change in the LLMEngine output when hf_kwargs is added as an EngineArgs parameter.
  • But I have two questions:
    • How can I write another test that runs hf_kwargs through an OpenAI compatible server? I wanted to repurpose a test in test_openai_server.py, however the EngineArgs params are set before all tests are run.
    • What are some test values that I can set in hf_kwargs which will generate output detectable in a function like: completion = client.completions.create(...) (motivating example). If this is not the best approach for testing, then any recommendations?
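
For reference, a rough sketch of that existing check (the hf_kwargs parameter and attribute names follow the draft described above and are assumptions, not a final API):

```python
# Sketch only: assumes EngineArgs accepts an hf_kwargs dict (the draft
# parameter discussed in this PR) and that create_engine_config() stores
# it on ModelConfig; names may differ from the merged code.
from vllm.engine.arg_utils import EngineArgs


def test_hf_kwargs_set_on_model_config():
    overrides = {"rope_theta": 1000000.0}
    engine_args = EngineArgs(model="facebook/opt-125m", hf_kwargs=overrides)
    model_config = engine_args.create_engine_config().model_config
    # Only asserts that the parameter was stored, not that it changes output.
    assert model_config.hf_kwargs == overrides
```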

@simon-mo (Collaborator)

@DarkLight1337 can you help answer the questions, since you recently touched the testing harness?

Additionally, there might be other places we want to override, including the tokenizer or generation config. Addressing those would be nice to have.

@DarkLight1337 (Member) commented Jun 26, 2024

Thanks for picking this up! To answer your questions:

  • How can I write another test that runs hf_kwargs through an OpenAI compatible server? I wanted to repurpose a test in test_openai_server.py, however the EngineArgs params are set before all tests are run.

You should launch a new instance of the server. Each RemoteOpenAIServer instance creates a new subprocess that invokes the OpenAI entrypoint through the command line, so you can't change the hf_config after instantiating it.

You can add pytest fixtures to use a different server for your tests.
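
For example, something along these lines might work (a sketch only: it assumes the RemoteOpenAIServer helper from tests/utils.py and an illustrative --hf-overrides flag carrying a JSON dict of config overrides; adjust the flag name and model to whatever this PR ends up exposing):

```python
# Sketch only: RemoteOpenAIServer spawns a fresh subprocess running the
# OpenAI-compatible entrypoint, so the overrides are applied at startup.
import json

import pytest

from tests.utils import RemoteOpenAIServer

MODEL_NAME = "Qwen/Qwen2-0.5B-Instruct"


@pytest.fixture(scope="module")
def server_with_overrides():
    args = [
        "--max-model-len", "2048",
        # Illustrative flag name; use whatever this PR finally adds.
        "--hf-overrides", json.dumps({"rope_theta": 1000000.0}),
    ]
    with RemoteOpenAIServer(MODEL_NAME, args) as remote_server:
        yield remote_server


def test_completion_with_overrides(server_with_overrides):
    client = server_with_overrides.get_client()
    completion = client.completions.create(model=MODEL_NAME,
                                            prompt="Hello, my name is",
                                            max_tokens=5)
    assert completion.choices[0].text
```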

  • What are some test values that I can set in hf_kwargs which will generate output detectable in a function like: completion = client.completions.create(...) (motivating example). If this is not the best approach for testing, then any recommendations?

Maybe you can set the attention implementation to eager, which from my understanding would cause some numerical inaccuracies compared to the default one. Actually, this may not work since we are running the vLLM model rather than the HF model.

Perhaps you can ask the author of the original issue what they want to accomplish using this feature that cannot otherwise be done via vLLM args. (If we don't have any situation that results in different vLLM output, what is the point of enabling this?)

@KrishnaM251 (Contributor, Author)

@DarkLight1337 I appreciate the response. I will do as you suggested.

@KrishnaM251 changed the title from "feat: passing hf_config args through openai server" to "[Frontend][Core] passing hf_config args through openai server" on Jun 27, 2024
@KrishnaM251 closed this on Jul 8, 2024
@KrishnaM251 reopened this on Jul 8, 2024
@vpellegrain

Hi @KrishnaM251,

Any news on this?

I have a specific use case in which I'd like to deploy a Phi-3.5-vision model on a vLLM OpenAI server entrypoint, but I'd like to specify the argument num_crops=16 (which defaults to 4 in the preprocessor config file).

@DarkLight1337 (Member)


@alex-jw-brooks is currently working on a PR that lets you pass options to the HF processor on demand instead of at startup time (the latter is what this PR focuses on). Stay tuned!

@alex-jw-brooks (Contributor)

Hi @vpellegrain - thought I would link this PR if you'd like to track it; it exposes num_crops as a processor kwarg for phi3v, and it can be used both for offline inference and via the CLI when starting the server entrypoint.

Happy to add a follow-up to make this configurable per-request later on, but since it was already turning into a lot of code to correctly handle processor kwargs for memory profiling etc., the current PR sets it up for init time 😄
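
Once that lands, usage would look roughly like this (a sketch assuming an mm_processor_kwargs engine argument that forwards num_crops to the HF processor at init time; exact names may differ until the PR is merged):

```python
# Sketch: assumes the mm_processor_kwargs engine argument described above.
from vllm import LLM

llm = LLM(
    model="microsoft/Phi-3.5-vision-instruct",
    trust_remote_code=True,
    max_model_len=4096,
    # Override the preprocessor default of num_crops=4.
    mm_processor_kwargs={"num_crops": 16},
)
```

The server-side equivalent would presumably be passing the same dict as JSON on the command line (e.g. an --mm-processor-kwargs flag) when launching the entrypoint.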

mergify bot commented Nov 2, 2024

This pull request has merge conflicts that must be resolved before it can be merged. @KrishnaM251 please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@DarkLight1337 (Member)

Sorry for forgetting about this! I think we now have a valid use case, which is to patch incorrect HF configs. cc @K-Mistele

@K-Mistele (Contributor)

Right, it would be good to be able to adjust RoPE/YaRN configurations in config.json at startup time. I left comments about this on some other issues; I just can't seem to find them right this second.
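
As a concrete illustration of that use case, something like the following sketch (assuming the hf_overrides engine argument this PR adds, exposed on the CLI as --hf-overrides with a JSON value; the exact rope_scaling keys depend on the model and transformers version):

```python
# Sketch: assumes the hf_overrides argument added by this PR, which merges
# the given dict into the model's loaded config.json at startup.
from vllm import LLM

llm = LLM(
    model="Qwen/Qwen2-0.5B-Instruct",
    hf_overrides={
        "rope_theta": 1000000.0,
        "rope_scaling": {
            "rope_type": "yarn",
            "factor": 4.0,
            "original_max_position_embeddings": 32768,
        },
    },
)
```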

@DarkLight1337 changed the title from "[Frontend][Core] passing hf_config args through openai server" to "[Frontend][Core] Override HF config.json via CLI" on Nov 9, 2024
@DarkLight1337 (Member)

I have updated this PR and also changed the tests to check overriding rope_scaling and rope_theta.

@DarkLight1337 (Member) left a comment

Since we now have a use case for this, I'm approving the PR.

@DarkLight1337 added the `ready` label (ONLY add when PR is ready to merge/full CI is needed) on Nov 9, 2024
@DarkLight1337 enabled auto-merge (squash) on November 9, 2024 at 14:59
@DarkLight1337 merged commit b09895a into vllm-project:main on Nov 9, 2024
53 checks passed
JC1DA pushed a commit to JC1DA/vllm that referenced this pull request Nov 11, 2024
jeejeelee pushed a commit to jeejeelee/vllm that referenced this pull request Nov 11, 2024
rickyyx pushed a commit to rickyyx/vllm that referenced this pull request Nov 13, 2024
sumitd2 pushed a commit to sumitd2/vllm that referenced this pull request Nov 14, 2024
KuntaiDu pushed a commit to KuntaiDu/vllm that referenced this pull request Nov 20, 2024
mfournioux pushed a commit to mfournioux/vllm that referenced this pull request Nov 20, 2024
tlrmchlsmth pushed a commit to neuralmagic/vllm that referenced this pull request Nov 23, 2024
sleepwalker2017 pushed a commit to sleepwalker2017/vllm that referenced this pull request Dec 13, 2024
Labels: frontend, ready (ONLY add when PR is ready to merge/full CI is needed)
Projects: None yet
6 participants