[Frontend][Core] Override HF config.json via CLI #5836

Conversation
@DarkLight1337 can you help answer the question, since you recently touched the testing harness? Additionally, there might be other places we may want to override, including the tokenizer or generation config. Addressing those would be nice to have.
Thanks for picking this up! To answer your questions:
You should launch a new instance of the server for each configuration. You can add pytest fixtures to use a different server for your tests.
Perhaps you can ask the author of the original issue what they want to accomplish using this feature that cannot otherwise be done via vLLM args. (If we don't have any situation that results in different vLLM output, what is the point of enabling this?)
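For concreteness, a minimal sketch of the fixture approach, assuming vLLM's `RemoteOpenAIServer` test helper from `tests/utils.py`; the helper's exact constructor and client API are assumptions, not confirmed here:

```python
import pytest

# Assumed helper from vLLM's test suite (tests/utils.py); its exact
# signature may differ from what is sketched here.
from tests.utils import RemoteOpenAIServer

MODEL = "facebook/opt-125m"  # illustrative model choice


@pytest.fixture(scope="module")
def custom_server():
    # Launch a dedicated server for this configuration; tests that need
    # different engine args should define a separate fixture.
    args = ["--max-model-len", "2048"]
    with RemoteOpenAIServer(MODEL, args) as server:
        yield server


def test_basic_completion(custom_server):
    client = custom_server.get_client()
    completion = client.completions.create(
        model=MODEL, prompt="Hello, my name is", max_tokens=5)
    assert completion.choices[0].text
```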
@DarkLight1337 I appreciate the response. I will do as you suggested.
(force-pushed from c925422 to 4f0e0ea)
Hi @KrishnaM251, any news on this? I have a specific use case in which I'd like to deploy a Phi-3.5-vision model on a vLLM OpenAI server entrypoint, but I'd like to specify the argument …
@alex-jw-brooks is currently working on a PR that lets you pass options to the HF processor on demand instead of at startup time (the latter is what this PR focuses on). Stay tuned!
Hi @vpellegrain - thought I would link this PR if you'd like to track it; it exposes the HF processor options discussed above. Happy to add a follow-up to make this configurable per-request later on, but as it was already turning into a lot of code to correctly handle processor kwargs for memory profiling etc., the current PR sets it up for init time 😄
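As a hedged illustration of the init-time approach described here (the `mm_processor_kwargs` argument name and the `num_crops` key for Phi-3.5-vision are assumptions based on this thread, not confirmed interfaces):

```python
from vllm import LLM

# Sketch only: `mm_processor_kwargs` and the `num_crops` key are
# assumed from the discussion above; check the linked PR for the
# actual interface.
llm = LLM(
    model="microsoft/Phi-3.5-vision-instruct",
    trust_remote_code=True,
    mm_processor_kwargs={"num_crops": 16},
)
```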
This pull request has merge conflicts that must be resolved before it can be merged.
Sorry for forgetting about this! I think we now have a valid use case, which is to patch out incorrect HF configs. cc @K-Mistele
Right, it would be good to be able to adjust RoPE/YARN configurations in `config.json`.
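For example, a sketch of what such an override could look like, using this PR's draft parameter name `hf_kwargs` (the merged flag, the equivalent CLI spelling, and the override values shown are all assumptions):

```python
from vllm import LLM

# Sketch: patch RoPE settings from config.json at engine construction.
# `hf_kwargs` follows this PR's draft naming; the equivalent CLI shape
# (e.g. --hf-kwargs '{"rope_theta": 1000000.0}') is likewise assumed.
llm = LLM(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    hf_kwargs={
        "rope_theta": 1_000_000.0,
        "rope_scaling": {"rope_type": "yarn", "factor": 4.0},
    },
)
```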
I have updated this PR and also changed the tests to check overriding `rope_scaling` and `rope_theta`.
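Roughly, such tests can assert that the override is reflected in the loaded HF config; a sketch, with the model name and attribute path as assumptions:

```python
from vllm import LLM

# Sketch: confirm that an hf_kwargs override reaches the loaded HF
# config. Model choice and attribute path are illustrative.
llm = LLM(model="Qwen/Qwen2-0.5B-Instruct",
          hf_kwargs={"rope_theta": 500_000.0})
hf_config = llm.llm_engine.model_config.hf_config
assert hf_config.rope_theta == 500_000.0
```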
Since we now have a use case for this, I'm approving the PR.
FIX #1542
FIX #2547
FIX #5205
PR Title and Classification
[Frontend][Core]

Notes
- The `Optional[dict]` parameter `hf_kwargs` is successfully set in `ModelConfig`.
- There is a detectable change in `LLMEngine` output when `hf_kwargs` is added as an `EngineArgs` parameter.
- Open question: how do I test `hf_kwargs` through an OpenAI compatible server? I wanted to repurpose a test in test_openai_server.py, however the `EngineArgs` params are set before all tests are run. I also need an `hf_kwargs` value which will generate output detectable in a function like `completion = client.completions.create(...)` (motivating example). If this is not the best approach for testing, then any recommendations?
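One possible answer to the testing question above, sketched under assumptions: pick an override with an externally visible effect, e.g. shrinking `max_position_embeddings`, and assert that an over-long request is rejected. The server flag shape follows this PR's draft naming; only the `openai` client calls are confirmed API.

```python
import openai

# Assumes the server was started with something like:
#   vllm serve facebook/opt-125m --hf-kwargs '{"max_position_embeddings": 128}'
# (flag name follows this PR's draft; it may differ once merged).
client = openai.OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

try:
    client.completions.create(
        model="facebook/opt-125m",
        prompt="word " * 500,  # far beyond the overridden context window
        max_tokens=16,
    )
    raise AssertionError("expected the request to exceed the context window")
except openai.BadRequestError:
    pass  # the override produced observable server-side behavior
```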