Conversation
@@ -14,23 +14,19 @@ usage() {
  echo
  echo " -m - huggingface stub or local directory of the model"
  echo " -b - batch size to run the evaluation at"
  echo " -d - device to use (e.g. cuda, cuda:0, auto, cpu)"
turns out this doesn't work, need to pass `parallelize=True` to model args to use accelerate 🤦
nice find
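For anyone hitting the same issue, a minimal sketch of passing `parallelize=True` through the lm-eval-harness model args (the model stub, task, and shot count below are placeholders, not values from this PR):

```python
# Sketch only (not the exact code from this PR): lm-eval-harness HF backend with
# accelerate-based sharding. "parallelize=True" is what spreads the weights across
# GPUs; a device flag alone does not trigger it.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args=(
        "pretrained=meta-llama/Meta-Llama-3-8B-Instruct,"
        "parallelize=True"  # shard across all visible GPUs via accelerate
    ),
    tasks=["gsm8k"],
    num_fewshot=5,
    batch_size="auto",
)
print(results["results"])
```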
.github/workflows/nm-nightly.yml
Outdated
@@ -35,8 +35,8 @@ jobs:
  test_configs: '[{"python":"3.8.17","label":"gcp-k8s-l4-solo","test":"neuralmagic/tests/test_skip_env_vars/full.txt"},
                  {"python":"3.9.17","label":"gcp-k8s-l4-solo","test":"neuralmagic/tests/test_skip_env_vars/full.txt"},
                  {"python":"3.10.12","label":"gcp-k8s-l4-duo","test":"neuralmagic/tests/test_skip_env_vars/full.txt"},
I assumed that we were splitting `solo` and `duo` across the different versions of python to get a certain amount of coverage. Why change to all `solo`?
- None of the tests running right now use the distributed features, so it's silly to run them on a multi-gpu instance.
- I have two other PRs ongoing to re-enable distributed testing (fan out + enable distributed tests) that will turn this back on.
Can you rebase from latest main? I think this has already been changed to `l4-solo`.
"tokenizer_backend=huggingface", | ||
"base_url=http://localhost:8000/v1", | ||
])

logger.info("launching server")
with ServerContext(vllm_args, logger=logger) as _:
Why abandon using the context manager for the server? There's now a bunch of duplicate code in this file that has to be maintained.
Because the `ContextManager` was using Ray to launch the tests, which was causing me a lot of issues and has been the source of several issues in upstream syncs.

For instance, prior to this, if we launched with `tp>1`, the `ServerContext` did not clean up the memory on the second GPU :) even after the pytest process completed.

The setup here is much cleaner and actually properly cleans up the GPU memory.
thanks
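For context, a rough sketch of the kind of subprocess-based server setup described above. This is an illustration under assumptions, not the actual code from this PR; the model stub, port, and timeout values are placeholders:

```python
# Illustrative sketch: launch the vLLM OpenAI-compatible server as a plain
# subprocess and always tear it down in finally, so GPU memory is released
# even if the evaluation raises.
import subprocess
import sys
import time

import requests

server = subprocess.Popen([
    sys.executable, "-m", "vllm.entrypoints.openai.api_server",
    "--model", "meta-llama/Meta-Llama-3-8B-Instruct",  # placeholder model stub
    "--port", "8000",
])
try:
    # poll the health endpoint until the server is ready (timeouts are arbitrary)
    for _ in range(120):
        try:
            if requests.get("http://localhost:8000/health", timeout=1).status_code == 200:
                break
        except requests.RequestException:
            pass
        time.sleep(1)
    # ... run lm-eval against base_url=http://localhost:8000/v1 here ...
finally:
    server.terminate()  # frees the GPU memory once the test process is done
    server.wait()
```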
@@ -0,0 +1,9 @@
Meta-Llama-3-70B-Instruct-FP8.yaml |
Not sure where to put this, but it might be good to have a brief README in this repo with: a sketch of hardware requirements for these models and a brief description of the various items in the "yaml". As an example for the latter, what does `num_fewshot` mean?
I'll add this in the follow-up.
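Until that README lands, a quick note on `num_fewshot`: it is the standard lm-eval-harness setting for how many in-context examples are prepended to each prompt. A hypothetical sketch of consuming such a config follows; every key name below is a guess at the schema, not the repo's actual yaml layout:

```python
# Hypothetical sketch: load an eval config like Meta-Llama-3-70B-Instruct-FP8.yaml
# and hand it to lm-eval-harness. Key names (model_name, tasks, num_fewshot, limit)
# are assumptions; num_fewshot=5 would mean 5-shot prompting for each task.
import yaml
import lm_eval

with open("Meta-Llama-3-70B-Instruct-FP8.yaml") as f:
    cfg = yaml.safe_load(f)

results = lm_eval.simple_evaluate(
    model="vllm",
    model_args=f"pretrained={cfg['model_name']}",  # assumed key
    tasks=cfg["tasks"],                            # assumed: a list of task names
    num_fewshot=cfg.get("num_fewshot", 0),
    limit=cfg.get("limit"),                        # assumed: cap on examples per task
    batch_size="auto",
)
```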
.github/workflows/nm-remote-push.yml
Outdated
@@ -21,15 +21,15 @@ jobs:
  test_configs: '[{"python":"3.8.17","label":"gcp-k8s-l4-solo","test":"neuralmagic/tests/test_skip_env_vars/smoke.txt"},
                  {"python":"3.9.17","label":"gcp-k8s-l4-solo","test":"neuralmagic/tests/test_skip_env_vars/smoke.txt"},
                  {"python":"3.10.12","label":"gcp-k8s-l4-duo","test":"neuralmagic/tests/test_skip_env_vars/smoke.txt"},
                  {"python":"3.11.4","label":"gcp-k8s-l4-duo","test":"neuralmagic/tests/test_skip_env_vars/smoke.txt"}]'
                  {"python":"3.10.12","label":"gcp-k8s-l4-solo","test":"neuralmagic/tests/test_skip_env_vars/smoke.txt"},
Same as in nm-nightly.yml.
SUMMARY:
- ... `.github` folder so they are closer to the scripts

FOLLOW UP PRS:
- ... `neuralmagic` directory