This repository has been archived by the owner on Oct 11, 2024. It is now read-only.

[ CI ] LM Eval Testing Expansion #326

Merged
merged 51 commits into main on Jun 26, 2024

Conversation

Collaborator

@robertgshaw2-redhat commented Jun 23, 2024

SUMMARY:

  • updated the HF lm-eval GSM baseline script to use accelerate, enabling us to generate baselines for large models
  • added a vLLM lm-eval GSM baseline script for scenarios that HF does not support (e.g. fp8)
  • added lm-eval GSM model configs for a broad set of models (small + large)
  • refactored the smoke / full configs to reference the model configs and trigger one test at a time
  • refactored the lm-eval accuracy test to avoid using Ray to launch the server, which caused cleanup issues in the server case
  • moved the configs into the .github folder so they are closer to the scripts

FOLLOW UP PRS:

  • enable distributed
  • enable H100 for large models
  • eliminate the neuralmagic directory

@@ -14,23 +14,19 @@ usage() {
echo
echo " -m - huggingface stub or local directory of the model"
echo " -b - batch size to run the evaluation at"
echo " -d - device to use (e.g. cuda, cuda:0, auto, cpu)"
Collaborator Author


Turns out this doesn't work; you need to pass parallelize=True in the model args to use accelerate 🤦
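For context, a hedged sketch of what this looks like on the lm-eval command line — the model stub and task values here are placeholders, not the PR's actual baseline invocation:

```shell
# Hypothetical invocation sketch; model stub and task are placeholder values.
MODEL="facebook/opt-125m"
# parallelize=True goes inside --model_args, which is how lm-eval hands the
# option to the HF backend so accelerate can shard the model across GPUs:
MODEL_ARGS="pretrained=${MODEL},parallelize=True"
# lm_eval --model hf --model_args "${MODEL_ARGS}" --tasks gsm8k --num_fewshot 5 --batch_size auto
echo "${MODEL_ARGS}"
```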

Member


nice find

@@ -35,8 +35,8 @@ jobs:

test_configs: '[{"python":"3.8.17","label":"gcp-k8s-l4-solo","test":"neuralmagic/tests/test_skip_env_vars/full.txt"},
{"python":"3.9.17","label":"gcp-k8s-l4-solo","test":"neuralmagic/tests/test_skip_env_vars/full.txt"},
{"python":"3.10.12","label":"gcp-k8s-l4-duo","test":"neuralmagic/tests/test_skip_env_vars/full.txt"},


I assumed we were splitting solo and duo across the different Python versions to get a certain amount of coverage. Why change to all solo?

Collaborator Author


  • none of the tests running right now use the distributed features, so it's wasteful to run them on a multi-GPU instance
  • I have two other PRs in flight to re-enable distributed testing (fan-out + enable distributed tests) that will turn this back on

Member


Can you rebase from latest main? I think this has been already changed to l4-solo.

"tokenizer_backend=huggingface",
"base_url=http://localhost:8000/v1",
])

logger.info("launching server")
with ServerContext(vllm_args, logger=logger) as _:


Why abandon using the context manager for the server? There's now a bunch of duplicate code in this file that has to be maintained.

Collaborator Author

@robertgshaw2-redhat commented Jun 26, 2024


Because the ContextManager was using Ray to launch the tests, which caused me a lot of issues and has been the source of several problems in upstream syncs.

For instance, prior to this, if we launched with tp>1, the ServerContext did not clean up the memory on the second GPU :) ---> even after the pytest process completed

The setup here is much cleaner and actually properly cleans up the GPU memory.
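This is not the PR's actual replacement code, but a minimal stdlib sketch of the pattern being described: launch the server as a plain subprocess (no Ray) and guarantee teardown even if the test body raises, so no GPU memory is left held by an orphaned process:

```python
import subprocess
import sys


class ManagedServer:
    """Minimal sketch: launch a server as a plain subprocess and
    guarantee the process is torn down on exit, even on error."""

    def __init__(self, args, timeout=10):
        self.args = args
        self.timeout = timeout
        self.proc = None

    def __enter__(self):
        self.proc = subprocess.Popen(self.args)
        return self.proc

    def __exit__(self, exc_type, exc, tb):
        self.proc.terminate()              # polite SIGTERM first
        try:
            self.proc.wait(timeout=self.timeout)
        except subprocess.TimeoutExpired:
            self.proc.kill()               # escalate if it hangs
            self.proc.wait()
        return False                       # never swallow test exceptions


# Harmless stand-in for the server process (would be the vllm server command):
with ManagedServer([sys.executable, "-c", "import time; time.sleep(60)"]) as proc:
    pass  # run lm-eval requests against the server here

assert proc.poll() is not None  # the child is gone once the block exits
```

Because cleanup happens in `__exit__` of an ordinary context manager in the pytest process itself, there is no external actor (like a Ray worker) that can outlive the test and keep device memory pinned.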

Member

@andy-neuma left a comment


thanks

@@ -0,0 +1,9 @@
Meta-Llama-3-70B-Instruct-FP8.yaml
Member


Not sure where to put this, but it might be good to have a brief README in this repo with a sketch of the hardware requirements for these models and a brief description of the various items in the yaml. As an example of the latter, what does num_fewshot mean?

Collaborator Author


I'll add this in the follow-up.
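For reference, a hedged sketch of what a model config of this shape might contain — the field names and values below are illustrative, not the repo's actual schema; num_fewshot, though, is lm-eval's standard parameter for the number of worked examples prepended to each prompt:

```yaml
# Hypothetical GSM model config; names and values are illustrative only.
model_name: "meta-llama/Meta-Llama-3-8B-Instruct"   # HF stub or local path
tasks:
- name: "gsm8k"
  metrics:
  - name: "exact_match"
    value: 0.75          # expected baseline score to compare against
num_fewshot: 5           # in-context examples prepended to each prompt
limit: 250               # evaluate on a subset of the task for CI speed
```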


@@ -21,15 +21,15 @@ jobs:

test_configs: '[{"python":"3.8.17","label":"gcp-k8s-l4-solo","test":"neuralmagic/tests/test_skip_env_vars/smoke.txt"},
{"python":"3.9.17","label":"gcp-k8s-l4-solo","test":"neuralmagic/tests/test_skip_env_vars/smoke.txt"},
{"python":"3.10.12","label":"gcp-k8s-l4-duo","test":"neuralmagic/tests/test_skip_env_vars/smoke.txt"},
{"python":"3.11.4","label":"gcp-k8s-l4-duo","test":"neuralmagic/tests/test_skip_env_vars/smoke.txt"}]'
{"python":"3.10.12","label":"gcp-k8s-l4-solo","test":"neuralmagic/tests/test_skip_env_vars/smoke.txt"},
Member


Same as the nm-nightly.yml.

@robertgshaw2-redhat merged commit ec8b450 into main Jun 26, 2024
28 checks passed
@robertgshaw2-redhat deleted the expand-lm-eval-testing branch June 26, 2024 18:13