Conversation
@@ -14,23 +14,19 @@ usage() {
  echo
  echo " -m - huggingface stub or local directory of the model"
  echo " -b - batch size to run the evaluation at"
  echo " -d - device to use (e.g. cuda, cuda:0, auto, cpu)"
turns out this doesn't work, need to pass `parallelize=True` to model args to use accelerate 🤦
nice find
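For anyone hitting the same issue, a minimal sketch of passing `parallelize=True` through the lm-eval-harness model args (the model stub, task, and shot count below are placeholders, not values from this PR):

```python
# Sketch only (not the exact code from this PR): lm-eval-harness HF backend with
# accelerate-based sharding. "parallelize=True" is what spreads the weights across
# GPUs; a device flag alone does not trigger it.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args=(
        "pretrained=meta-llama/Meta-Llama-3-8B-Instruct,"
        "parallelize=True"  # shard across all visible GPUs via accelerate
    ),
    tasks=["gsm8k"],
    num_fewshot=5,
    batch_size="auto",
)
print(results["results"])
```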
.github/workflows/nm-nightly.yml
Outdated
@@ -35,8 +35,8 @@ jobs:
  test_configs: '[{"python":"3.8.17","label":"gcp-k8s-l4-solo","test":"neuralmagic/tests/test_skip_env_vars/full.txt"},
                  {"python":"3.9.17","label":"gcp-k8s-l4-solo","test":"neuralmagic/tests/test_skip_env_vars/full.txt"},
                  {"python":"3.10.12","label":"gcp-k8s-l4-duo","test":"neuralmagic/tests/test_skip_env_vars/full.txt"},
I assumed that we were splitting `solo` and `duo` across the different versions of python to get a certain amount of coverage. Why change to all `solo`?
- None of the tests running right now use the distributed features, so it's silly to run them on a multi-gpu instance.
- I have two other PRs ongoing to re-enable distributed testing (fan out + enable distributed tests) that will turn this back on.
Can you rebase from latest main? I think this has already been changed to `l4-solo`.
"tokenizer_backend=huggingface", | ||
"base_url=http://localhost:8000/v1", | ||
])

logger.info("launching server")
with ServerContext(vllm_args, logger=logger) as _:
Why abandon using the context manager for the server? There's now a bunch of duplicate code in this file that has to be maintained.
Because the `ContextManager` was using Ray to launch the tests, which was causing me a lot of issues and has been the source of several issues in upstream syncs.

For instance, prior to this, if we launched with `tp>1`, the `ServerContext` did not clean up the memory on the second GPU :) even after the pytest process completed.

The setup here is much cleaner and actually properly cleans up the GPU memory.
thanks
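For context, a rough sketch of the kind of subprocess-based server setup described above. This is an illustration under assumptions, not the actual code from this PR; the model stub, port, and timeout values are placeholders:

```python
# Illustrative sketch: launch the vLLM OpenAI-compatible server as a plain
# subprocess and always tear it down in finally, so GPU memory is released
# even if the evaluation raises.
import subprocess
import sys
import time

import requests

server = subprocess.Popen([
    sys.executable, "-m", "vllm.entrypoints.openai.api_server",
    "--model", "meta-llama/Meta-Llama-3-8B-Instruct",  # placeholder model stub
    "--port", "8000",
])
try:
    # poll the health endpoint until the server is ready (timeouts are arbitrary)
    for _ in range(120):
        try:
            if requests.get("http://localhost:8000/health", timeout=1).status_code == 200:
                break
        except requests.RequestException:
            pass
        time.sleep(1)
    # ... run lm-eval against base_url=http://localhost:8000/v1 here ...
finally:
    server.terminate()  # frees the GPU memory once the test process is done
    server.wait()
```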
@@ -0,0 +1,9 @@
Meta-Llama-3-70B-Instruct-FP8.yaml |
Not sure where to put this, but it might be good to have a brief README in this repo with: a sketch of hardware requirements for these models and a brief description of the various items in the "yaml". As an example for the latter, what does `num_fewshot` mean?
I'll add this in the follow-up.
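Until that README lands, a quick note on `num_fewshot`: it is the standard lm-eval-harness setting for how many in-context examples are prepended to each prompt. A hypothetical sketch of consuming such a config follows; every key name below is a guess at the schema, not the repo's actual yaml layout:

```python
# Hypothetical sketch: load an eval config like Meta-Llama-3-70B-Instruct-FP8.yaml
# and hand it to lm-eval-harness. Key names (model_name, tasks, num_fewshot, limit)
# are assumptions; num_fewshot=5 would mean 5-shot prompting for each task.
import yaml
import lm_eval

with open("Meta-Llama-3-70B-Instruct-FP8.yaml") as f:
    cfg = yaml.safe_load(f)

results = lm_eval.simple_evaluate(
    model="vllm",
    model_args=f"pretrained={cfg['model_name']}",  # assumed key
    tasks=cfg["tasks"],                            # assumed: a list of task names
    num_fewshot=cfg.get("num_fewshot", 0),
    limit=cfg.get("limit"),                        # assumed: cap on examples per task
    batch_size="auto",
)
```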
.github/workflows/nm-remote-push.yml
Outdated
@@ -21,15 +21,15 @@ jobs:
  test_configs: '[{"python":"3.8.17","label":"gcp-k8s-l4-solo","test":"neuralmagic/tests/test_skip_env_vars/smoke.txt"},
                  {"python":"3.9.17","label":"gcp-k8s-l4-solo","test":"neuralmagic/tests/test_skip_env_vars/smoke.txt"},
                  {"python":"3.10.12","label":"gcp-k8s-l4-duo","test":"neuralmagic/tests/test_skip_env_vars/smoke.txt"},
                  {"python":"3.11.4","label":"gcp-k8s-l4-duo","test":"neuralmagic/tests/test_skip_env_vars/smoke.txt"}]'
                  {"python":"3.10.12","label":"gcp-k8s-l4-solo","test":"neuralmagic/tests/test_skip_env_vars/smoke.txt"},
Same as in nm-nightly.yml.
SUMMARY:
- ... `.github` folder so they are closer to the scripts

FOLLOW UP PRS:
- ... `neuralmagic` directory