
Performance (reproducible) Issue #1058

Open
Thunderbeee opened this issue Jan 11, 2025 · 5 comments
Assignees
Labels
question Further information is requested

Comments

@Thunderbeee

Hi team! Thanks so much for your work. I followed the instructions provided in #1028, but the final accuracy is not desirable. I used ultrachat as the dataset and evaluated on MMLU.

[Screenshot: evaluation results table, 2025-01-10]

However, the model card at https://huggingface.co/neuralmagic/Sparse-Llama-3.1-8B-2of4 reports extremely high accuracy recovery. I am wondering how to reproduce the officially claimed results. Thanks!

@Thunderbeee Thunderbeee added the bug Something isn't working label Jan 11, 2025
@dsikka dsikka self-assigned this Jan 11, 2025
@kylesayrs
Collaborator

Hi @Thunderbeee,

The models in your table do not match the model you linked to: your table references llama3.2-1b-instruct and gemma-2b, but the linked model is built from the llama-3.1-8B base model.

Achieving accuracy recovery after intensive compression (both sparsification and quantization) is often very difficult for small models (<8B). 1B and 2B models are already extremely parameter-efficient, so compressing them affects accuracy more than compressing larger models, which are typically less parameter-efficient and therefore have more headroom for compression.

@kylesayrs kylesayrs self-assigned this Jan 15, 2025
@kylesayrs
Collaborator

If yours is a TinyML application that requires small models, I would recommend focusing on fine-tuning your model on your specific task/domain and evaluating on that task. I would not expect compressed (especially fine-tuned) 1B models to perform well on generalized benchmarks such as MMLU.

@Thunderbeee
Author

Thank you @kylesayrs, I get it. I found that sparsity + distillation may achieve a better result, so I followed the example in trl_mixin/ex_trl_distillation.py and changed the recipe to

recipe = """
test_stage:
  pruning_modifiers:
    SparseGPTModifier:
      sparsity: 0.5
      mask_structure: "2:4"
      sequential_update: false
    OutputDistillationModifier:
      targets: ['re:model.layers.\\d+$']
      comparison: "square_head"
      start: 0
      orig_scale: 1.0
      distill_scale: 1.0
"""

but it did not run SparseGPTModifier.

Could you please provide instructions on how to perform SparseGPT + distillation, and explain why ConstantPruningModifier works but SparseGPTModifier does not?

Thanks so much!

@kylesayrs
Collaborator

Hey @Thunderbeee!

Can you explain more about what you mean when you say SparseGPTModifier could not be performed? Does an error occur?

For more information on the ConstantPruningModifier, see this explanation. For your use case, you'll likely want to do SparseGPTModifier -> ConstantPruningModifier -> OutputDistillationModifier.
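That ordering might be expressed as a staged recipe along these lines. This is a sketch, not a verified official example: the stage names and the ConstantPruningModifier fields are assumptions modeled on the recipes earlier in this thread.

```yaml
# Hypothetical two-stage recipe (illustrative sketch):
# stage 1 prunes one-shot with SparseGPT; stage 2 holds the sparsity mask
# constant during fine-tuning while distilling from the dense teacher.
sparsity_stage:
  pruning_modifiers:
    SparseGPTModifier:
      sparsity: 0.5
      mask_structure: "2:4"
      sequential_update: false
finetuning_stage:
  pruning_modifiers:
    ConstantPruningModifier:       # fields assumed, not from an official example
      targets: ['re:model.layers.\d+$']
      start: 0
    OutputDistillationModifier:
      targets: ['re:model.layers.\d+$']
      comparison: "square_head"
      start: 0
      orig_scale: 1.0
      distill_scale: 1.0
```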

@kylesayrs kylesayrs added question Further information is requested and removed bug Something isn't working labels Jan 17, 2025
@Thunderbeee
Author

Thanks @kylesayrs! By "SparseGPTModifier could not be performed" I mean that it was skipped and KD started directly. I checked the code, and I think it is because of

# ignore modifier structure initialized from one-shot
if state.start_event.current_index >= 0 and self.calculate_start() < 0:
    return

in llmcompressor/modifiers/modifier.py:107.
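The quoted guard can be reproduced in isolation. This is an illustrative sketch of the condition, not the library's actual classes: a modifier whose `start` is unset resolves to a negative `calculate_start()`, so once training has begun (`current_index >= 0`) it is treated as one-shot structure and ignored.

```python
# Hypothetical mirror of: current_index >= 0 and calculate_start() < 0
def should_skip(current_index: float, start: float) -> bool:
    """Skip modifiers whose start is unset (< 0) once training has begun."""
    return current_index >= 0 and start < 0

# Without an explicit `start`, the modifier resolves to start = -1 and is
# skipped as soon as the training loop starts:
print(should_skip(current_index=0.0, start=-1.0))  # True -> skipped
# Setting `start: 0` in the recipe makes the guard pass:
print(should_skip(current_index=0.0, start=0.0))   # False -> runs
```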

Therefore, I changed the recipe to

recipe = """
test_stage:
  pruning_modifiers:
    SparseGPTModifier:
      sparsity: 0.5
      mask_structure: "2:4"
      sequential_update: false
      start: 0
    OutputDistillationModifier:
      targets: ['re:model.layers.\d+$']
      comparison: "square_head"
      start: 0
      orig_scale: 1.0
      distill_scale: 1.0
"""

but got this error:

Traceback (most recent call last):
  File "/home/azureuser/mingyuan/llm-compressor/examples/trl_mixin/ex_trl_distillation.py", line 89, in <module>
    trainer.train()
  File "/home/azureuser/.conda/envs/compressor/lib/python3.10/site-packages/llmcompressor/transformers/finetune/session_mixin.py", line 374, in train
    self.initialize_session(epoch=epoch, checkpoint=checkpoint, stage=stage)
  File "/home/azureuser/.conda/envs/compressor/lib/python3.10/site-packages/llmcompressor/transformers/finetune/session_mixin.py", line 144, in initialize_session
    initialize(
  File "/home/azureuser/.conda/envs/compressor/lib/python3.10/site-packages/llmcompressor/core/session_functions.py", line 116, in initialize
    return active_session().initialize(
  File "/home/azureuser/.conda/envs/compressor/lib/python3.10/site-packages/llmcompressor/core/session.py", line 157, in initialize
    mod_data = self._lifecycle.initialize(
  File "/home/azureuser/.conda/envs/compressor/lib/python3.10/site-packages/llmcompressor/core/lifecycle.py", line 127, in initialize
    data = mod.initialize(state=self.state, **extras)
  File "/home/azureuser/.conda/envs/compressor/lib/python3.10/site-packages/llmcompressor/modifiers/stage.py", line 124, in initialize
    modifier.initialize(state, **kwargs)
  File "/home/azureuser/.conda/envs/compressor/lib/python3.10/site-packages/llmcompressor/modifiers/modifier.py", line 118, in initialize
    initialized = self.on_initialize(state=state, **kwargs)
  File "/home/azureuser/.conda/envs/compressor/lib/python3.10/site-packages/llmcompressor/modifiers/obcq/base.py", line 108, in on_initialize
    self.apply_compression(calibration_dataloader)
  File "/home/azureuser/.conda/envs/compressor/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/azureuser/.conda/envs/compressor/lib/python3.10/site-packages/llmcompressor/modifiers/obcq/base.py", line 187, in apply_compression
    run_calibration_forward(self.model, dataloader, mask_padding=True)
  File "/home/azureuser/.conda/envs/compressor/lib/python3.10/site-packages/llmcompressor/modifiers/utils/pytorch_helpers.py", line 89, in run_calibration_forward
    for batch_idx, batch in enumerate(tqdm(_dataloader)):
  File "/home/azureuser/.conda/envs/compressor/lib/python3.10/site-packages/tqdm/std.py", line 1181, in __iter__
    for obj in iterable:
TypeError: 'NoneType' object is not iterable
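The traceback bottoms out in iterating `None`: `run_calibration_forward` received `calibration_dataloader=None`, presumably because no calibration dataset was wired through the training entrypoint to the modifier. This sketch (hypothetical, not llmcompressor's code) reproduces the failure mode and shows an earlier, clearer check:

```python
from typing import Iterable, Optional

def run_calibration(dataloader: Optional[Iterable]) -> int:
    # Failing fast with a descriptive error beats the opaque
    # "'NoneType' object is not iterable" raised deep inside tqdm.
    if dataloader is None:
        raise ValueError(
            "SparseGPTModifier requires a calibration dataloader, but none "
            "was provided when the modifier initialized during training."
        )
    batches = 0
    for _batch in dataloader:
        batches += 1  # stand-in for the calibration forward pass
    return batches

print(run_calibration([{"input_ids": [1, 2]}, {"input_ids": [3]}]))  # 2
```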

@dsikka dsikka assigned kylesayrs and unassigned dsikka Jan 17, 2025