
Performance (reproducible) Issue #1058

Open
Thunderbeee opened this issue Jan 11, 2025 · 5 comments
Assignees
Labels
question Further information is requested

Comments

@Thunderbeee

Hi team! Thanks so much for your work. I followed the instructions provided in #1028, but the final accuracy is not desirable. I used ultrachat as the dataset and evaluated on MMLU.

[Screenshot: evaluation results table, 2025-01-10]

However, the model card at https://huggingface.co/neuralmagic/Sparse-Llama-3.1-8B-2of4 reports extremely high accuracy recovery. I am wondering how to reproduce the officially claimed results. Thanks!

@Thunderbeee Thunderbeee added the bug Something isn't working label Jan 11, 2025
@dsikka dsikka self-assigned this Jan 11, 2025
@kylesayrs
Collaborator

Hi @Thunderbeee,

The models in your table do not match the model you linked to: your table references llama3.2-1b-instruct and gemma-2b, but the linked model is built from the llama-3.1-8B base model.

Achieving accuracy recovery after intensive compression (both sparsification and quantization) is often very difficult for small models (<8B). 1B and 2B models are already extremely parameter-efficient, so compressing them affects accuracy more than compressing larger models, which are typically less parameter-efficient and therefore have more headroom for compression.

@kylesayrs kylesayrs self-assigned this Jan 15, 2025
@kylesayrs
Collaborator

If yours is a TinyML application that requires small models, I would recommend focusing on fine-tuning your model on your specific task/domain and evaluating on that task. I would not expect compressed (especially fine-tuned) 1B models to perform well on generalized benchmarks such as MMLU.

@Thunderbeee
Author

Thank you @kylesayrs, I get it. I found that sparsity + distillation may achieve a better result, so I followed the example in trl_mixin/ex_trl_distillation.py and changed the recipe to

recipe = """
test_stage:
  pruning_modifiers:
    SparseGPTModifier:
      sparsity: 0.5
      mask_structure: "2:4"
      sequential_update: false
    OutputDistillationModifier:
      targets: ['re:model.layers.\\d+$']
      comparison: "square_head"
      start: 0
      orig_scale: 1.0
      distill_scale: 1.0
"""

but it did not run SparseGPTModifier.

Could you please provide instructions on how to perform SparseGPT + distillation, and explain why ConstantPruningModifier works but SparseGPTModifier does not?

Thanks so much!

@kylesayrs
Collaborator

Hey @Thunderbeee!

Can you explain more about what you mean when you say SparseGPTModifier could not be performed? Does an error occur?

For more information on the ConstantPruningModifier, see this explanation. For your use case, you'll likely want to do SparseGPTModifier -> ConstantPruningModifier -> OutputDistillationModifier.
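That ordering might be expressed as a staged recipe along these lines. This is a sketch, not a verified official example: the stage names and the ConstantPruningModifier fields are assumptions modeled on the recipes earlier in this thread.

```yaml
# Hypothetical two-stage recipe (illustrative sketch):
# stage 1 prunes one-shot with SparseGPT; stage 2 holds the sparsity mask
# constant during fine-tuning while distilling from the dense teacher.
sparsity_stage:
  pruning_modifiers:
    SparseGPTModifier:
      sparsity: 0.5
      mask_structure: "2:4"
      sequential_update: false
finetuning_stage:
  pruning_modifiers:
    ConstantPruningModifier:       # fields assumed, not from an official example
      targets: ['re:model.layers.\d+$']
      start: 0
    OutputDistillationModifier:
      targets: ['re:model.layers.\d+$']
      comparison: "square_head"
      start: 0
      orig_scale: 1.0
      distill_scale: 1.0
```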

@kylesayrs kylesayrs added question Further information is requested and removed bug Something isn't working labels Jan 17, 2025
@Thunderbeee
Author

Thanks @kylesayrs! By "SparseGPTModifier could not be performed" I mean that it was skipped and KD started directly. I checked the code, and I think it is because of

# ignore modifier structure initialized from one-shot
if state.start_event.current_index >= 0 and self.calculate_start() < 0:
    return

in llmcompressor/modifiers/modifier.py:107.
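The quoted guard can be reproduced in isolation. This is an illustrative sketch of the condition, not the library's actual classes: a modifier whose `start` is unset resolves to a negative `calculate_start()`, so once training has begun (`current_index >= 0`) it is treated as one-shot structure and ignored.

```python
# Hypothetical mirror of: current_index >= 0 and calculate_start() < 0
def should_skip(current_index: float, start: float) -> bool:
    """Skip modifiers whose start is unset (< 0) once training has begun."""
    return current_index >= 0 and start < 0

# Without an explicit `start`, the modifier resolves to start = -1 and is
# skipped as soon as the training loop starts:
print(should_skip(current_index=0.0, start=-1.0))  # True -> skipped
# Setting `start: 0` in the recipe makes the guard pass:
print(should_skip(current_index=0.0, start=0.0))   # False -> runs
```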

Therefore, I changed the recipe to

recipe = """
test_stage:
  pruning_modifiers:
    SparseGPTModifier:
      sparsity: 0.5
      mask_structure: "2:4"
      sequential_update: false
      start: 0
    OutputDistillationModifier:
      targets: ['re:model.layers.\d+$']
      comparison: "square_head"
      start: 0
      orig_scale: 1.0
      distill_scale: 1.0
"""

but got this error:

Traceback (most recent call last):
  File "/home/azureuser/mingyuan/llm-compressor/examples/trl_mixin/ex_trl_distillation.py", line 89, in <module>
    trainer.train()
  File "/home/azureuser/.conda/envs/compressor/lib/python3.10/site-packages/llmcompressor/transformers/finetune/session_mixin.py", line 374, in train
    self.initialize_session(epoch=epoch, checkpoint=checkpoint, stage=stage)
  File "/home/azureuser/.conda/envs/compressor/lib/python3.10/site-packages/llmcompressor/transformers/finetune/session_mixin.py", line 144, in initialize_session
    initialize(
  File "/home/azureuser/.conda/envs/compressor/lib/python3.10/site-packages/llmcompressor/core/session_functions.py", line 116, in initialize
    return active_session().initialize(
  File "/home/azureuser/.conda/envs/compressor/lib/python3.10/site-packages/llmcompressor/core/session.py", line 157, in initialize
    mod_data = self._lifecycle.initialize(
  File "/home/azureuser/.conda/envs/compressor/lib/python3.10/site-packages/llmcompressor/core/lifecycle.py", line 127, in initialize
    data = mod.initialize(state=self.state, **extras)
  File "/home/azureuser/.conda/envs/compressor/lib/python3.10/site-packages/llmcompressor/modifiers/stage.py", line 124, in initialize
    modifier.initialize(state, **kwargs)
  File "/home/azureuser/.conda/envs/compressor/lib/python3.10/site-packages/llmcompressor/modifiers/modifier.py", line 118, in initialize
    initialized = self.on_initialize(state=state, **kwargs)
  File "/home/azureuser/.conda/envs/compressor/lib/python3.10/site-packages/llmcompressor/modifiers/obcq/base.py", line 108, in on_initialize
    self.apply_compression(calibration_dataloader)
  File "/home/azureuser/.conda/envs/compressor/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/azureuser/.conda/envs/compressor/lib/python3.10/site-packages/llmcompressor/modifiers/obcq/base.py", line 187, in apply_compression
    run_calibration_forward(self.model, dataloader, mask_padding=True)
  File "/home/azureuser/.conda/envs/compressor/lib/python3.10/site-packages/llmcompressor/modifiers/utils/pytorch_helpers.py", line 89, in run_calibration_forward
    for batch_idx, batch in enumerate(tqdm(_dataloader)):
  File "/home/azureuser/.conda/envs/compressor/lib/python3.10/site-packages/tqdm/std.py", line 1181, in __iter__
    for obj in iterable:
TypeError: 'NoneType' object is not iterable
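The traceback bottoms out in iterating `None`: `run_calibration_forward` received `calibration_dataloader=None`, presumably because no calibration dataset was wired through the training entrypoint to the modifier. This sketch (hypothetical, not llmcompressor's code) reproduces the failure mode and shows an earlier, clearer check:

```python
from typing import Iterable, Optional

def run_calibration(dataloader: Optional[Iterable]) -> int:
    # Failing fast with a descriptive error beats the opaque
    # "'NoneType' object is not iterable" raised deep inside tqdm.
    if dataloader is None:
        raise ValueError(
            "SparseGPTModifier requires a calibration dataloader, but none "
            "was provided when the modifier initialized during training."
        )
    batches = 0
    for _batch in dataloader:
        batches += 1  # stand-in for the calibration forward pass
    return batches

print(run_calibration([{"input_ids": [1, 2]}, {"input_ids": [3]}]))  # 2
```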

@dsikka dsikka assigned kylesayrs and unassigned dsikka Jan 17, 2025