Refactor of perplexity computation #1197
Merged
…specific processing is handled elsewhere
…1 only, so no need to consider batched execution. In addition, use input_tokens from generation pipeline
…ly reduced memory requirements
bfineran
previously approved these changes
Aug 23, 2023
dbogunowicz
reviewed
Aug 24, 2023
bfineran
previously approved these changes
Oct 23, 2023
dsikka
reviewed
Nov 1, 2023
Looks a lot better. Still need to verify the test cases. Could you point to where those files are? They don't seem to be part of this PR.
bfineran
previously approved these changes
Nov 1, 2023
dbogunowicz
previously approved these changes
Nov 7, 2023
dsikka
reviewed
Nov 8, 2023
tests/deepsparse/transformers/pipelines/test_text_generation.py
Outdated
dsikka
approved these changes
Nov 10, 2023
dbogunowicz
approved these changes
Nov 10, 2023
dbogunowicz
added a commit
that referenced
this pull request
Nov 13, 2023
* Add input_tokens as optional output
* Refactor Perplexity class to only compute perplexity. All other task-specific processing is handled elsewhere
* Simplify perplexity evaluation. Evaluation takes place as batch size 1 only, so no need to consider batched execution. In addition, use input_tokens from generation pipeline
* Splits wikitext at regular intervals of the same length as the sequence length
* Add argument for accumulation of negative log likelihood
* Accumulate likelihood for wikitext
* Simplification
* Add support for wikitext-style ppl evaluation
* Compute batch instead of storing until compute method. This drastically reduced memory requirements
* Remove torch dependency
* Move split of dataset into helper function
* Quality fixes
* Remove debugging prints
* Remove debugging prints
* Incorporate fixes for kv-cache
* Include doc string for accumulate
* Add support to trust-remote-code arguments
* Add support to c4
* add a missing include_prompt_logits param
* Remove unnecessary capping at sequence length (it's incorrect for cached models)
* Simplify processing for concatenated datasets
* Fix kv cache update
* Fix kv cache update
* Quality fixes
* remove batch size from pipeline instantiation
* Rename to wikitext2
* Remove trust_remote_code argument
* Remove use_deepsparse_cache argument
* Change padding of output to left in order to match padding of input ids and attention mask
* Allow trust_remote_code to be passed as argument (in some cases tokenizer can be defined by custom code)
* Move process_concatenated_datasets to helpers file
* Added support for max_text_length to speed up processing of long datasets
* Rebase w/ main
* Rebase w/ main
* Fix typo
* Rebase
* Use max_length instead of max_new_tokens
* Rebase
* Added typing and docstring
* Added typing and docstring
* Define concatenated datasets
* Add warning about batch-size not being a supported argument for some datasets
* Add unit test for pipeline and generation in ppl eval
* Add lifecycle in docstring
* Add copyright
* Style fixes
* Quality fixes
* Quality fixes
* Quality fixes
* Quality fixes
* Quality fixes
* Quality fixes
* Quality fixes
* Quality fixes
* Quality fixes
* Quality fixes
* Rebase
* Rebase
* Re-add unit test
* Style fix
* Update unit test
* Update unit test
---------
Co-authored-by: dbogunowicz <[email protected]>
Co-authored-by: Damian <[email protected]>
Co-authored-by: Benjamin Fineran <[email protected]>
Co-authored-by: Rahul Tuli <[email protected]>
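One of the commits splits wikitext at regular intervals equal to the sequence length. A minimal sketch of that idea follows, assuming the dataset has already been tokenized into one flat list of ids. The function name `split_into_chunks` is hypothetical and is not taken from this PR, and keeping the shorter trailing chunk is an assumption, since the source does not say whether the remainder is kept or dropped.

```python
from typing import List, Sequence


def split_into_chunks(token_ids: Sequence[int], sequence_length: int) -> List[List[int]]:
    """Split one long token stream into consecutive chunks of `sequence_length`.

    Hypothetical helper for illustration only. The trailing remainder (shorter
    than `sequence_length`) is kept as a final chunk; that choice is an
    assumption, not something the PR states.
    """
    if sequence_length <= 0:
        raise ValueError("sequence_length must be positive")
    return [
        list(token_ids[start : start + sequence_length])
        for start in range(0, len(token_ids), sequence_length)
    ]


if __name__ == "__main__":
    # Ten tokens with a sequence length of 4 give two full chunks and a remainder.
    print(split_into_chunks(list(range(10)), sequence_length=4))
    # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Each chunk can then be evaluated independently at batch size 1, which matches the simplification described in the commit list.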
This refactor simplifies perplexity computation and adds support for different datasets within the same codebase. Among the changes, these are the highlights:
Testing plan:
Verified ppl for base Codegen 350M mono:
Result: mean ppl: 3.60 (PyTorch: 3.60)
Verified ppl for OPT base 1.3b:
Result: mean ppl: 14.62 (PyTorch: 14.63)
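For readers unfamiliar with the metric being compared here, the following is a rough sketch of how perplexity can be obtained by accumulating negative log likelihood chunk by chunk instead of storing all logits until the end, in the spirit of the "accumulate" and "compute batch instead of storing" commits. It is pure Python with no torch dependency. The class and method names (`PerplexityAccumulator`, `add_batch`, `compute`) are hypothetical and do not reflect the actual class in this PR, and the sketch assumes `logits[i]` is the prediction for `targets[i]`, i.e. any shifting of labels has already been done.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary-sized row."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]


class PerplexityAccumulator:
    """Hypothetical illustration: keeps only a running NLL sum and token count."""

    def __init__(self) -> None:
        self.total_nll = 0.0
        self.token_count = 0

    def add_batch(self, logits: Sequence[Sequence[float]], targets: Sequence[int]) -> None:
        """Fold one chunk into the running totals.

        Assumes logits[i] is the distribution predicted for targets[i].
        """
        if len(logits) != len(targets):
            raise ValueError("logits and targets must have the same length")
        for row, target in zip(logits, targets):
            self.total_nll -= log_softmax(row)[target]
            self.token_count += 1

    def compute(self) -> float:
        """Perplexity = exp(mean negative log likelihood per token)."""
        if self.token_count == 0:
            raise ValueError("no tokens accumulated")
        return math.exp(self.total_nll / self.token_count)


if __name__ == "__main__":
    acc = PerplexityAccumulator()
    # Uniform logits over a 4-token vocabulary: every token has probability 1/4.
    acc.add_batch([[0.0] * 4] * 3, [0, 1, 2])
    print(round(acc.compute(), 6))  # 4.0
```

Because only two scalars are kept between chunks, memory use does not grow with dataset length. Note that this accumulates over all tokens; a "mean ppl" over per-sample perplexities, as reported above, would average the exponentiated values per sample instead.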
NOTE: This pipeline was only tested for non-cached models. It should work with kv-cache models as well. Right now the pipeline is created with sequence_length=args.max_sequence_length and prompt_processing_sequence_length=args.max_sequence_length. As soon as the kv-cache issues around this case are resolved we should test ppl evaluation again.
Update: Added support for the c4 dataset in a way that complies with both the subsets defined in SparseGPT and LLM-foundry. Validated on cached models as well.