Cap Mistral's context length at 2k (#495)
Temporary fix to prevent multiple TB of memory from being allocated just for attention masks.
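
For scale, a rough back-of-the-envelope sketch of where that memory would go (the per-head, per-layer dense mask is my own illustrative assumption, not stated in the commit): a float32 [n_ctx, n_ctx] mask at Mistral's native 32768-token context is about 4.3 GB, and one such mask per head per layer across the model lands in the multi-terabyte range.

```python
# Back-of-the-envelope estimate for dense attention-mask memory (illustrative
# assumption: one float32 [n_ctx, n_ctx] mask materialized per head per layer).
n_ctx = 32768        # Mistral's native context length (the value being capped)
n_heads = 32
n_layers = 32
bytes_per_elem = 4   # float32

per_mask = n_ctx * n_ctx * bytes_per_elem   # ~4.3e9 bytes for a single mask
total = per_mask * n_heads * n_layers       # ~4.4e12 bytes across all heads/layers
print(f"per mask: {per_mask / 1e9:.1f} GB, total: {total / 1e12:.1f} TB")
# With the cap at n_ctx = 2048, the same total drops to roughly 17 GB.
```

That is consistent with the "multiple TB" figure above and motivates the temporary 2048 cap below.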
collingray authored Jan 28, 2024
1 parent 19b3bc8 commit ba3fb3b
Showing 1 changed file with 1 addition and 1 deletion.
transformer_lens/loading_from_pretrained.py (1 addition, 1 deletion)

@@ -815,7 +815,7 @@ def convert_hf_model_config(model_name: str, **kwargs):
             "n_heads": 32,
             "d_mlp": 14336,
             "n_layers": 32,
-            "n_ctx": 32768,
+            "n_ctx": 2048,  # Capped due to memory issues
             "d_vocab": 32000,
             "act_fn": "silu",
             "normalization_type": "RMS",
