I set gradient_accumulation_steps to 1, batch size to 1, and used LoRA to reduce the number of trainable parameters to 3,276,800. However, even with two V100 GPUs (32 GB each), I still can't run this experiment because of a CUDA out-of-memory error.
What other methods can reduce the GPU memory requirement?
Hi, I use 8 A100 GPUs with 80GB of memory each to fine-tune the model. For your case, I suggest using FP16 training and reducing the number of LoRA trainable parameters to conduct the experiments.
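Below is a minimal sketch of what that suggestion could look like, assuming the fine-tuning script is built on HuggingFace Transformers with PEFT; the module names and hyperparameter values are illustrative assumptions, not this repo's actual configuration.

```python
# Hypothetical sketch: enable FP16 training and shrink the LoRA adapter.
# Assumes a Transformers + PEFT setup; target_modules names are illustrative.
from transformers import TrainingArguments
from peft import LoraConfig

# Lower rank and fewer targeted modules -> fewer trainable LoRA parameters.
lora_config = LoraConfig(
    r=4,                                  # smaller rank than a typical 8/16
    lora_alpha=8,
    target_modules=["q_proj", "v_proj"],  # hypothetical module names
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="./output",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=1,
    fp16=True,  # half-precision training to roughly halve activation memory
)
```

Reducing the LoRA rank (`r`) and the number of targeted modules shrinks the adapter and its optimizer state, while `fp16=True` stores activations and gradients in half precision, which together can make the run fit on smaller GPUs.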