How much GPU memory does the diffusers implementation of the HunyuanVideo model take? I tried to run it on an H100, but it didn't work; I got the following error. Did anyone manage to run it successfully and get an output?
File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/diffusers/models/attention_processor.py", line 588, in forward
return self.processor(
^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/diffusers/models/transformers/transformer_hunyuan_video.py", line 117, in __call__
hidden_states = F.scaled_dot_product_attention(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 26.40 GiB. GPU 0 has a total capacity of 79.32 GiB of which 19.74 GiB is free. Process 2282098 has 59.57 GiB memory in use. Of the allocated memory 56.78 GiB is allocated by PyTorch, and 2.06 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
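(The error message itself suggests setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True. Assuming it has to be in place before torch makes its first CUDA allocation, the simplest spot seems to be the very top of the script, although this only addresses fragmentation and not the 26.40 GiB attention buffer itself:)

import os
# Must be set before torch initializes the CUDA caching allocator, so it goes first,
# before "import torch".
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"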
The code I was using is:

import torch
from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig, HunyuanVideoTransformer3DModel, HunyuanVideoPipeline
from diffusers.utils import export_to_video

# from hyvideo.modules.models import HUNYUAN_VIDEO_CONFIG
# from hyvideo.constants import PROMPT_TEMPLATE_ENCODE, PROMPT_TEMPLATE_ENCODE_VIDEO
# print(list(HUNYUAN_VIDEO_CONFIG.keys()))

# Load the transformer in 8-bit via bitsandbytes to reduce its weight memory.
quant_config = DiffusersBitsAndBytesConfig(load_in_8bit=True)
transformer_8bit = HunyuanVideoTransformer3DModel.from_pretrained(
    "tencent/HunyuanVideo",
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
    revision="refs/pr/18",
)

pipeline = HunyuanVideoPipeline.from_pretrained(
    "tencent/HunyuanVideo",
    transformer=transformer_8bit,
    torch_dtype=torch.float16,
    revision="refs/pr/18",
    device_map="balanced",
)  # .to("cuda")

prompt = "A cat walks on the grass, realistic style."
output = pipeline(
    prompt=prompt,
    height=720,
    width=1280,
    num_frames=129,
    num_inference_steps=30,
).frames[0]

save_path = "cat.mp4"
export_to_video(output, save_path, fps=15)
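For reference, the OOM is raised inside F.scaled_dot_product_attention, so that allocation should scale with the latent sequence length (height x width x num_frames). Below is only a sketch of two things that might reduce peak memory with standard diffusers knobs, not something I have confirmed fits in 80 GB: VAE tiling, plus a smaller placeholder run to check the rest of the setup before retrying 1280x720 with 129 frames. (diffusers also exposes pipeline.enable_model_cpu_offload(), but as far as I understand it does not combine with the device_map="balanced" loading used above, so it is not shown here.)

# Sketch only: tile the VAE decode to cap its peak memory, and use placeholder
# height/width/num_frames (not the target resolution) to verify the pipeline runs.
pipeline.vae.enable_tiling()

output = pipeline(
    prompt=prompt,
    height=320,          # placeholder value for a test run
    width=512,           # placeholder value for a test run
    num_frames=61,       # 4*k + 1 frames, same pattern as the 129 used above
    num_inference_steps=30,
).frames[0]
export_to_video(output, "cat_small.mp4", fps=15)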