HBI reproduction issue #12

Open
zzezze opened this issue Jan 10, 2025 · 2 comments

Comments


zzezze commented Jan 10, 2025


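Hello, when reproducing HBI I can never get above 48.5. Can the author still reproduce the result? I cloned the repository directly and set up the dataset and environment strictly following your README. I have repeated the run several times, and none of the logs reach 48.5 or higher. However, running inference directly with the checkpoint you provide does reach the numbers reported in the paper.
Dataset: MSR-VTT
GPUs: 2 × A6000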

Owner

jpthu17 commented Jan 11, 2025

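I only have A100s; the MSRVTT experiments were run on 4 × A100. In general, both the number of GPUs and the GPU model introduce some randomness into the results.
In addition, if you are after performance, you can make a small change to the dataloader and use a video reading method that is slower but has higher data precision: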

def __getitem__(self, idx):
    if self.mode == 'all':
        video_id, caption = self.sentences_dict[idx]
        text_ids, text_mask, s, e = self._get_text(caption)
        video, video_mask = self._get_rawvideo_dec(video_id, s, e)
        # video, video_mask = self._get_rawvideo(video_id, s, e)
        return text_ids, text_mask, video, video_mask, idx, hash(video_id.replace("video", ""))
    elif self.mode == 'text':
        video_id, caption = self.sentences_dict[idx]
        text_ids, text_mask, s, e = self._get_text(caption)
        return text_ids, text_mask, idx
    elif self.mode == 'video':
        video_id = self.video_list[idx]
        video, video_mask = self._get_rawvideo_dec(video_id)
        # video, video_mask = self._get_rawvideo(video_id)
        return video, video_mask, idx

Change it to:

    if self.mode == 'all':
        video_id, caption = self.sentences_dict[idx]
        text_ids, text_mask, s, e = self._get_text(caption)
        # video, video_mask = self._get_rawvideo_dec(video_id, s, e)
        video, video_mask = self._get_rawvideo(video_id, s, e)
        return text_ids, text_mask, video, video_mask, idx, hash(video_id.replace("video", ""))
    elif self.mode == 'text':
        video_id, caption = self.sentences_dict[idx]
        text_ids, text_mask, s, e = self._get_text(caption)
        return text_ids, text_mask, idx
    elif self.mode == 'video':
        video_id = self.video_list[idx]
        # video, video_mask = self._get_rawvideo_dec(video_id)
        video, video_mask = self._get_rawvideo(video_id)
        return video, video_mask, idx
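
If you switch between the two readers often, a flag may be easier than editing comments by hand. The following is only a sketch: the class name `RetrievalDatasetSketch`, the `precise_decode` flag, and the helper `_read_video` are hypothetical and not part of the repository, and the two reader methods are placeholders that stand in for the real `_get_rawvideo_dec` and `_get_rawvideo`.

```python
class RetrievalDatasetSketch:
    """Stand-in for the dataset class; only the reader selection is shown."""

    def __init__(self, video_list, precise_decode=False):
        self.video_list = video_list
        # Hypothetical flag (not an option in the repository):
        # True selects _get_rawvideo, False selects _get_rawvideo_dec.
        self.precise_decode = precise_decode

    def _get_rawvideo_dec(self, video_id, s=None, e=None):
        # Placeholder for the reader that the original snippet calls.
        return f"dec:{video_id}", [1]

    def _get_rawvideo(self, video_id, s=None, e=None):
        # Placeholder for the slower, higher-precision reader.
        return f"raw:{video_id}", [1]

    def _read_video(self, video_id, s=None, e=None):
        # Pick the reader once, so __getitem__ needs no commented-out lines.
        reader = self._get_rawvideo if self.precise_decode else self._get_rawvideo_dec
        return reader(video_id, s, e)

    def __getitem__(self, idx):
        # Only the 'video' mode is reproduced here to keep the sketch short.
        video_id = self.video_list[idx]
        video, video_mask = self._read_video(video_id)
        return video, video_mask, idx
```

With this layout, `RetrievalDatasetSketch(ids, precise_decode=True)` would take the `_get_rawvideo` path in every mode that goes through `_read_video`.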

Author

zzezze commented Jan 11, 2025

Thank you very much. To be specific, what batch size did you use for MSR-VTT? With a batch size of 128, a single 80 GB A100 should have enough memory.
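
For comparison across setups, here is a small sketch of how a batch of 128 would divide over different GPU counts. It assumes the batch size is a global value split evenly across GPUs; the thread does not confirm how the training script divides it, nor that 128 was the value actually used, and `per_gpu_batch` is a hypothetical helper.

```python
def per_gpu_batch(global_batch, num_gpus):
    """Return the per-GPU share of a global batch, assuming an even split."""
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    if global_batch % num_gpus != 0:
        raise ValueError("global batch must divide evenly across GPUs")
    return global_batch // num_gpus


# 1 GPU (the case asked about), 2 GPUs (2 x A6000), 4 GPUs (4 x A100).
for n in (1, 2, 4):
    print(f"{n} GPU(s): {per_gpu_batch(128, n)} samples per GPU")
```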
