Vision encoder output dimension does not match #1

yuaoze · 2024-10-18T07:18:42Z

Hi, thanks for your excellent work! I'm trying to run `bash eval_calvin.sh`.
When execution reaches FeedbackPolicy/models/policy.py, the vision_x passed to vision_encoder has a spatial size of 192 × 192, which does not match the model's expected 224 × 224.
So I interpolated vision_x to 224 × 224, but the vision_encoder output then has shape 8 × 768, which does not match the dimensions expected by the rearrange operation: `vision_x = rearrange(vision_x, "(b T) d h w -> b T (h w) d", b=b, T=T)`


retsuh-bqw · 2024-10-18T10:55:10Z

Thanks for your interest in our work!
We modified the default input size of the VC1-Base model (from 224 to 192) in its corresponding config file. A small tweak to that config will let you use our evaluation scripts.

Further updates are welcome if this does not solve your issue. 😃
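The input size matters because a ViT splits each image into a fixed grid of patches. The sketch below shows that arithmetic, assuming VC-1 Base is a ViT-B/16 (16 × 16 patches); `patch_grid` is a hypothetical helper, not part of VC-1 or CLOVER.

```python
def patch_grid(img_size: int, patch_size: int = 16) -> tuple[int, int]:
    """Return (grid side, number of patch tokens) for a square image.

    Assumes a ViT with non-overlapping square patches (patch_size=16 for ViT-B/16).
    """
    if img_size % patch_size != 0:
        raise ValueError(f"img_size {img_size} is not divisible by patch_size {patch_size}")
    side = img_size // patch_size
    return side, side * side


for size in (192, 224):
    side, n_tokens = patch_grid(size)
    print(f"{size}x{size} -> {side}x{side} grid, {n_tokens} patch tokens")
```

Under that assumption, 192 × 192 inputs give a 12 × 12 grid (144 tokens) and 224 × 224 inputs give a 14 × 14 grid (196 tokens), so the model config and the image transform need to agree on one size.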

yuaoze · 2024-10-21T01:49:47Z

Hi, I followed your advice and modified the config file of the VC1-Base model, but the error still occurred. Here are the details.

yuaoze · 2024-10-21T03:15:49Z

I solved that issue by specifying `output_size: 192` under `transform` in the config file.
However, the output of vision_encoder has shape 8 × 768, which does not match the dimensions expected by the rearrange operation: `vision_x = rearrange(vision_x, "(b T) d h w -> b T (h w) d", b=b, T=T)`
Can you give me some advice?

retsuh-bqw · 2024-10-21T04:25:10Z

My bad. You should also set `use_cls` to `False` in the config file. The encoder will then return all feature tokens rather than only the CLS embedding.
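A minimal sketch of the shape bookkeeping behind this fix, assuming the encoder returns one 768-dim CLS embedding per image when `use_cls=True` and a `(b*T, d, h, w)` feature map when `use_cls=False` (the layout the rearrange pattern expects). Both helpers and the `b=2, T=4` split of the 8 images are hypothetical; the code only tracks shapes and does not call einops.

```python
def encoder_output_shape(n_images, img_size=192, patch_size=16, embed_dim=768, use_cls=True):
    """Assumed encoder output shape (hypothetical helper, ViT-B/16 defaults)."""
    if use_cls:
        # Only the CLS embedding per image: (n_images, embed_dim)
        return (n_images, embed_dim)
    side = img_size // patch_size
    # Full patch-token feature map: (n_images, embed_dim, h, w)
    return (n_images, embed_dim, side, side)


def rearranged_shape(shape, b, T):
    """Shape produced by rearrange(x, "(b T) d h w -> b T (h w) d", b=b, T=T)."""
    if len(shape) != 4:
        raise ValueError(f"pattern expects 4 dims '(b T) d h w', got shape {shape}")
    bt, d, h, w = shape
    if bt != b * T:
        raise ValueError(f"first dim {bt} != b * T = {b * T}")
    return (b, T, h * w, d)


# With use_cls=False the 4-D feature map fits the pattern.
print(rearranged_shape(encoder_output_shape(8, use_cls=False), b=2, T=4))

# With use_cls=True the output is 2-D (8, 768), so the pattern cannot apply.
try:
    rearranged_shape(encoder_output_shape(8), b=2, T=4)
except ValueError as err:
    print("mismatch:", err)
```

This mirrors the report above: an 8 × 768 output has no spatial axes to flatten into `(h w)`.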

hkz103 · 2024-10-28T11:35:35Z

Hello! I ran into the same problem. After I set `img_size` to 192 and `use_cls` to `False`, the error still occurred: `AssertionError("Input image height (224) doesn't match model (192).")`. Can you give me more advice?

retsuh-bqw · 2024-10-29T04:06:48Z

Is it because of the sanity check in the `load_model` function (lines 26-29) of VC-1?
You may change the function as follows:

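```python
import logging

# VC-1's module presumably defines `log` already; defined here so the snippet is self-contained.
log = logging.getLogger(__name__)


def load_model(
    model,
    transform,
    metadata=None,
    checkpoint_dict=None,
):
    # Load pretrained weights if provided; the input-size sanity check is skipped.
    if checkpoint_dict is not None:
        msg = model.load_state_dict(checkpoint_dict)
        log.warning(msg)

    return model
```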
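A complementary option is to verify up front that the settings discussed in this thread agree with each other. The sketch below assumes a flattened config dict with the keys mentioned here (`img_size`, `use_cls`, and `output_size` under `transform`); the real VC-1 config may nest them differently, and `check_vc1_settings` is a hypothetical helper. One plausible cause of the assertion above is a transform still producing 224 × 224 images, so the first message mimics that error's format.

```python
def check_vc1_settings(cfg: dict, patch_size: int = 16) -> list[str]:
    """Return problems found in a (hypothetical, flattened) VC-1 config dict."""
    problems = []
    img_size = cfg["img_size"]
    output_size = cfg["transform"]["output_size"]
    # The transform's output resolution must equal the size the ViT was built for.
    if output_size != img_size:
        problems.append(
            f"Input image height ({output_size}) doesn't match model ({img_size})."
        )
    # Assumed ViT-B/16: the image must split evenly into 16 x 16 patches.
    if img_size % patch_size != 0:
        problems.append(f"img_size {img_size} is not a multiple of patch size {patch_size}")
    # The rearrange in policy.py needs the full feature map, not just the CLS token.
    if cfg["use_cls"]:
        problems.append("use_cls=True returns only the CLS embedding; set it to False")
    return problems


stale = {"img_size": 192, "use_cls": False, "transform": {"output_size": 224}}
fixed = {"img_size": 192, "use_cls": False, "transform": {"output_size": 192}}
print(check_vc1_settings(stale))
print(check_vc1_settings(fixed))
```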

hkz103 · 2024-10-30T07:35:51Z

It works! But I ran into a new problem:

retsuh-bqw · 2024-10-30T07:54:56Z

It seems to be an issue within CALVIN. Is your CALVIN env properly installed?

gouyinghong · 2024-10-30T08:53:25Z

Hi, I ran `bash eval_calvin.sh`, but got `failed to EGL with glad.` Do you know how to solve this?

hkz103 · 2024-10-30T10:26:37Z

You are right. I hadn't properly installed CALVIN. However, the packages used by CALVIN and CLOVER seem to conflict. Can you provide a requirements.txt?

retsuh-bqw · 2024-10-30T10:45:51Z

A requirements.txt is provided at visual_planner/requirements.txt.
Which package conflicts are you getting exactly?

hkz103 · 2024-10-31T05:34:04Z

Now I've hit the "Cannot load URDF file" problem again. The package conflicts are listed below. Can you give me more advice? Thanks for your help!

retsuh-bqw · 2024-10-31T05:54:28Z

You can try downgrading networkx to 2.2. I think the other packages are fine.

hkz103 · 2024-10-31T06:06:50Z

When using networkx 2.2, `AttributeError: module 'numpy' has no attribute 'int'` is raised, because newer versions of NumPy no longer provide `np.int`, which networkx 2.2 still uses.

retsuh-bqw · 2024-10-31T12:55:46Z

You may try downgrading NumPy as well. I'll add the relevant information to a new Troubleshooting section.
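A minimal sketch for checking whether the installed NumPy still provides the `np.int` alias that networkx 2.2 relies on. It rests on the fact that the alias was deprecated in NumPy 1.20 and removed in 1.24, so a release below 1.24 should still have it; the helper names are hypothetical, and only the standard library is used so the check runs even when NumPy is missing.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(v: str) -> tuple[int, ...]:
    """Parse the leading numeric components of a version, e.g. '1.24.0rc1' -> (1, 24, 0)."""
    parts = []
    for piece in v.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # stop at suffixes such as 'rc1' or 'dev0'
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def numpy_has_int_alias(numpy_version: str) -> bool:
    """np.int was removed in NumPy 1.24, so only earlier releases still provide it."""
    return parse_version(numpy_version) < (1, 24)


try:
    installed = version("numpy")
    print(f"numpy {installed}: np.int available = {numpy_has_int_alias(installed)}")
except PackageNotFoundError:
    print("numpy is not installed")
```

If the check reports `False`, pinning something like `numpy<1.24` alongside networkx 2.2 is one option; the exact NumPy version the authors use is not stated in this thread.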
