System Identification #60

Open
offjangir opened this issue Jan 1, 2025 · 5 comments

Comments

@offjangir

I have been trying to run the provided system identification scripts. They run fine, but simulated annealing takes too long to complete. The code is currently set to 2000 iterations. Did you run it for all 2000 iterations? How long did it take to complete, and what were the specifications of the host system you ran it on?

@xuanlinli17
Collaborator

xuanlinli17 commented Jan 2, 2025

I typically run it for 400 iterations per round and do 3 rounds of sysid, initializing each round with the best result from the previous one. The 2000 is just for illustrative purposes, in case you want to let it run overnight and check the result the next day.

I use a machine with 32 CPU threads. 3 rounds in total (1200 iterations) take roughly 6-7 hours. It only uses 18 threads on the machine, since initializing more than 18 envs on a single 11 GB GPU will go OOM for ManiSkill2. But you only need to do it once per robot. I was thinking about implementing a finite-difference-based method, which would be faster, but didn't have a chance to do it.
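
For readers unfamiliar with the "3 rounds of 400 iterations" pattern, here is a minimal sketch of warm-started simulated annealing. It is not the code in `tools/sysid/sysid.py`: the `anneal` function, the `toy_loss` objective, the cooling schedule, and the step size are all hypothetical stand-ins chosen for illustration. In the real script the loss would come from replaying trajectories in the simulator.

```python
import math
import random


def anneal(loss, init, n_iters, step=0.1, t0=1.0, seed=0):
    """Toy simulated annealing over a list of floats.

    Returns (best_params, best_loss). Hypothetical helper, not the repo's API.
    """
    rng = random.Random(seed)
    cur, cur_loss = list(init), loss(init)
    best, best_loss = list(cur), cur_loss
    for i in range(n_iters):
        # Linear cooling; the small constant keeps the temperature positive.
        temp = t0 * (1.0 - i / n_iters) + 1e-9
        cand = [p + rng.gauss(0.0, step) for p in cur]
        cand_loss = loss(cand)
        # Always accept improvements; accept worse candidates with
        # probability exp(-(increase in loss) / temperature).
        if cand_loss < cur_loss or rng.random() < math.exp((cur_loss - cand_loss) / temp):
            cur, cur_loss = cand, cand_loss
            if cur_loss < best_loss:
                best, best_loss = list(cur), cur_loss
    return best, best_loss


def toy_loss(params):
    # Stand-in for the real-vs-sim trajectory error (assumption).
    target = [1.5, -0.5]
    return sum((p - t) ** 2 for p, t in zip(params, target))


params = [0.0, 0.0]
history = []
for round_idx in range(3):
    # Each round starts from the best parameters found in the previous round.
    params, best = anneal(toy_loss, params, n_iters=400, seed=round_idx)
    history.append(best)
    print(f"round {round_idx}: best loss {best:.4f}")
```

Because each round returns the best parameters it has seen, including its starting point, the best loss can never get worse from one round to the next.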

@offjangir
Author

Hi @xuanlinli17, I am running the example code directly. It takes about 6 hours for just 200 iterations, and I have tried this multiple times. I am running on a 48-core AMD processor with an 80 GB A100. Are you sure this is the latest code you are using?
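
To put the two reports on the same scale, the figures quoted in this thread can be converted to seconds per iteration. This is plain arithmetic on the numbers above (1200 iterations in 6-7 hours versus 200 iterations in about 6 hours); the helper name is hypothetical.

```python
def seconds_per_iteration(hours, iterations):
    """Average wall-clock seconds spent on one annealing iteration."""
    return hours * 3600.0 / iterations


# Collaborator's report: 1200 iterations in 6-7 hours.
ref_low = seconds_per_iteration(6, 1200)
ref_high = seconds_per_iteration(7, 1200)

# Author's report: 200 iterations in about 6 hours.
reported = seconds_per_iteration(6, 200)

print(f"reference: {ref_low:.0f}-{ref_high:.0f} s/iter")
print(f"reported:  {reported:.0f} s/iter")
print(f"slowdown:  {reported / ref_high:.1f}x to {reported / ref_low:.1f}x")
```

By these numbers the reported run is roughly 5 to 6 times slower per iteration than the reference run, despite the larger machine.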

@xuanlinli17
Collaborator

xuanlinli17 commented Jan 3, 2025

This is quite strange, since multiprocessing is already handled automatically. But since you have an A100, you can definitely use a larger pool than 18: https://github.com/simpler-env/SimplerEnv/blob/main/tools/sysid/sysid.py#L151

Also, how many trajectories do you have? The runtime scales proportionally with the number of trajectories. I used 18 trajectories.

@offjangir
Author

I think this might be a Vulkan issue? Maybe my processing is defaulting to the CPU?

@xuanlinli17
Collaborator

Is your GPU utilization > 0? You can use "time.time()" to benchmark the time elapsed for stepping the environment. The simulated annealing itself runs on the CPU, though.
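
A minimal sketch of the `time.time()` benchmark suggested above. The `time_steps` helper and `dummy_step` are hypothetical; `dummy_step` only sleeps and stands in for a real call such as stepping the environment with an action, which is not reproduced here.

```python
import time


def time_steps(step_fn, n_steps=100, warmup=5):
    """Return the average wall-clock seconds per call of step_fn."""
    # Warm-up calls keep one-time setup cost out of the measurement.
    for _ in range(warmup):
        step_fn()
    start = time.time()
    for _ in range(n_steps):
        step_fn()
    elapsed = time.time() - start
    return elapsed / n_steps


def dummy_step():
    # Placeholder for stepping the real environment (assumption).
    time.sleep(0.001)


avg = time_steps(dummy_step, n_steps=50)
print(f"{avg * 1e3:.2f} ms per step")
```

Multiplying the measured time per step by the number of steps per trajectory and the number of trajectories gives a rough estimate of how much of each annealing iteration is spent in simulation.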


2 participants