[Docs] update readme info about the many baselines we have and vlas
StoneT2000 committed Dec 19, 2024
1 parent 79c0bc5 commit be920fd
Showing 4 changed files with 19 additions and 7 deletions.
README.md (2 changes: 1 addition, 1 deletion)
@@ -17,8 +17,8 @@ ManiSkill is a powerful unified framework for robot simulation and training powe
- Example tasks cover a wide range of different robot embodiments (humanoids, mobile manipulators, single-arm robots) as well as a wide range of different tasks (table-top, drawing/cleaning, dextrous manipulation)
- Flexible and simple task building API that abstracts away much of the complex GPU memory management code via an object oriented design
- Real2sim environments for scalably evaluating real-world policies 100x faster via GPU simulation.
- Many tuned robot learning baselines in Reinforcement Learning (e.g. PPO, SAC, [TD-MPC2](https://github.com/nicklashansen/tdmpc2)), Imitation Learning (e.g. Behavior Cloning, [Diffusion Policy](https://github.com/real-stanford/diffusion_policy)), and large Vision Language Action (VLA) models (e.g. [Octo](https://github.com/octo-models/octo), [RDT-1B](https://github.com/thu-ml/RoboticsDiffusionTransformer), [RT-x](https://robotics-transformer-x.github.io/))

<!-- TODO replace paper link with arxiv link when it is out -->
For more details we encourage you to take a look at our [paper](https://arxiv.org/abs/2410.00425).

There are more features to be added to ManiSkill 3, see [our roadmap](https://maniskill.readthedocs.io/en/latest/roadmap/index.html) for planned features that will be added over time before the official v3 is released.
docs/source/index.md (6 changes: 3 additions, 3 deletions)
@@ -12,12 +12,12 @@
ManiSkill is a powerful unified framework for robot simulation and training powered by [SAPIEN](https://sapien.ucsd.edu/), with a strong focus on manipulation skills. The entire tech stack is as open-source as possible and ManiSkill v3 is in beta release now. Its features include:
- GPU parallelized visual data collection system. On the high end you can collect RGBD + Segmentation data at 30,000+ FPS with a 4090 GPU, 10-1000x faster compared to most other simulators.
- GPU parallelized simulation, enabling high throughput state-based synthetic data collection in simulation
- GPU parallelized heteogeneous simuluation, where every parallel environment has a completely different scene/set of objects
- GPU parallelized heterogeneous simulation, where every parallel environment has a completely different scene/set of objects
- Example tasks cover a wide range of different robot embodiments (humanoids, mobile manipulators, single-arm robots) as well as a wide range of different tasks (table-top, drawing/cleaning, dextrous manipulation)
- Flexible and simple task building API that abstracts away much of the complex GPU memory management code via an object oriented design
- Real2sim environments for scalably evaluating real-world policies 60-100x faster via GPU simulation.
- Real2sim environments for scalably evaluating real-world policies 100x faster via GPU simulation.
- Many tuned robot learning baselines in Reinforcement Learning (e.g. PPO, SAC, [TD-MPC2](https://github.com/nicklashansen/tdmpc2)), Imitation Learning (e.g. Behavior Cloning, [Diffusion Policy](https://github.com/real-stanford/diffusion_policy)), and large Vision Language Action (VLA) models (e.g. [Octo](https://github.com/octo-models/octo), [RDT-1B](https://github.com/thu-ml/RoboticsDiffusionTransformer), [RT-x](https://robotics-transformer-x.github.io/))
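As a rough illustration of the GPU-parallelized simulation and batched visual observations listed above, a minimal rollout might look like the sketch below. The task ID, observation mode, and control mode are illustrative assumptions; the options actually available depend on the installed ManiSkill version and task:

```python
# Minimal sketch of a GPU-parallelized rollout with batched visual observations.
# "PickCube-v1", obs_mode, and control_mode are illustrative; check the task docs
# for the names available in your installed version.
import gymnasium as gym
import mani_skill.envs  # noqa: F401  (registers ManiSkill environments with gymnasium)

env = gym.make(
    "PickCube-v1",
    num_envs=1024,                      # parallel sub-environments simulated on the GPU
    obs_mode="rgbd",                    # RGB + depth camera observations
    control_mode="pd_joint_delta_pos",
)
obs, _ = env.reset(seed=0)
for _ in range(100):
    action = env.action_space.sample()  # batched action, one row per sub-environment
    obs, reward, terminated, truncated, info = env.step(action)
env.close()
```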

<!-- TODO replace paper link with arxiv link when it is out -->
For more details we encourage you to take a look at our [paper](https://arxiv.org/abs/2410.00425).

There are more features to be added to ManiSkill 3, see [our roadmap](https://maniskill.readthedocs.io/en/latest/roadmap/index.html) for planned features that will be added over time before the official v3 is released.
docs/source/user_guide/index.md (7 changes: 4 additions, 3 deletions)
@@ -12,12 +12,12 @@
ManiSkill is a powerful unified framework for robot simulation and training powered by [SAPIEN](https://sapien.ucsd.edu/), with a strong focus on manipulation skills. The entire tech stack is as open-source as possible and ManiSkill v3 is in beta release now. Its features include:
- GPU parallelized visual data collection system. On the high end you can collect RGBD + Segmentation data at 30,000+ FPS with a 4090 GPU, 10-1000x faster compared to most other simulators.
- GPU parallelized simulation, enabling high throughput state-based synthetic data collection in simulation
- GPU parallelized heteogeneous simuluation, where every parallel environment has a completely different scene/set of objects
- GPU parallelized heterogeneous simulation, where every parallel environment has a completely different scene/set of objects
- Example tasks cover a wide range of different robot embodiments (humanoids, mobile manipulators, single-arm robots) as well as a wide range of different tasks (table-top, drawing/cleaning, dextrous manipulation)
- Flexible and simple task building API that abstracts away much of the complex GPU memory management code via an object oriented design
- Real2sim environments for scalably evaluating real-world policies 60-100x faster via GPU simulation.
- Real2sim environments for scalably evaluating real-world policies 100x faster via GPU simulation.
- Many tuned robot learning baselines in Reinforcement Learning (e.g. PPO, SAC, [TD-MPC2](https://github.com/nicklashansen/tdmpc2)), Imitation Learning (e.g. Behavior Cloning, [Diffusion Policy](https://github.com/real-stanford/diffusion_policy)), and large Vision Language Action (VLA) models (e.g. [Octo](https://github.com/octo-models/octo), [RDT-1B](https://github.com/thu-ml/RoboticsDiffusionTransformer), [RT-x](https://robotics-transformer-x.github.io/))

<!-- TODO replace paper link with arxiv link when it is out -->
For more details we encourage you to take a look at our [paper](https://github.com/haosulab/ManiSkill/blob/main/figures/maniskill3_paper.pdf).

There are more features to be added to ManiSkill 3, see [our roadmap](https://maniskill.readthedocs.io/en/latest/roadmap/index.html) for planned features that will be added over time before the official v3 is released.
@@ -46,6 +46,7 @@ datasets/index
data_collection/index
reinforcement_learning/index
learning_from_demos/index
vision_language_action_models/index
wrappers/index
```

docs/source/user_guide/vision_language_action_models/index.md (11 changes: 11 additions, 0 deletions)
@@ -0,0 +1,11 @@
# Vision Language Action Models

ManiSkill supports evaluating and pretraining vision-language-action (VLA) models. Currently, the following VLAs have been tested with the ManiSkill framework:

- [Octo](https://github.com/octo-models/octo)
- [RDT-1B](https://github.com/thu-ml/RoboticsDiffusionTransformer)
- [RT-x](https://robotics-transformer-x.github.io/)

RDT-1B uses some of the ManiSkill demonstrations as pretraining data and is evaluated by fine-tuning on demonstrations from various ManiSkill tasks; see their [README](https://github.com/thu-ml/RoboticsDiffusionTransformer?tab=readme-ov-file#simulation-benchmark) for more details.

The Octo and RT series of models are evaluated through various real2sim environments as part of the SIMPLER project; see their [README](https://github.com/simpler-env/SimplerEnv/tree/maniskill3) for details on how to run the evaluation setup.
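At a high level, evaluating a VLA on a ManiSkill task follows a standard rollout loop. The sketch below is only a generic illustration: the task ID, the observation layout, and the `RandomVLAStub` policy are assumptions made for demonstration, and the actual Octo/RT-x evaluation setup lives in the SIMPLER repository linked above.

```python
# Hedged sketch of a VLA evaluation loop on a ManiSkill task.
# The task ID and observation layout are assumptions; replace RandomVLAStub
# with real model inference (Octo, RDT-1B, RT-x) for an actual evaluation.
import gymnasium as gym
import mani_skill.envs  # noqa: F401  (registers ManiSkill environments with gymnasium)


class RandomVLAStub:
    """Hypothetical stand-in for a VLA: a real policy would run model inference in act()."""

    def __init__(self, action_space):
        self.action_space = action_space

    def act(self, image, instruction):
        # A real VLA would condition on the camera image and language instruction here.
        return self.action_space.sample()


def evaluate_episode(env, policy, instruction, max_steps=200):
    """Roll out one episode and report success based on the environment's info dict."""
    obs, _ = env.reset()
    info = {}
    for _ in range(max_steps):
        camera = next(iter(obs["sensor_data"]))        # take the first available camera
        image = obs["sensor_data"][camera]["rgb"]
        action = policy.act(image, instruction)
        obs, reward, terminated, truncated, info = env.step(action)
        if terminated or truncated:
            break
    return bool(info.get("success", False))


# Task ID and observation mode are illustrative; real2sim task names differ per setup.
env = gym.make("PutCarrotOnPlateInScene-v1", obs_mode="rgb", num_envs=1)
policy = RandomVLAStub(env.action_space)
print(evaluate_episode(env, policy, "put the carrot on the plate"))
```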
