[Docs] update readme info about the many baselines we have and vlas
StoneT2000 committed Dec 19, 2024
1 parent 79c0bc5 commit be920fd
Showing 4 changed files with 19 additions and 7 deletions.
README.md (2 changes: 1 addition, 1 deletion)
@@ -17,8 +17,8 @@ ManiSkill is a powerful unified framework for robot simulation and training powe
- Example tasks cover a wide range of different robot embodiments (humanoids, mobile manipulators, single-arm robots) as well as a wide range of different tasks (table-top, drawing/cleaning, dextrous manipulation)
- Flexible and simple task building API that abstracts away much of the complex GPU memory management code via an object oriented design
- Real2sim environments for scalably evaluating real-world policies 100x faster via GPU simulation.
- Many tuned robot learning baselines in Reinforcement Learning (e.g. PPO, SAC, [TD-MPC2](https://github.com/nicklashansen/tdmpc2)), Imitation Learning (e.g. Behavior Cloning, [Diffusion Policy](https://github.com/real-stanford/diffusion_policy)), and large Vision Language Action (VLA) models (e.g. [Octo](https://github.com/octo-models/octo), [RDT-1B](https://github.com/thu-ml/RoboticsDiffusionTransformer), [RT-x](https://robotics-transformer-x.github.io/))

<!-- TODO replace paper link with arxiv link when it is out -->
For more details we encourage you to take a look at our [paper](https://arxiv.org/abs/2410.00425).

There are more features to be added to ManiSkill 3, see [our roadmap](https://maniskill.readthedocs.io/en/latest/roadmap/index.html) for planned features that will be added over time before the official v3 is released.
docs/source/index.md (6 changes: 3 additions, 3 deletions)
@@ -12,12 +12,12 @@
ManiSkill is a powerful unified framework for robot simulation and training powered by [SAPIEN](https://sapien.ucsd.edu/), with a strong focus on manipulation skills. The entire tech stack is as open-source as possible and ManiSkill v3 is in beta release now. Its features include:
- GPU parallelized visual data collection system. On the high end you can collect RGBD + Segmentation data at 30,000+ FPS with a 4090 GPU, 10-1000x faster compared to most other simulators.
- GPU parallelized simulation, enabling high throughput state-based synthetic data collection in simulation
- GPU parallelized heteogeneous simuluation, where every parallel environment has a completely different scene/set of objects
- GPU parallelized heterogeneous simulation, where every parallel environment has a completely different scene/set of objects
- Example tasks cover a wide range of different robot embodiments (humanoids, mobile manipulators, single-arm robots) as well as a wide range of different tasks (table-top, drawing/cleaning, dextrous manipulation)
- Flexible and simple task building API that abstracts away much of the complex GPU memory management code via an object oriented design
- Real2sim environments for scalably evaluating real-world policies 60-100x faster via GPU simulation.
- Real2sim environments for scalably evaluating real-world policies 100x faster via GPU simulation.
- Many tuned robot learning baselines in Reinforcement Learning (e.g. PPO, SAC, [TD-MPC2](https://github.com/nicklashansen/tdmpc2)), Imitation Learning (e.g. Behavior Cloning, [Diffusion Policy](https://github.com/real-stanford/diffusion_policy)), and large Vision Language Action (VLA) models (e.g. [Octo](https://github.com/octo-models/octo), [RDT-1B](https://github.com/thu-ml/RoboticsDiffusionTransformer), [RT-x](https://robotics-transformer-x.github.io/))
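As a rough illustration of the GPU-parallelized simulation and batched visual observations listed above, a minimal rollout might look like the sketch below. The task ID, observation mode, and control mode are illustrative assumptions; the options actually available depend on the installed ManiSkill version and task:

```python
# Minimal sketch of a GPU-parallelized rollout with batched visual observations.
# "PickCube-v1", obs_mode, and control_mode are illustrative; check the task docs
# for the names available in your installed version.
import gymnasium as gym
import mani_skill.envs  # noqa: F401  (registers ManiSkill environments with gymnasium)

env = gym.make(
    "PickCube-v1",
    num_envs=1024,                      # parallel sub-environments simulated on the GPU
    obs_mode="rgbd",                    # RGB + depth camera observations
    control_mode="pd_joint_delta_pos",
)
obs, _ = env.reset(seed=0)
for _ in range(100):
    action = env.action_space.sample()  # batched action, one row per sub-environment
    obs, reward, terminated, truncated, info = env.step(action)
env.close()
```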

<!-- TODO replace paper link with arxiv link when it is out -->
For more details we encourage you to take a look at our [paper](https://arxiv.org/abs/2410.00425).

There are more features to be added to ManiSkill 3, see [our roadmap](https://maniskill.readthedocs.io/en/latest/roadmap/index.html) for planned features that will be added over time before the official v3 is released.
docs/source/user_guide/index.md (7 changes: 4 additions, 3 deletions)
@@ -12,12 +12,12 @@
ManiSkill is a powerful unified framework for robot simulation and training powered by [SAPIEN](https://sapien.ucsd.edu/), with a strong focus on manipulation skills. The entire tech stack is as open-source as possible and ManiSkill v3 is in beta release now. Its features include:
- GPU parallelized visual data collection system. On the high end you can collect RGBD + Segmentation data at 30,000+ FPS with a 4090 GPU, 10-1000x faster compared to most other simulators.
- GPU parallelized simulation, enabling high throughput state-based synthetic data collection in simulation
- GPU parallelized heteogeneous simuluation, where every parallel environment has a completely different scene/set of objects
- GPU parallelized heterogeneous simulation, where every parallel environment has a completely different scene/set of objects
- Example tasks cover a wide range of different robot embodiments (humanoids, mobile manipulators, single-arm robots) as well as a wide range of different tasks (table-top, drawing/cleaning, dextrous manipulation)
- Flexible and simple task building API that abstracts away much of the complex GPU memory management code via an object oriented design
- Real2sim environments for scalably evaluating real-world policies 60-100x faster via GPU simulation.
- Real2sim environments for scalably evaluating real-world policies 100x faster via GPU simulation.
- Many tuned robot learning baselines in Reinforcement Learning (e.g. PPO, SAC, [TD-MPC2](https://github.com/nicklashansen/tdmpc2)), Imitation Learning (e.g. Behavior Cloning, [Diffusion Policy](https://github.com/real-stanford/diffusion_policy)), and large Vision Language Action (VLA) models (e.g. [Octo](https://github.com/octo-models/octo), [RDT-1B](https://github.com/thu-ml/RoboticsDiffusionTransformer), [RT-x](https://robotics-transformer-x.github.io/))

<!-- TODO replace paper link with arxiv link when it is out -->
For more details we encourage you to take a look at our [paper](https://github.com/haosulab/ManiSkill/blob/main/figures/maniskill3_paper.pdf).

There are more features to be added to ManiSkill 3, see [our roadmap](https://maniskill.readthedocs.io/en/latest/roadmap/index.html) for planned features that will be added over time before the official v3 is released.
@@ -46,6 +46,7 @@ datasets/index
data_collection/index
reinforcement_learning/index
learning_from_demos/index
vision_language_action_models/index
wrappers/index
```

docs/source/user_guide/vision_language_action_models/index.md (11 changes: 11 additions, 0 deletions)
@@ -0,0 +1,11 @@
# Vision Language Action Models

ManiSkill supports evaluating and pretraining vision-language-action (VLA) models. Currently, the following VLAs have been tested with the ManiSkill framework:

- [Octo](https://github.com/octo-models/octo)
- [RDT-1B](https://github.com/thu-ml/RoboticsDiffusionTransformer)
- [RT-x](https://robotics-transformer-x.github.io/)

RDT-1B uses some of the ManiSkill demonstrations as pretraining data and is evaluated by fine-tuning on demonstrations from various ManiSkill tasks; see their [README](https://github.com/thu-ml/RoboticsDiffusionTransformer?tab=readme-ov-file#simulation-benchmark) for more details.

The Octo and RT series of models are evaluated through various real2sim environments as part of the SIMPLER project; see their [README](https://github.com/simpler-env/SimplerEnv/tree/maniskill3) for details on how to run the evaluation setup.
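At a high level, evaluating a VLA on a ManiSkill task follows a standard rollout loop. The sketch below is only a generic illustration: the task ID, the observation layout, and the `RandomVLAStub` policy are assumptions made for demonstration, and the actual Octo/RT-x evaluation setup lives in the SIMPLER repository linked above.

```python
# Hedged sketch of a VLA evaluation loop on a ManiSkill task.
# The task ID and observation layout are assumptions; replace RandomVLAStub
# with real model inference (Octo, RDT-1B, RT-x) for an actual evaluation.
import gymnasium as gym
import mani_skill.envs  # noqa: F401  (registers ManiSkill environments with gymnasium)


class RandomVLAStub:
    """Hypothetical stand-in for a VLA: a real policy would run model inference in act()."""

    def __init__(self, action_space):
        self.action_space = action_space

    def act(self, image, instruction):
        # A real VLA would condition on the camera image and language instruction here.
        return self.action_space.sample()


def evaluate_episode(env, policy, instruction, max_steps=200):
    """Roll out one episode and report success based on the environment's info dict."""
    obs, _ = env.reset()
    info = {}
    for _ in range(max_steps):
        camera = next(iter(obs["sensor_data"]))        # take the first available camera
        image = obs["sensor_data"][camera]["rgb"]
        action = policy.act(image, instruction)
        obs, reward, terminated, truncated, info = env.step(action)
        if terminated or truncated:
            break
    return bool(info.get("success", False))


# Task ID and observation mode are illustrative; real2sim task names differ per setup.
env = gym.make("PutCarrotOnPlateInScene-v1", obs_mode="rgb", num_envs=1)
policy = RandomVLAStub(env.action_space)
print(evaluate_episode(env, policy, "put the carrot on the plate"))
```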
