ROS2 Transformers

ROS2 package for deploying and fine-tuning multimodal generalist agent models. This package provides inference servers, implemented as ROS2 action servers, for the most popular generalist multimodal robotics models (see Available Models). It depends on robo_transformers to access and run inference for these models. It currently also depends on dgl_ros for the agent servers; a refactor is needed to remove that dependency.


Tested Platforms

  • ✅ Ubuntu 22.04 + ROS2 Humble

Available Models

| Model Type | Variants | Observation Space | Action Space | Authors |
|---|---|---|---|---|
| RT-1 | rt1main, rt1multirobot, rt1simreal | text + head camera | end effector pose delta | Google Research, 2022 |
| RT-1-X | rt1x | text + head camera | end effector pose delta | Google Research et al., 2023 |
| Octo | octo-base, octo-small | text + head camera + optional wrist camera | end effector pose delta | Octo Model Team et al., 2023 |
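The table can be read as a mapping from the model_type parameter (see Usage) to the weights_key values it accepts. The sketch below makes several assumptions. It assumes each weights_key is one of the variant names in the table. It also assumes that RT-1 and RT-1-X both run under model_type rt1, as the RT-1/X usage command suggests. The names ModelSpec, MODELS, and check_weights_key are hypothetical and are not part of this package.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    """One row of the model table (hypothetical helper, not part of the package)."""
    variants: tuple[str, ...]          # accepted weights_key values (assumption)
    cameras: tuple[str, ...]           # camera inputs the model always needs
    optional_cameras: tuple[str, ...]  # camera inputs the model can use if present


# Assumption: model_type "rt1" serves both RT-1 and RT-1-X weights,
# matching the usage command (model_type:=rt1, weights_key:=rt1x).
MODELS = {
    "rt1": ModelSpec(
        variants=("rt1main", "rt1multirobot", "rt1simreal", "rt1x"),
        cameras=("head",),
        optional_cameras=(),
    ),
    "octo": ModelSpec(
        variants=("octo-base", "octo-small"),
        cameras=("head",),
        optional_cameras=("wrist",),
    ),
}


def check_weights_key(model_type: str, weights_key: str) -> ModelSpec:
    """Return the spec for model_type, raising ValueError on an unknown type or variant."""
    if model_type not in MODELS:
        raise ValueError(f"unknown model_type {model_type!r}; expected one of {sorted(MODELS)}")
    spec = MODELS[model_type]
    if weights_key not in spec.variants:
        raise ValueError(
            f"weights_key {weights_key!r} is not a {model_type} variant; "
            f"expected one of {list(spec.variants)}"
        )
    return spec
```

A check like this catches mismatched parameters, such as model_type:=octo with weights_key:=rt1x, before any weights are loaded.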

Installation

Follow the installation instructions for dgl_ros.

Clone this repo in the src directory of your ROS workspace:

git clone https://github.com/sebbyjp/ros2_transformers.git

Install ROS dependencies:

rosdep install --from-paths src/ros2_transformers --ignore-src --rosdistro ${ROS_DISTRO} -y

Build:

colcon build --symlink-install --base-paths src/ros2_transformers --cmake-args -DCMAKE_BUILD_TYPE=RelWithDebInfo

Source:

source install/setup.bash

Install robo_transformers:

python3 -m pip install robo-transformers
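Before building or running, it can help to confirm that the environment matches the tested platform. The following is a minimal preflight sketch. It assumes ROS 2 has been sourced, which sets ROS_DISTRO (the variable the rosdep step uses). It also assumes the robo-transformers pip package is importable as robo_transformers. The function name preflight is hypothetical.

```python
import importlib.util
import os
from typing import Callable, Mapping, Optional

TESTED_DISTRO = "humble"  # from the Tested Platforms section


def preflight(
    env: Mapping[str, str] = os.environ,
    find_spec: Callable[[str], Optional[object]] = importlib.util.find_spec,
) -> list[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    distro = env.get("ROS_DISTRO")
    if not distro:
        problems.append("ROS_DISTRO is not set; source your ROS 2 setup.bash first")
    elif distro != TESTED_DISTRO:
        problems.append(f"ROS_DISTRO is {distro!r}; only {TESTED_DISTRO!r} has been tested")
    # Assumption: the pip package robo-transformers installs the module robo_transformers.
    if find_spec("robo_transformers") is None:
        problems.append("robo_transformers is not importable; run the pip install step")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("WARNING:", problem)
```

The env and find_spec parameters exist only so the checks can be exercised without a real ROS installation.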

Usage

In one terminal, run the demo app: ros2 launch ros2_transformers task_launch.py. To change the task, set the task_name parameter in config/app_config.yaml. To change which objects are spawned for a task, edit that task's YAML file in tasks/.

In another terminal, run one of the following:

  • Octo: ros2 run ros2_transformers octo --ros-args -p use_sim_time:=true -p src_topic0:=$YOUR_MAIN_CAMERA_TOPIC -p src_topic1:=$YOUR_WRIST_CAMERA_TOPIC -p action_topic:=vla -p model_type:=octo -p weights_key:=octo-small -p default_instruction:="pick up the coke can off the table"

  • RT-1/X: ros2 run ros2_transformers rt1 --ros-args -p use_sim_time:=true -p src_topic0:=$YOUR_MAIN_CAMERA_TOPIC -p action_topic:=vla -p model_type:=rt1 -p weights_key:=rt1x -p default_instruction:="pick coke can"
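Both commands pass node parameters using the standard ROS 2 --ros-args -p name:=value syntax. If you start the servers from a script, a small helper can assemble the same argument list. The name build_ros2_run_args and the camera topic names are hypothetical. The parameter names are copied from the Octo command.

```python
import shlex


def build_ros2_run_args(executable: str, params: dict[str, object]) -> list[str]:
    """Build the argv for `ros2 run ros2_transformers <executable>` with -p overrides."""
    args = ["ros2", "run", "ros2_transformers", executable, "--ros-args"]
    for name, value in params.items():
        # Write booleans as lowercase true/false to match the commands above.
        if isinstance(value, bool):
            value = "true" if value else "false"
        args += ["-p", f"{name}:={value}"]
    return args


octo_args = build_ros2_run_args(
    "octo",
    {
        "use_sim_time": True,
        "src_topic0": "/camera/image_raw",        # hypothetical head camera topic
        "src_topic1": "/wrist_camera/image_raw",  # hypothetical wrist camera topic
        "action_topic": "vla",
        "model_type": "octo",
        "weights_key": "octo-small",
        "default_instruction": "pick up the coke can off the table",
    },
)

# shlex.join quotes the multi-word instruction so the line can be pasted into a shell.
print(shlex.join(octo_args))
```

To launch the server from Python, pass the list directly, as in subprocess.run(octo_args). Each list element reaches the process as a single argument, so the multi-word instruction needs no shell quoting.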

Repo Structure

  • config: Contains configuration files for the application, the ros_gz bridge, and the RViz GUI.
  • moveit: Contains MoveIt configuration files that are not specific to a robot.
  • robots: Contains robot-specific files such as URDFs, meshes, ros_controllers, and robot-specific MoveIt configurations.
  • tasks: Contains task specifications (locations and properties of objects in the environment) in YAML format.
  • sim: Contains simulation assets for tasks and Gazebo world files.

Software Stack

1. Application Layer (your code)

  • See src/demo_app.cpp and launch/task_launch.py for an example of how to use this package.

2. ROS Agent Layer (called by your code)

  • Agent Inference Server (C++ or Python; see dgl_ros).
  • See include/rt1.cpp and include/octo.cpp
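The agent servers consume camera images from src_topic0 (and, for Octo, src_topic1) together with a text instruction. The sketch below shows one way a server could cache the latest message from each input and assemble an observation only after the required head image has arrived. It is not the dgl_ros implementation. The class name LatestObservation and the keys "image_head" and "image_wrist" are hypothetical. Images are treated as opaque objects so the sketch runs without ROS.

```python
from typing import Any, Optional


class LatestObservation:
    """Caches the most recent message per input and builds an observation on demand."""

    def __init__(self, default_instruction: str, use_wrist: bool = False) -> None:
        self.instruction = default_instruction
        self.use_wrist = use_wrist
        self.head: Optional[Any] = None
        self.wrist: Optional[Any] = None

    # In a real node these would be subscription callbacks on src_topic0 / src_topic1.
    def on_head_image(self, msg: Any) -> None:
        self.head = msg

    def on_wrist_image(self, msg: Any) -> None:
        self.wrist = msg

    def build(self, instruction: Optional[str] = None) -> Optional[dict[str, Any]]:
        """Return an observation dict, or None if the head image has not arrived yet."""
        if self.head is None:
            return None
        obs: dict[str, Any] = {
            "instruction": instruction or self.instruction,
            "image_head": self.head,
        }
        # The wrist camera is optional for Octo: include it only when enabled and received.
        if self.use_wrist and self.wrist is not None:
            obs["image_wrist"] = self.wrist
        return obs


cache = LatestObservation("pick coke can")
cache.on_head_image("head-frame-0")
print(cache.build())
```

Keeping only the latest frame per topic means a slow model always sees the most recent view rather than a growing queue of stale images.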

3. AI Agent Layer

  • TensorFlow or PyTorch for inference and training (Python; see robo_transformers)
  • ONNX or OpenVINO for high-performance inference and training (C++)

The following layers are usually part of the user’s robot stack. We include them so that robots are supported out of the box for users who only have actuator drivers or ROS control APIs from the manufacturer (note that ROS control makes calls to the driver APIs). These layers are needed only because current foundation models for robotics output to action spaces such as the position or velocity of a robot’s arms and feet. Once foundation models output directly to the input spaces of these layers (first joint angles, then motor torques), these layers will become redundant.

4. Kinematics Layer

  • MoveIt Inverse Kinematics (C++)
  • Open Motion Planning Library (C++)

5. Controller Layer

  • ros2_control (C++)
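Every model in the table outputs an end effector pose delta. Something between the model and the kinematics layer therefore has to turn that delta into an absolute target pose for inverse kinematics. The following is a minimal sketch under stated assumptions:

  • The delta is a translation (x, y, z) plus a roll/pitch/yaw rotation.
  • The delta is expressed in the same base frame as the current pose.
  • Orientations are (x, y, z, w) quaternions, as in ROS geometry_msgs.

Real models may define the delta frame, units, or gripper component differently. The function names are hypothetical.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]
Quat = Tuple[float, float, float, float]  # (x, y, z, w), as in geometry_msgs/Quaternion


def quat_from_rpy(roll: float, pitch: float, yaw: float) -> Quat:
    """Convert fixed-axis roll/pitch/yaw (radians) to a unit quaternion."""
    cr, sr = math.cos(roll / 2), math.sin(roll / 2)
    cp, sp = math.cos(pitch / 2), math.sin(pitch / 2)
    cy, sy = math.cos(yaw / 2), math.sin(yaw / 2)
    return (
        sr * cp * cy - cr * sp * sy,
        cr * sp * cy + sr * cp * sy,
        cr * cp * sy - sr * sp * cy,
        cr * cp * cy + sr * sp * sy,
    )


def quat_multiply(a: Quat, b: Quat) -> Quat:
    """Hamilton product a * b for (x, y, z, w) quaternions."""
    ax, ay, az, aw = a
    bx, by, bz, bw = b
    return (
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
        aw * bw - ax * bx - ay * by - az * bz,
    )


def apply_pose_delta(
    position: Vec3, orientation: Quat, delta_xyz: Vec3, delta_rpy: Vec3
) -> Tuple[Vec3, Quat]:
    """Return the target pose after applying a base-frame delta (assumption) to the current pose."""
    x, y, z = position
    dx, dy, dz = delta_xyz
    new_position = (x + dx, y + dy, z + dz)
    # Left-multiplying by the delta rotates about base-frame axes.
    qx, qy, qz, qw = quat_multiply(quat_from_rpy(*delta_rpy), orientation)
    # Renormalize to keep the quaternion unit-length despite floating-point drift.
    norm = math.sqrt(qx * qx + qy * qy + qz * qz + qw * qw)
    return new_position, (qx / norm, qy / norm, qz / norm, qw / norm)


target_pos, target_quat = apply_pose_delta(
    (0.5, 0.0, 0.3), (0.0, 0.0, 0.0, 1.0), (0.01, 0.0, -0.02), (0.0, 0.0, math.pi / 2)
)
print(target_pos, target_quat)
```

The resulting pose would then serve as the goal for inverse kinematics or motion planning in the kinematics layer, for example through MoveIt. To apply deltas in the end effector frame instead, right-multiply the current orientation by the delta rotation and rotate the translation into the base frame.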