
All the training for the 3DoF (illustrated on the left) and 4DoF (illustrated on the right) SCARA robots is performed in simulation in our environment. The trained network is then transferred to the real robot.

Source and extended article: “Evaluation of deep reinforcement learning methods for modular robots” by Risto Kojcev, Nora Etxezarreta, Alejandro Hernandez and Víctor Mayoral

Current robot systems are designed, built and programmed by teams with multidisciplinary skills. The traditional approach to program such systems is typically referred to as the robotics control pipeline and requires going from observations to final low-level control commands through:

State estimation -> modeling and prediction -> planning -> low-level control translation.

Every step in the pipeline requires fine-tuning, which introduces significant complexity.
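To make this pipeline concrete, it can be sketched as a chain of separately engineered stages, each of which has to be tuned on its own. The sketch below is purely illustrative; every function is a hypothetical placeholder standing in for a real estimator, planner, or controller.

```python
# Illustrative sketch of the classic robotics control pipeline.
# Every function here is a hypothetical placeholder, not part of
# any specific framework.

def estimate_state(observations):
    # In practice: sensor fusion and filtering (e.g. an EKF).
    return {"joint_positions": observations["encoders"]}

def plan_trajectory(state, goal):
    # In practice: a motion planner producing a sequence of waypoints.
    return [state["joint_positions"], goal]

def to_low_level_commands(waypoint):
    # In practice: inverse dynamics or PID loops turning waypoints
    # into joint torques or velocities.
    return [0.0 for _ in waypoint]

def control_pipeline(observations, goal):
    state = estimate_state(observations)              # state estimation
    plan = plan_trajectory(state, goal)               # modeling, prediction, planning
    return [to_low_level_commands(w) for w in plan]   # low-level control translation

commands = control_pipeline({"encoders": [0.1, 0.2, 0.3]}, goal=[0.5, 0.5, 0.0])
```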

In recent years, several Deep Reinforcement Learning (DRL) techniques have proven successful at learning complex behavior skills and solving challenging control tasks in high-dimensional state spaces.

Modular robots can be extended seamlessly through modular components. This brings advantages for their construction, but training them with current DRL methods becomes cumbersome as:

  • Every small change in the physical structure of the robot requires a new round of training.
  • Building the tools to train modular robots is a time-consuming process.
  • Transferring the results to the real robot is complex given the flexibility of these systems.

In this research paper we present a framework, based on Gazebo and ROS, that simplifies the process of building modular robots and their corresponding tools. It also includes baseline implementations of the most common DRL techniques based on policy iteration methods.

Using this framework, we present robot configurations with 3 and 4 degrees of freedom (DoF) performing the same task.
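In practice, a framework like this is driven through an OpenAI Gym-style interface backed by the Gazebo simulation. The sketch below assumes a gym_gazebo-style registration and environment ID; the exact module and environment names depend on the framework release.

```python
import gym
import gym_gazebo  # assumed import that registers the Gazebo-backed environments

# Hypothetical environment ID for the 3DoF SCARA task; the actual
# registered name depends on the framework version.
env = gym.make("GazeboModularScara3DOF-v3")

obs = env.reset()
for _ in range(100):
    action = env.action_space.sample()          # random policy as a placeholder
    obs, reward, done, info = env.step(action)  # one step simulated in Gazebo
    if done:
        obs = env.reset()
env.close()
```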

At Acutronic Robotics, we trained two modular robots, namely the 3DoF and 4DoF SCARA robots, where the Gazebo simulator and the corresponding ROS packages convert the actions generated by each algorithm into trajectories the robot can execute.
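As a rough sketch of that conversion step, a policy's joint-position action can be wrapped in a standard trajectory_msgs/JointTrajectory message and published to a controller topic. The topic and joint names below are assumptions for illustration, not the exact ones from our packages.

```python
#!/usr/bin/env python
# Sketch: turn a policy's action (target joint positions) into a
# trajectory message a ROS joint trajectory controller can execute.
import rospy
from trajectory_msgs.msg import JointTrajectory, JointTrajectoryPoint

rospy.init_node("action_to_trajectory")
# Assumed controller topic; depends on the robot's controller configuration.
pub = rospy.Publisher("/scara_controller/command", JointTrajectory, queue_size=1)
rospy.sleep(1.0)  # give the publisher time to connect

def publish_action(action, exec_time=1.0):
    """Wrap a joint-position action in a single-point JointTrajectory."""
    traj = JointTrajectory()
    traj.joint_names = ["joint1", "joint2", "joint3"]  # assumed 3DoF joint names
    point = JointTrajectoryPoint()
    point.positions = list(action)
    point.time_from_start = rospy.Duration(exec_time)
    traj.points = [point]
    pub.publish(traj)

publish_action([0.1, -0.4, 0.2])
```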

You can find the experimental results in the original paper, “Evaluation of deep reinforcement learning methods for modular robots”.

Mean episode reward when training the 3DoF SCARA robot with PPO1 (top left) and PPO2 (top right), and the 4DoF SCARA robot with PPO1 (bottom left) and PPO2 (bottom right), executing trajectories with different execution times.
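PPO1 and PPO2 refer to the implementations in OpenAI Baselines. A minimal sketch of launching such a training run against the Gazebo-backed environment could look as follows; the hyperparameters are illustrative, and the API shown matches more recent Baselines releases rather than necessarily the version used for these experiments.

```python
import gym
from baselines.common.vec_env.dummy_vec_env import DummyVecEnv
from baselines.ppo2 import ppo2

def make_env():
    import gym_gazebo  # assumed registration import, as in the earlier sketch
    return gym.make("GazeboModularScara3DOF-v3")  # hypothetical environment ID

venv = DummyVecEnv([make_env])  # PPO2 expects a vectorized environment
model = ppo2.learn(
    network="mlp",              # small fully connected policy network
    env=venv,
    total_timesteps=1_000_000,  # illustrative training budget
)
```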

Many challenges still remain within the DRL field for robotics, such as:

  • long training times,
  • simulation-to-real robot transfer,
  • reward shaping and sample efficiency, or
  • extending learned behavior to diverse tasks and robot configurations.

So far, our work with modular robots has focused on simple tasks like reaching a point in space. In order to have an end-to-end training framework (from pixels to motor torques) and to perform more complex tasks, we aim to integrate additional rich sensory input, such as vision.
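For a reaching task like this one, a common shaping choice is a reward based on the distance between the end-effector and the goal, plus a small control-effort penalty. The function below is a generic sketch with illustrative weights, not the exact reward used in our experiments.

```python
import numpy as np

def reach_reward(ee_position, goal_position, action, w_dist=1.0, w_ctrl=0.01):
    """Shaped reward for point reaching: negative goal distance minus a
    small penalty on control effort (weights are illustrative)."""
    dist = np.linalg.norm(np.asarray(ee_position) - np.asarray(goal_position))
    ctrl_penalty = w_ctrl * float(np.square(action).sum())
    return -w_dist * dist - ctrl_penalty

# Example: end-effector a few centimeters from the goal, small action magnitudes.
r = reach_reward([0.30, 0.10, 0.25], [0.35, 0.05, 0.20], action=np.array([0.1, -0.2, 0.0]))
```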

We envision the future of robotics to be about modular robots where the trained network can generalize online to modifications in the robot, such as change of a component or dynamic obstacle avoidance.

Interested in knowing more about our research on AI? Check our most recent research articles here.