We introduce an extension of gym-gazebo, called robot_gym, that makes use of container technology to deploy experiments in a distributed way, accelerating the training process through a framework for distributed Reinforcement Learning.

Source and extended article: “Robot gym: accelerated robot training through simulation in the cloud with ROS and Gazebo” by Víctor Mayoral, Risto Kojcev, Nora Etxezarreta, Alejandro Hernandez and Irati Zamalloa.

Rather than being explicitly programmed, robots can be trained via Reinforcement Learning (RL) to achieve behaviors that generalize better and respond to real-world needs in dynamic environments.

But training requires a large amount of experimentation, which is not always feasible for a physical robot. Such an approach is expensive, requiring hundreds of thousands of attempts (performed by a group of robots) over a period of several months.

These resources are available only to a restricted few, so training in simulation has gained popularity. The idea behind using simulation is to train a virtual model of the real robot until the desired behavior is learned and then transfer the knowledge to the real robot.

The behavior can be further refined by exposing the real robot to a restricted number of additional training iterations. In this work, we introduce an extension of gym-gazebo, called robot_gym.

robot_gym makes use of container technology to deploy experiments in a distributed way, accelerating the training process through a framework for distributed RL.

It is a framework for deploying robotic experiments across distributed workers to reduce the time spent gathering experience from the environment and, overall, decrease the training time of robots when using RL.
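The core idea is that each worker runs its own copy of the environment and feeds transitions to a central learner. A minimal sketch of that experience-gathering pattern is shown below, using Python threads and a toy stand-in for the environment for portability; in robot_gym each worker would instead be a containerized Gazebo simulation, and all names here (`rollout_worker`, `gather_experience`) are illustrative, not the framework's API.

```python
import queue
import random
import threading

def rollout_worker(worker_id, steps, out_queue):
    """Collect `steps` transitions from this worker's own (stub)
    environment copy and push them onto a shared queue, from which a
    central learner would consume them."""
    for step in range(steps):
        # Stand-in for env.step(action): a random reward sample.
        out_queue.put((worker_id, step, random.random()))

def gather_experience(num_workers=4, steps_per_worker=10):
    """Run all workers in parallel and drain their transitions into
    one batch of experience for the learner."""
    out_queue = queue.Queue()
    workers = [threading.Thread(target=rollout_worker,
                                args=(i, steps_per_worker, out_queue))
               for i in range(num_workers)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return [out_queue.get() for _ in range(out_queue.qsize())]
```

With `num_workers` environment copies running concurrently, the same amount of experience is gathered in roughly `1/num_workers` of the wall-clock time, which is the source of the speedup the framework targets.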

We aimed to provide answers to the following questions: By how much is it possible to accelerate robot training time with RL techniques? And: What is the associated cost of doing so?

To validate our framework, we ran experimental tests in simulation and deployed the results both in simulation and on real robots.


We experimented with two modular robots: a 3 Degrees-of-Freedom (3 DoF) robot in a SCARA configuration and a 6 DoF modular articulated arm. We analyzed the impact of different numbers of workers, distributed across several machines, on the number of iterations the robot needs to converge. Our goal was to reach a specific position in space, stopping the training when the algorithm obtained zero as the target reward. Rewards were heuristically determined using the distance to the target point.
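A distance-based reward of this kind can be sketched as the negative Euclidean distance between the end effector and the target, so the reward approaches zero exactly when the goal is reached, matching the stopping criterion above. The function name and signature below are illustrative; the precise reward shaping used in the experiments is described in the paper.

```python
import math

def distance_reward(end_effector_pos, target_pos):
    """Heuristic reward: negative Euclidean distance to the target.

    The reward is always <= 0 and reaches 0 only when the end
    effector is exactly at the target point, so training can stop
    once the reward converges to zero.
    """
    return -math.dist(end_effector_pos, target_pos)
```

For example, `distance_reward([0.3, 0.1, 0.2], [0.3, 0.1, 0.2])` evaluates to `0.0`, while any position away from the target yields a strictly negative reward.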

Our experimental results show a significant reduction of the training time. Compared to standard RL approaches, our framework, for simple tasks, accelerates the robot training time by more than 33% while maintaining similar levels of accuracy and repeatability.

You can find the detailed experimental setup and results in the original paper: “Robot gym: accelerated robot training through simulation in the cloud with ROS and Gazebo”.

Interested in knowing more about our research on AI? Find our most recent research papers here.