Following the success of gym-gazebo, the AI team at Acutronic Robotics is advancing its Reinforcement Learning methods so that they can be applied to real tasks. The code is openly available on GitHub.

Quick demonstration of a converged policy using the ROS2Learn framework and the gym-gazebo2 toolkit. We execute a deterministic run and also use settings that replicate the robot's real behavior.


The first gym-gazebo was a successful proof of concept that is used by multiple research laboratories and by many members of the robotics community. Given its positive impact, especially regarding usability, researchers at Acutronic Robotics have now launched gym-gazebo2, the ROS 2 Reinforcement Learning toolkit.

“This is the logical evolution towards our initial goal: to bring Reinforcement Learning methods into robotics at a professional and industrial level”, says Risto Kojcev, Head of AI at Acutronic Robotics.

The AI team he leads investigates how Reinforcement Learning can be used in place of traditional path planning techniques.

“We aim to train behaviours that can be applied in complex dynamic environments, which resemble the new demands of agile production and human robot collaboration scenarios”.

Achieving this would make the development of robotic applications faster and easier, and would move Reinforcement Learning techniques from a research setting into a production environment. gym-gazebo2 is a step toward this long-term goal.

The paper, made available here, presents an upgraded, real-world, application-oriented version of gym-gazebo, the ROS- and Gazebo-based Reinforcement Learning (RL) toolkit, which complies with the OpenAI Gym interface.
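For readers unfamiliar with the Gym interface, the following is a minimal sketch of the reset/step loop that a Gym-style environment exposes. `ToyReachEnv`, `run_episode` and `proportional_policy` are hypothetical names written for this illustration only; they are not part of gym-gazebo2 and involve neither ROS 2 nor Gazebo. The sketch assumes the classic Gym convention in which `reset()` returns an observation and `step(action)` returns an observation, a reward, a done flag and an info dictionary.

```python
class ToyReachEnv:
    """Hypothetical 1-D reaching task that mimics the classic Gym interface.

    Not part of gym-gazebo2: it only illustrates the reset()/step() contract.
    """

    def __init__(self, target=0.5, tolerance=0.01, max_steps=200):
        self.target = target
        self.tolerance = tolerance
        self.max_steps = max_steps
        self.position = 0.0
        self.steps = 0

    def reset(self):
        """Start a new episode and return the initial observation."""
        self.position = 0.0
        self.steps = 0
        return self.position

    def step(self, action):
        """Apply an action and return (observation, reward, done, info)."""
        # Clamp the displacement so a single step cannot move too far.
        self.position += max(-0.1, min(0.1, action))
        self.steps += 1
        distance = abs(self.target - self.position)
        reward = -distance  # closer to the target means higher reward
        done = distance < self.tolerance or self.steps >= self.max_steps
        return self.position, reward, done, {"distance": distance}


def run_episode(env, policy):
    """Roll out one episode and return the total reward and final info."""
    observation = env.reset()
    total_reward, done, info = 0.0, False, {}
    while not done:
        observation, reward, done, info = env.step(policy(observation))
        total_reward += reward
    return total_reward, info


def proportional_policy(observation, target=0.5):
    """Hand-written controller standing in for a learned policy."""
    return 0.5 * (target - observation)


if __name__ == "__main__":
    total, info = run_episode(ToyReachEnv(), proportional_policy)
    print(f"return={total:.3f}, final distance={info['distance']:.4f}")
```

Because the loop in `run_episode` relies only on `reset()` and `step()`, the same loop could in principle drive any environment that follows this convention, which is what makes a common interface useful for comparing algorithms.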

Start training and visualize the simulation without going through the step-by-step installation process. In this video we execute a simple test example and visualize it from our main OS. Gazebo must already be installed there (Ubuntu 18 in our case).


The paper discusses the new ROS 2-based software architecture and summarizes the results obtained using Proximal Policy Optimization (PPO). Ultimately, this work provides a benchmarking system for robotics that allows different techniques and algorithms to be compared under the same virtual conditions.
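As background on PPO, the sketch below computes the clipped surrogate objective that is commonly associated with the algorithm: for each sample, the probability ratio between the new and old policy is clipped to the range [1 - epsilon, 1 + epsilon], and the smaller of the clipped and unclipped terms is kept. The function name `clipped_surrogate` and the value `epsilon=0.2` are assumptions chosen for illustration; the article does not state which PPO variant or hyperparameters were used.

```python
def clipped_surrogate(ratios, advantages, epsilon=0.2):
    """Mean PPO clipped surrogate objective over a batch (to be maximized).

    ratios:     new-policy / old-policy probability ratio for each sample
    advantages: advantage estimate for each sample
    epsilon:    clipping range (illustrative default, not taken from the paper)
    """
    terms = []
    for ratio, advantage in zip(ratios, advantages):
        # Keep the ratio inside [1 - epsilon, 1 + epsilon].
        clipped_ratio = max(1.0 - epsilon, min(1.0 + epsilon, ratio))
        # Taking the minimum removes the incentive for overly large updates.
        terms.append(min(ratio * advantage, clipped_ratio * advantage))
    return sum(terms) / len(terms)


if __name__ == "__main__":
    # A large ratio with a positive advantage is capped by the clip.
    print(clipped_surrogate([1.5], [1.0]))
    # A small ratio with a negative advantage keeps the more pessimistic term.
    print(clipped_surrogate([0.5], [-1.0]))
```

The clipping limits how far a single update can move the policy away from the one that collected the data.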

The team has focused on MARA, a modular robotic arm that runs ROS 2 natively in each of its modules. They have evaluated four MARA environments of differing complexity, reaching accuracies on the millimeter scale. The environments are MARA, MARA Orient, MARA Collision and MARA Collision Orient.

“We have focused on MARA first for being this modular robot arm the most direct option of transferring policies learned in gym-gazebo2 to the real world, hopefully industrial applications”

The converged results demonstrate the feasibility and usefulness of the gym-gazebo2 toolkit, as well as its potential for industrial use cases with modular robots.

More resources: