Artificial Intelligence researchers at this robotics company propose a novel framework for Deep Reinforcement Learning (DRL) in modular robotics. The code is available on GitHub.

ROS2Learn, a Deep Reinforcement Learning framework, trains a robot directly from joint states using traditional robotics tools.

“We use a state-of-the-art implementation of the Proximal Policy Optimization, Trust Region Policy Optimization and Actor-Critic Kronecker-Factored Trust Region algorithms to learn policies in four different environments around the MARA modular robotic arm,” explains Risto Kojcev, Head of AI.

In a paper made available today, the team of researchers describes baseline implementations of the most common Deep Reinforcement Learning (DRL) techniques for policy iteration methods. Using this framework, they present the results of benchmarking Deep Reinforcement Learning methods on a modular robotic arm with 6 degrees of freedom (DoF).
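For readers unfamiliar with the algorithms named above, the core idea of Proximal Policy Optimization is a clipped surrogate objective that limits how far one update can move the policy. The following is a minimal sketch of that objective in plain Python, not the ROS2Learn implementation. The function name, the argument names and the clip range of 0.2 are illustrative assumptions.

```python
import math


def ppo_clipped_objective(new_log_probs, old_log_probs, advantages, clip_range=0.2):
    """Mean clipped surrogate objective over a batch of samples.

    Illustrative sketch only: names and the default clip_range are
    assumptions, not values taken from ROS2Learn.
    """
    terms = []
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio between the updated policy and the old policy.
        ratio = math.exp(new_lp - old_lp)
        # Keep the ratio inside [1 - clip_range, 1 + clip_range].
        clipped_ratio = max(1.0 - clip_range, min(1.0 + clip_range, ratio))
        # Take the more pessimistic of the clipped and unclipped terms.
        terms.append(min(ratio * adv, clipped_ratio * adv))
    return sum(terms) / len(terms)
```

When the new and old policies agree, the ratio is 1 and the objective reduces to the mean advantage. When the ratio leaves the clipping interval in the direction that would increase the objective, the clipped term takes over and the incentive to move further disappears.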

In this tutorial you will learn to use the ROS2Learn framework, which uses gym-gazebo2 to create OpenAI Gym compliant environments. The video shows the learning process in MARAOrient, one of the MARA environments created for gym-gazebo2. In this environment the goal is to learn a trajectory that reaches a target point in 3D space with a certain orientation.
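To make "OpenAI Gym compliant" concrete, here is a minimal sketch of the classic Gym interface, in which `reset()` returns an observation and `step(action)` returns an observation, a reward, a done flag and an info dictionary. The `ToyReachEnv` class, its parameters and the `greedy_policy` helper are hypothetical stand-ins written in plain Python. They do not use Gazebo, ROS 2 or gym-gazebo2, and they are not the MARA environments.

```python
import math
import random


class ToyReachEnv:
    """Hypothetical toy environment with a Gym-style reset/step interface."""

    def __init__(self, target=(0.3, 0.2, 0.5), max_steps=50, tolerance=0.05, seed=0):
        self.target = target
        self.max_steps = max_steps
        self.tolerance = tolerance
        self.rng = random.Random(seed)
        self.position = None
        self.steps = 0

    def reset(self):
        # Start each episode from a random point in a small cube.
        self.position = tuple(self.rng.uniform(-0.5, 0.5) for _ in range(3))
        self.steps = 0
        return self._observation()

    def step(self, action):
        # Limit each displacement component to at most 0.1 per step.
        clipped = [max(-0.1, min(0.1, a)) for a in action]
        self.position = tuple(p + a for p, a in zip(self.position, clipped))
        self.steps += 1
        distance = math.dist(self.position, self.target)
        reward = -distance  # closer to the target means higher reward
        done = distance < self.tolerance or self.steps >= self.max_steps
        return self._observation(), reward, done, {"distance": distance}

    def _observation(self):
        # Observation: current position followed by target position.
        return self.position + self.target


def greedy_policy(observation):
    """Hand-written policy that moves straight toward the target."""
    position, target = observation[:3], observation[3:]
    return [t - p for p, t in zip(position, target)]


def run_episode(env, policy):
    """Run one episode; return the total reward and final distance to the target."""
    observation = env.reset()
    total_reward, done = 0.0, False
    while not done:
        observation, reward, done, info = env.step(policy(observation))
        total_reward += reward
    return total_reward, info["distance"]
```

Calling `run_episode(ToyReachEnv(), greedy_policy)` runs the same reset/step loop that a learning algorithm would drive. In a DRL setup the hand-written policy would be replaced by a neural network that is updated from the collected rewards.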

Using a Deep Reinforcement Learning framework that communicates with typical robotics tools, such as Gazebo and ROS 2, allows a more realistic representation of the environment. The researchers also compare the robustness of these methods on modular robots through an empirical study in simulation.

The results show that the proposed framework is stable when training neural networks through Deep Reinforcement Learning with policy-based methods.

This quick ROS2Learn tutorial introduces the concepts of transfer learning and multi-instance execution. We show how to resume training from a saved checkpoint and demonstrate that a new instance can be launched at the same time. This second instance is a deterministic run, which uses a different driver than the training version.

More resources