Source and extended article: “Hierarchical learning for modular robots” by Risto Kojcev, Nora Etxezarreta, Alejandro Hernandez and Víctor Mayoral
When performing a complex action, humans do not think or act at the level of granular primitive actions, such as individual muscle or joint movements. Instead, humans decompose complicated actions into a set of simpler ones.
By combining simpler actions or motor primitives, humans can learn more complicated, previously unseen challenges quickly. Moreover, human cognition decomposes a task across several levels of temporal abstraction.
The same occurs in robotics: complicated tasks are composed of sub-tasks at different levels of granularity, ranging from motor primitives to higher-level tasks such as grasping, where different time scales interact.
Most deep reinforcement learning (DRL) techniques focus on individual actions at single time steps, resulting in:
- low sample efficiency when training robots,
- poor adaptability to new, unseen tasks, and
- limited transfer between related tasks.
To develop robots that learn in an efficient and structured manner, however, temporally extended actions and temporal abstraction are required.
Hierarchical reinforcement learning methods allow the robot to learn a task at the level of macro-actions, sets of individual actions that reduce the search space.
This makes the learning process faster and more scalable, and allows the robot to generalize across unseen tasks or environments.
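To make the two-level structure concrete, here is a minimal sketch of such a hierarchy: a master policy chooses which sub-policy to run for the next `horizon` time steps (a macro-action), and the chosen sub-policy emits primitive actions on the fast time scale. The class names, the toy environment dynamics, and the random master are illustrative stand-ins, not the paper's implementation; in practice both levels would be learned neural-network policies.

```python
import random

class SubPolicy:
    """A primitive-action policy; here a fixed action value as a toy stand-in."""
    def __init__(self, action):
        self.action = action

    def act(self, state):
        # A real sub-policy would map the state to an action with a network.
        return self.action

class MasterPolicy:
    """Selects which sub-policy to run for the next macro-action."""
    def __init__(self, num_subpolicies):
        self.num_subpolicies = num_subpolicies

    def choose(self, state):
        # Stand-in for a learned high-level policy.
        return random.randrange(self.num_subpolicies)

def run_episode(master, subpolicies, horizon=10, episode_len=50):
    """Roll out: the master acts on the slow time scale, sub-policies on the fast one."""
    state = 0.0
    trace = []
    active = 0
    for t in range(episode_len):
        if t % horizon == 0:          # master decision point every `horizon` steps
            active = master.choose(state)
        action = subpolicies[active].act(state)
        state += action               # trivial placeholder "environment" dynamics
        trace.append((t, active, action))
    return trace
```

Because the master only acts every `horizon` steps, the search space it faces is over sequences of macro-actions rather than raw actions, which is what speeds up learning.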
We set out to train different behaviours on a reconfigurable modular robot and evaluated the applicability of the meta-learning shared hierarchies (MLSH) method to this setting.
In our experiments, we trained a master policy and corresponding sub-policies that generalize across different robot configurations and target positions, switching between configurations and their corresponding targets during training.
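The training scheme described above can be sketched as follows. In MLSH-style training, the sub-policies are shared and persist across tasks, while the master policy is re-learned for each newly sampled task (here, a robot configuration plus a target position), with a warmup period that updates only the master before joint updates begin. All names, task values, and the scalar "updates" are illustrative placeholders for real gradient steps, not the authors' code.

```python
import random

def sample_task():
    """A task = (robot configuration, target position); values are illustrative."""
    config = random.choice(["3dof", "4dof"])
    target = random.choice([(0.3, 0.1), (-0.2, 0.4)])
    return config, target

def train_mlsh(num_tasks=4, warmup_iters=3, joint_iters=5):
    """Schematic MLSH loop: sub-policies persist across tasks,
    the master is reset and re-learned for each sampled task."""
    subpolicy_params = {"shared": 0.0}      # shared across all tasks
    log = []
    for _ in range(num_tasks):
        config, target = sample_task()
        master_params = 0.0                 # reset master for the new task
        for _ in range(warmup_iters):
            master_params += 0.1            # warmup: update the master only
        for _ in range(joint_iters):
            master_params += 0.1            # joint phase: update both levels
            subpolicy_params["shared"] += 0.1
        log.append((config, target, round(master_params, 1)))
    return subpolicy_params, log
```

Switching the task between iterations is what pushes the shared sub-policies toward behaviours that are useful across configurations and targets, rather than overfitting to any single one.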
The experimental setup consisted of a modular 3 DoF robot that could be extended by one additional DoF, both in simulation and on the real robot, and two target positions.
After training the network in simulation, we evaluated the learned MLSH network on the real modular robot, where the different target positions were reached for the different robot configurations.
You can find the full experimental setup and results in the original paper, “Hierarchical learning for modular robots”.
Interested in knowing more about our research on AI? Find our most recent research papers here.