Part A
Methods for robot skill learning in human-centered environments through imitation
and reinforcement learning
WP1: Active exploration in learning by demonstration
WP1 focuses on an innovative approach in which robots learn new skills by observing and imitating user demonstrations rather than through explicit programming. Using methods such as Deep Inverse Q-Learning (IQL), robots can infer the user’s intentions from a small number of targeted demonstrations, significantly improving learning efficiency. The work package emphasizes active learning: the robot identifies uncertainties in its own knowledge and asks for additional demonstrations that optimally improve its performance. This approach enhances transparency and trust by making the robot’s behavior explainable and adaptable to different user inputs.
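The active-learning idea described above can be sketched in a few lines: the robot maintains several value estimates, measures where they disagree, and requests a demonstration for the most uncertain state. This is a minimal illustrative sketch, not the project's actual implementation; the ensemble members here are random linear models standing in for trained networks, and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ensemble of learned Q-value estimators over 2-D state
# features (random linear models standing in for trained networks).
ensemble = [rng.normal(size=2) for _ in range(5)]

def q_value(weights, state):
    """Scalar Q-estimate of one ensemble member for a state."""
    return float(weights @ state)

def pick_query_state(candidate_states):
    """Return the candidate state where ensemble members disagree most,
    i.e. where an extra demonstration is expected to help the most."""
    disagreement = [
        np.var([q_value(w, s) for w in ensemble]) for s in candidate_states
    ]
    return candidate_states[int(np.argmax(disagreement))]

candidates = [np.array([0.1, 0.2]), np.array([2.0, -1.5]), np.array([0.5, 0.5])]
query = pick_query_state(candidates)  # state to show the user next
```

Ensemble disagreement is only one possible uncertainty signal; the same selection loop works with any of the uncertainty estimates mentioned in the highlights below.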
Recent Highlights from WP1
In WP1 (Learning from demonstrations), the team advanced its reinforcement learning (RL) approach by shifting from a purely offline to a combined offline-online strategy. This drastically reduces the number of expert demonstrations needed, from around 1000 to just 1–10, while addressing challenges such as extrapolation error, catastrophic forgetting, and primacy bias. Current research focuses on density estimation and uncertainty quantification using techniques such as Gaussian Processes, mixture models, and diffusion models, with valuable links to WP6.
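To illustrate the role of Gaussian Processes in uncertainty quantification: the GP posterior variance is low near demonstrated states and grows away from them, which is exactly the signal needed to decide where more data would help. The following is a minimal, self-contained sketch with a hand-rolled squared-exponential kernel on 1-D states; all values and names are illustrative, not from the project.

```python
import numpy as np

def rbf(a, b, length_scale=1.0):
    """Squared-exponential kernel matrix between 1-D point sets a and b."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior_std(x_train, x_query, noise=1e-4):
    """Posterior standard deviation of a zero-mean GP: small where the
    demonstration data is dense, large where it is sparse."""
    k_tt = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    k_qt = rbf(x_query, x_train)
    k_qq = rbf(x_query, x_query)
    var = np.diag(k_qq - k_qt @ np.linalg.solve(k_tt, k_qt.T))
    return np.sqrt(np.clip(var, 0.0, None))

demos = np.array([0.0, 0.1, 0.2])   # states covered by demonstrations
queries = np.array([0.05, 3.0])     # near vs. far from the data
std = gp_posterior_std(demos, queries)
# std[1] > std[0]: uncertainty grows away from demonstrated states
```

Mixture models and diffusion models can play the same role as density or uncertainty estimators; the GP is shown here because its uncertainty is available in closed form.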


WP2: Continual Learning for Assistive Robots
WP2 addresses challenges in enabling robots to continuously learn and adapt to new tasks and environments while efficiently retaining previously acquired knowledge. This work package develops methods that allow robots to safely and efficiently acquire, transfer, and update knowledge in complex real-world settings, such as household environments. It also aims to mitigate unsafe behavior changes and to ensure privacy in data handling. WP2 collaborates with other WPs of the project to create scalable, secure, and ethically sound autonomous systems capable of lifelong learning and improvement.
Recent Highlights from WP2
WP2 (Continual Learning) aims to enable robots to learn new tasks incrementally without forgetting previous ones. Recently, the team has worked on facilitating imitation learning from a single human demonstration, supporting zero-shot transfer to real-world tasks. The team’s efforts also focus on spatially local policies for generalization across diverse spatial, visual, and task contexts, as well as on leveraging unstructured “Play Data” and test-time steering for generative policies. They further investigate efficient fine-tuning methods such as LoRA for robot manipulation.
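The LoRA idea mentioned above can be shown in a few lines: the pretrained weight matrix of a policy layer is frozen, and only two small low-rank factors are trained, so fine-tuning touches a few percent of the parameters. This is a generic sketch of the LoRA technique in NumPy, assuming illustrative layer sizes; it is not the project's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out, rank = 512, 512, 8          # illustrative sizes, not from the project

# Frozen pretrained weight of one policy layer.
w_frozen = rng.normal(size=(d_out, d_in))

# LoRA factors: only these are trained. B starts at zero, so the adapted
# layer initially behaves exactly like the pretrained one.
a = rng.normal(size=(rank, d_in)) * 0.01
b = np.zeros((d_out, rank))

def adapted_forward(x):
    """Forward pass through the layer with the low-rank update added in."""
    return w_frozen @ x + b @ (a @ x)

full_params = w_frozen.size              # 262144
lora_params = a.size + b.size            # 8192, about 3% of the full layer
```

For continual learning, the appeal is that each new task can get its own small (A, B) pair while the shared pretrained weights stay untouched, which limits forgetting.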
CARTO:
Category and Joint Agnostic Reconstruction of ARTiculated Objects.
The team built a pipeline that estimates the pose and 3D reconstruction of articulated objects, which facilitates generating training data for a robot to learn from.


DITTO:
Demonstration Imitation by Trajectory Transformation.
The team proposes a method that allows a human to demonstrate a task in their own embodiment; the demonstration is then transferred to a robot without any further fine-tuning or training.
Real2Gen:
Imitation Learning from a Single Human Demonstration with Generative Foundational Models.

The team constructed a pipeline that uses the same demonstrations as in DITTO and combines them with generative simulation to produce an unlimited amount of training data for a robot manipulation agent.

Test-Time Steering for Generative Policies. In collaboration with WP3, the team investigated how human play data (i.e., meaningful actions performed without a specified goal) can be used to train a general agent that is then steered through human interaction only at test time.
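One simple way to realize test-time steering is to let the trained policy propose several candidate action sequences and then rank them by a preference the human supplies at run time, without any retraining. The sketch below assumes this sampling-and-ranking variant; the generative policy is replaced by a random sampler, and all names and values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_candidate_actions(n=16, horizon=5):
    """Stand-in for a generative policy trained on play data: it proposes
    diverse short action sequences (here: random 2-D waypoint sequences)."""
    return rng.normal(size=(n, horizon, 2))

def steer(candidates, human_target):
    """Test-time steering: rank the policy's own samples by how close their
    final pose lands to a target the human indicates, then pick the best."""
    final = candidates[:, -1, :]
    cost = np.linalg.norm(final - human_target, axis=1)
    return candidates[int(np.argmin(cost))]

target = np.array([1.0, 1.0])            # hypothetical human preference
best = steer(sample_candidate_actions(), target)
```

The key property is that the human's input only reweights what the policy already knows how to do; the policy's weights are never updated at test time.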