Part B
Generalization of skills across different tasks and environments using meta-learning
techniques
WP3: Federated learning in diverse environments
In WP3, we teach fleets of robots to share skills without sharing secrets. When robots work in private spaces like homes, they must learn from experience without exposing personal data. We solve this by stripping camera footage down to essential, anonymized features — relevant object poses and keypoints rather than faces and furniture in the background. This method protects privacy while helping robots learn faster and recognize objects in new environments. To handle complex jobs, we break long tasks into short, reusable skills that the robots can shuffle and recombine like building blocks. Finally, because every home is different, we give the robots tools to tweak their own behavior the moment they encounter a new obstacle.
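The feature-stripping idea can be sketched in a few lines, assuming a hypothetical upstream detector that supplies 2-D keypoints and object poses. The function name and array shapes below are illustrative, not the project's actual pipeline:

```python
import numpy as np

def anonymized_features(keypoints_2d, object_poses):
    """Collapse a camera frame into a compact, privacy-preserving feature
    vector: only task-relevant keypoint coordinates and object poses are
    kept; raw pixels (faces, room background) are discarded.

    keypoints_2d : (K, 2) array of detected keypoints, in pixels
    object_poses : (M, 3) array of object positions, in metres
    (both would come from an upstream detector, hypothetical here)
    """
    kp = np.asarray(keypoints_2d, dtype=float)
    poses = np.asarray(object_poses, dtype=float)
    # Centre keypoints on their centroid so the feature does not depend
    # on where the scene happens to sit in the image.
    kp_centred = kp - kp.mean(axis=0)
    return np.concatenate([kp_centred.ravel(), poses.ravel()])

# A 2-keypoint, 1-object scene becomes a 7-dimensional vector.
feat = anonymized_features([[100, 200], [140, 220]], [[0.3, 0.1, 0.8]])
print(feat.shape)  # (7,)
```

Downstream learners then consume only this low-dimensional vector, which is both faster to learn from and free of identifying imagery.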
Recent Highlights from WP3
Task Parametrized Gaussian Mixture Models (TP-GMM) offer a promising way for robots to learn by imitation. But taking this math from the simulator to the real world remains difficult. We solve three specific problems to make it work.
First, robot hands move in arcs, not straight lines. We capture this by modeling velocity on its natural curved surface (a Riemannian manifold) rather than in flat Euclidean space. Second, we use these motion patterns to slice long tasks, like cooking a meal, into distinct skills, such as chopping or stirring. This allows the robot to shuffle and recombine these actions to solve entirely new problems. Third, we teach the robot to see what matters: when stirring, for instance, it automatically tracks the pot and ladle while ignoring the cutting board.
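The first fix can be illustrated with orientations represented as unit quaternions: the standard logarithmic and exponential maps move between the curved manifold and a flat tangent space, where Gaussian statistics are well defined. This is a generic textbook sketch, not the project's implementation:

```python
import numpy as np

def quat_log(q):
    """Logarithmic map of a unit quaternion q = (w, x, y, z): maps an
    orientation on the curved manifold S^3 to a flat 3-D tangent vector,
    where averages and Gaussian mixtures make sense."""
    w, v = q[0], np.asarray(q[1:], dtype=float)
    norm_v = np.linalg.norm(v)
    if norm_v < 1e-12:          # identity rotation: zero tangent vector
        return np.zeros(3)
    return np.arccos(np.clip(w, -1.0, 1.0)) * v / norm_v

def quat_exp(u):
    """Inverse map: tangent vector back to a unit quaternion."""
    theta = np.linalg.norm(u)
    if theta < 1e-12:
        return np.array([1.0, 0.0, 0.0, 0.0])
    return np.concatenate([[np.cos(theta)], np.sin(theta) * u / theta])

# A 90-degree rotation about z survives the round trip, so velocities
# between orientations can be modeled in the flat tangent space.
q = np.array([np.cos(np.pi / 4), 0.0, 0.0, np.sin(np.pi / 4)])
print(np.allclose(quat_exp(quat_log(q)), q))  # True
```

Fitting a mixture model to velocities mapped this way avoids the distortions that arise when curved rotational motion is treated as if it lived on a flat grid.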
The results are compelling. Our robots learn complex tasks after seeing them performed only five times—one-twentieth the data usually required. But beyond speed, they achieve a flexibility that baselines cannot match: adapting these learned skills to completely new objects and changing environments.

Robots learn slowly because raw video data is overwhelming. To speed them up, we must boil down camera footage into simple, compact summaries. The problem is that current algorithms assume they can see the whole scene. But in the real world, objects are hidden by clutter or slip out of the camera’s frame. When a robot cannot see an object, or when the object moves close enough to look different, the robot usually loses track of it.
We introduce our approach, Bayesian Scene Keypoints (BASK), to solve this problem. It is a probabilistic method that tracks specific points on an object, like the handle of a cup, regardless of how far away they are. BASK resolves the ambiguity caused by missing information: it knows where a tool is even when it is hidden, and it can track a symmetrical cup that looks the same from different angles. We tested this using a camera mounted on the robot's moving wrist. The system mastered difficult tasks involving multiple objects, outperforming standard techniques. It proved robust to cluttered desks, blocked views, and a limited field of view, even handling objects it had never seen before.
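The probabilistic idea behind tracking a point through occlusion can be reduced to a 1-D discrete Bayes filter. The real method operates on learned scene keypoints in images; everything below is a deliberately tiny illustration:

```python
import numpy as np

def bayes_keypoint_update(belief, likelihood=None):
    """One step of a discrete Bayes filter over candidate keypoint
    locations (a toy 1-D stand-in for tracking a point on an object).

    belief     : probability over N candidate positions
    likelihood : per-position measurement likelihood, or None when the
                 keypoint is occluded (then only diffusion is applied)
    """
    # Prediction step: diffuse belief to neighbouring cells to account
    # for possible motion between frames.
    kernel = np.array([0.25, 0.5, 0.25])
    predicted = np.convolve(belief, kernel, mode="same")
    predicted /= predicted.sum()
    if likelihood is None:       # occluded: no measurement, keep prior
        return predicted
    posterior = predicted * likelihood
    return posterior / posterior.sum()

belief = np.full(10, 0.1)                      # uniform prior, 10 cells
obs = np.full(10, 0.01); obs[3] = 1.0          # keypoint seen near cell 3
belief = bayes_keypoint_update(belief, obs)    # belief peaks at cell 3
belief = bayes_keypoint_update(belief, None)   # occluded: peak persists
print(belief.argmax())  # 3
```

The key behaviour is the last line: when the measurement disappears, the belief spreads slightly but does not vanish, which is why the tracker keeps its estimate through clutter and blocked views.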

WP4: Meta-learning with meta-features and augmentation
WP4 focuses on developing methods to enable accurate transfer of robotic policies in real-world household environments through advanced meta-learning techniques. The goal is to make learned behaviors adapt quickly and safely to new homes, object arrangements, and interaction dynamics without requiring extensive retraining or perfectly curated demonstrations. To achieve this, WP4 investigates meta-learning across diverse simulated environments. A key innovation is the development of novel end-to-end methods to compute environment meta-features that enhance policy transfer efficiency. Additionally, WP4 explores the generation of synthetic, varied simulated environments to facilitate faster and more reliable transfer from simulation to real-world settings. This work is crucial for enabling scalable, adaptable, and safe assistive robots that can operate effectively in dynamic household environments.
Recent Highlights from WP4
WP4 worked on meta-learning for task sequencing: the team created synthetic graph-based sequencing environments, defined utility scores for (partial) sequences, and successfully meta-learned a Transformer-based sequencing policy that can choose good next tasks even from suboptimal context. In parallel, they improved the efficiency of cross-episode meta-RL by shifting the outer loop from on-policy learning (PPO) to off-policy learning (SAC), achieving roughly 2.5× faster learning on ML1 Reach and state-of-the-art performance on ML1 Push from the MetaWorld benchmark, while noting that memory becomes a limit as task diversity grows (replay buffers turn into a bottleneck). Next steps include extending the utility function to enable mid-sequence recovery and exploring how these sequencing strategies can support WP1 with less reliance on optimal trajectories.
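The sequencing setup can be illustrated with a toy precedence graph and a hand-written utility score; the greedy chooser below stands in for the learned Transformer policy, and the task names and scoring rule are purely illustrative:

```python
# Toy precedence graph: each task lists tasks that must come before it.
PRECEDES = {"grasp": [], "chop": ["grasp"],
            "stir": ["grasp"], "serve": ["chop", "stir"]}

def utility(sequence):
    """Utility of a (partial) task sequence: +1 for every task whose
    prerequisites all appear earlier in the sequence, -1 otherwise."""
    score, done = 0, set()
    for task in sequence:
        score += 1 if all(p in done for p in PRECEDES[task]) else -1
        done.add(task)
    return score

def greedy_next(sequence):
    """Pick the next task maximising utility of the extended sequence
    (a hand-coded stand-in for the learned sequencing policy)."""
    remaining = [t for t in PRECEDES if t not in sequence]
    return max(remaining, key=lambda t: utility(sequence + [t]))

seq = []
while len(seq) < len(PRECEDES):
    seq.append(greedy_next(seq))
print(utility(seq) == len(PRECEDES))  # a valid ordering scores full marks
```

Because the utility is defined on partial sequences, a policy trained against it can be queried mid-sequence, which is what makes the planned mid-sequence recovery extension natural.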


WP5: Meta-learning with dynamic algorithm configuration for reinforcement learning
WP5 focuses on enhancing learning efficiency by enabling dynamic configuration of learning agents, allowing their learning parameters to be adapted on-the-fly to the task at hand. This is essential because deep reinforcement learning (RL) algorithms, which often underpin robot learning, are highly sensitive to their configurations. Unlike many other machine learning settings, the data from which we learn changes continuously during the learning process. These changes arise both from learning to solve new tasks and from exploring different behaviors within the same task. Consequently, different stages of learning require different algorithm configurations to achieve optimal results. If this adaptation is not carefully managed, it can severely hinder or even prevent successful learning.
To enable efficient, scalable, and robust learning, we first focus on identifying suitable meta-features that facilitate transfer of configuration policies across varying problems and environments. Building on this, we aim to develop dynamic configuration policies that optimize reinforcement learning efficiency. The ultimate goal of this work package is to create dynamic configuration policies that are transferable even across vastly different problem environments.
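As a concrete toy example of dynamic configuration, assume a single hand-designed meta-feature (recent improvement in return) and hand-picked thresholds; a configuration policy that adapts the learning rate between training phases might then look like:

```python
def dac_lr_policy(recent_improvements, base_lr=1e-3):
    """A hand-designed dynamic configuration policy (illustrative only):
    map a meta-feature of the learning process, the mean recent
    improvement in return, to a learning rate for the next phase."""
    progress = sum(recent_improvements) / len(recent_improvements)
    if progress > 0.1:     # learning fast: keep the aggressive setting
        return base_lr
    if progress > 0.0:     # progress slowing: anneal
        return base_lr * 0.5
    return base_lr * 0.1   # plateau or regression: fine-tune carefully

print(dac_lr_policy([0.5, 0.3]))    # 0.001
print(dac_lr_policy([0.05, 0.02]))  # 0.0005
print(dac_lr_policy([-0.1, 0.0]))
```

In WP5 such policies are learned rather than hand-written, and the central question is which meta-features let a learned policy transfer across very different environments.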
Recent Highlights from WP5
WP5 (Dynamic Algorithm Configuration for RL) explored the effects of hyperparameter optimization on RL. Work with both hand-designed and inferred environment meta-features showed that Dynamic Algorithm Configuration boosts adaptability and meta-learning in parameterized environment settings with similar dynamics. The current focus is on improving zero-shot generalization to novel environments with differing dynamics.
