Large language models can encode a wealth of semantic knowledge about the world. Such knowledge could in principle be extremely useful to robots aiming to act upon high-level, temporally extended instructions expressed in natural language. However, a …
We take a Bayesian approach to imitation learning from multiple sensor inputs and apply it to the task of opening office doors with a mobile manipulator. We show that using the Variational Information Bottleneck to regularize convolutional neural networks improves generalization to held-out domains, reduces the sim-to-real gap in a sensor-agnostic manner, and provides useful estimates of model uncertainty. In a real-world office environment, we achieve 96% task success.
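To make the regularizer concrete, here is a minimal PyTorch sketch of a VIB-regularized imitation head. The class name `VIBPolicy`, the layer sizes, and the `beta` weight are illustrative assumptions, not the paper's exact architecture; only the reparameterized bottleneck and the KL-to-prior term are the standard VIB ingredients.

```python
# A minimal sketch of VIB-regularized behavior cloning (assumed shapes).
import torch
import torch.nn as nn
import torch.nn.functional as F

class VIBPolicy(nn.Module):
    def __init__(self, feat_dim=256, z_dim=32, action_dim=7):
        super().__init__()
        # A conv backbone would produce feat_dim fused features per step;
        # here we assume the features are already extracted.
        self.enc_mu = nn.Linear(feat_dim, z_dim)
        self.enc_logvar = nn.Linear(feat_dim, z_dim)
        self.action_head = nn.Linear(z_dim, action_dim)

    def forward(self, feats):
        mu, logvar = self.enc_mu(feats), self.enc_logvar(feats)
        # Reparameterization trick: sample the stochastic bottleneck z.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return self.action_head(z), mu, logvar

def vib_loss(policy, feats, expert_actions, beta=1e-3):
    pred, mu, logvar = policy(feats)
    bc = F.mse_loss(pred, expert_actions)  # imitation term
    # KL(q(z|x) || N(0, I)) -- the information-bottleneck regularizer.
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()
    return bc + beta * kl
```

The learned per-example variance (or KL) offers one plausible handle on the model-uncertainty estimates mentioned above, though the paper's exact uncertainty measure may differ.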
Task Consistency Loss (TCL) is a self-supervised loss that encourages alignment between sim and real at both the feature and action-prediction levels, building on top of RetinaGAN. We teach a mobile manipulator to autonomously approach a door, turn the handle to open the door, and enter the room. The imitation learning policy performs control from RGB and depth images and generalizes to doors not encountered in the training data. We achieve 72% success across sixteen seen and unseen scenes using only ~16.2 hours of teleoperated demonstrations in sim and real.
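A hedged sketch of what such a two-level consistency objective could look like follows. The `policy(image) -> (features, action)` interface, the MSE penalties, and the weights `w_feat`/`w_act` are assumptions for illustration, not the exact TCL formulation.

```python
# Sketch: penalize disagreement between a sim frame and its GAN-adapted
# counterpart at both the feature and the action-prediction level.
import torch.nn.functional as F

def task_consistency_loss(policy, sim_img, adapted_img, w_feat=1.0, w_act=1.0):
    feat_sim, act_sim = policy(sim_img)        # forward pass on sim frame
    feat_ad, act_ad = policy(adapted_img)      # forward pass on adapted frame
    l_feat = F.mse_loss(feat_sim, feat_ad)     # feature-level alignment
    l_act = F.mse_loss(act_sim, act_ad)        # action-prediction alignment
    return w_feat * l_feat + w_act * l_act
```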
SimGAN tackles domain adaptation by identifying a hybrid physics simulator that matches simulated trajectories to those in the target domain. It uses a learned discriminative loss to address the limitations of manual loss design. Our hybrid simulator combines neural networks with traditional physics simulation to balance expressiveness and generalizability, and alleviates the need for a carefully selected parameter set in system identification.
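The hybrid-simulator idea lends itself to a short sketch: an analytic physics step corrected by a learned residual, with a trajectory discriminator standing in for a hand-designed matching loss. The module shapes, the `analytic_step` callable, and the flattened window encoding below are assumptions, not the paper's implementation.

```python
# Sketch of a hybrid simulator plus trajectory discriminator (assumed shapes).
import torch
import torch.nn as nn

class HybridSim(nn.Module):
    def __init__(self, state_dim=12, action_dim=4):
        super().__init__()
        # Learned residual correction on top of the analytic physics step.
        self.residual = nn.Sequential(
            nn.Linear(state_dim + action_dim, 128), nn.ReLU(),
            nn.Linear(128, state_dim),
        )

    def forward(self, state, action, analytic_step):
        # analytic_step: callable wrapping the traditional physics engine.
        return analytic_step(state, action) + self.residual(
            torch.cat([state, action], dim=-1))

class TrajDiscriminator(nn.Module):
    # Scores whether a short (state, action) window looks like the target
    # domain; its output replaces a hand-designed trajectory-matching loss.
    def __init__(self, state_dim=12, action_dim=4, horizon=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear((state_dim + action_dim) * horizon, 128), nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, window):  # window: (batch, horizon*(state+action))
        return self.net(window)
```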
Deep reinforcement learning (RL) has shown great potential in solving robot manipulation tasks. However, existing RL policies adapt poorly to environments with diverse dynamics properties, an ability that is pivotal in many contact-rich manipulation tasks. We propose Contact-aware Online COntext Inference (COCOI), a deep RL method that encodes a context embedding of dynamics properties online from contact-rich interactions. We evaluate the method on a novel and challenging non-planar pushing task.
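One plausible reading of the online context-inference component is sketched below: a recurrent encoder compresses a short history of contact-rich interactions into a dynamics embedding that conditions the policy. The GRU choice, the dimensions, and the class names are illustrative assumptions.

```python
# Sketch: recurrent context encoder feeding a context-conditioned policy.
import torch
import torch.nn as nn

class ContextEncoder(nn.Module):
    def __init__(self, obs_dim=32, action_dim=4, ctx_dim=16):
        super().__init__()
        self.gru = nn.GRU(obs_dim + action_dim, 64, batch_first=True)
        self.head = nn.Linear(64, ctx_dim)

    def forward(self, history):  # history: (batch, T, obs_dim + action_dim)
        _, h = self.gru(history)
        return self.head(h[-1])  # dynamics context embedding

class ContextConditionedPolicy(nn.Module):
    def __init__(self, obs_dim=32, ctx_dim=16, action_dim=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + ctx_dim, 128), nn.ReLU(),
            nn.Linear(128, action_dim),
        )

    def forward(self, obs, ctx):
        # The context embedding is refreshed online as new contact
        # interactions are observed, then concatenated with the observation.
        return self.net(torch.cat([obs, ctx], dim=-1))
```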
RetinaGAN is a generative adversarial network approach that adapts simulated images to realistic ones while enforcing object-detection consistency. Trained without supervision or task-loss dependencies, it preserves general object structure and texture in the adapted images. Across three real-world tasks (grasping, pushing, and door opening), RetinaGAN improves performance for RL-based object instance grasping and remains effective even in the limited-data regime.
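The object-detection consistency idea can be sketched as follows: a frozen detector is run on an image before and after GAN translation, and disagreement between its predictions is penalized. The `detector`/`generator` interfaces and the L1 penalty are simplifying assumptions; the full method also trains the usual adversarial and cycle objectives alongside this term.

```python
# Sketch: perception-consistency term for an image-translation GAN.
import torch
import torch.nn.functional as F

def detection_consistency_loss(detector, generator, image):
    translated = generator(image)
    with torch.no_grad():
        preds_src = detector(image)   # e.g. per-anchor box/class tensors
    preds_trans = detector(translated)
    # Penalize perception drift introduced by the translation, so adapted
    # images keep the object structure the (frozen) detector relies on.
    return sum(F.l1_loss(p_t, p_s)
               for p_t, p_s in zip(preds_trans, preds_src))
```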