Holo-Dex: Teaching Dexterity with Immersive Mixed Reality

Abstract

A fundamental challenge in teaching robots is to provide an effective interface for human teachers to demonstrate useful skills to a robot. This challenge is exacerbated in dexterous manipulation, where teaching high-dimensional, contact-rich behaviors often requires esoteric teleoperation tools. In this work, we present Holo-Dex, a framework for dexterous manipulation that places a teacher in an immersive mixed reality through commodity VR headsets. The high-fidelity hand pose estimator onboard the headset is used to teleoperate the robot and collect demonstrations for a variety of general-purpose dexterous tasks. Given these demonstrations, we use powerful feature learning combined with non-parametric imitation to train dexterous skills. Our experiments on six common dexterous tasks, including in-hand rotation, spinning, and bottle opening, indicate that Holo-Dex can both collect high-quality demonstration data and train skills in a matter of hours. Finally, we find that our trained skills can generalize to objects not seen during training.

Method

Holo-Dex consists of two phases: demonstration collection, performed in real time with visual feedback through the Mixed Reality interface, and demonstration-based policy learning, which learns to solve dexterous tasks from a limited number of demonstrations.

Demonstrations

We collect demonstrations in real time while recording observations from multiple RGB-D cameras. The VR headset's onboard hand detector estimates the teacher's hand pose, and the human fingertip positions are retargeted to the Allegro Hand. A low-level controller then uses inverse kinematics and joint retargeting, along with PD control, to drive the fingertips to the desired locations in 3D space.
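As a rough illustration of this teleoperation loop, the sketch below maps detected fingertips to Allegro Hand joint targets with damped least-squares inverse kinematics and tracks them with a PD controller. The helper functions (detect_hand_keypoints, allegro_fk, allegro_jacobian, the robot interface) and all constants are hypothetical placeholders, not the actual Holo-Dex implementation.

```python
import numpy as np

# Minimal sketch of the fingertip retargeting and PD control loop.
# detect_hand_keypoints, allegro_fk, allegro_jacobian, and the `robot`
# interface are hypothetical placeholders, not the actual Holo-Dex API.

KP, KD = 5.0, 0.1      # PD gains (illustrative values only)
SCALE = 0.62           # assumed human-to-Allegro fingertip scaling factor

def retarget_fingertips(keypoints):
    """Map detected human fingertips into wrist-relative Allegro targets."""
    wrist = keypoints["wrist"]                                  # (3,)
    tips = np.stack([keypoints[f] for f in
                     ("thumb", "index", "middle", "ring")])     # (4, 3)
    return SCALE * (tips - wrist)

def ik_step(q, target_tips, lam=1e-2):
    """One damped least-squares IK update toward the target fingertips."""
    err = (target_tips - allegro_fk(q)).reshape(-1)   # 12-d fingertip position error
    J = allegro_jacobian(q)                           # (12, 16) fingertip Jacobian
    dq = J.T @ np.linalg.solve(J @ J.T + lam * np.eye(12), err)
    return q + dq

def teleop_loop(robot, headset):
    """Track the teacher's hand with the Allegro Hand using IK + PD control."""
    while True:
        keypoints = detect_hand_keypoints(headset)    # hand pose from the VR headset
        q_des = ik_step(robot.joint_positions(), retarget_fingertips(keypoints))
        tau = KP * (q_des - robot.joint_positions()) - KD * robot.joint_velocities()
        robot.apply_torque(tau)
```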




Demonstration collection for object flipping task

Policies

We use non-parametric imitation learning (nearest neighbors) to obtain our robot policies. At each timestep, the current observation is encoded into a learned feature space and the closest example in the demonstration dataset is retrieved. The action associated with that closest example is then executed on the robot.
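A minimal sketch of this nearest-neighbor policy is given below, assuming demonstrations are stored as (feature, action) pairs; `encode` is a hypothetical visual encoder producing the learned features, so this is an illustration of the idea rather than the actual Holo-Dex code.

```python
import numpy as np

# Minimal sketch of the nearest-neighbor policy. Demonstrations are assumed
# to be stored as (feature, action) pairs; `encode` is a hypothetical visual
# encoder that produces the learned feature vector for an observation.

class NearestNeighborPolicy:
    def __init__(self, demo_features, demo_actions):
        self.features = np.asarray(demo_features)    # (N, d) demo features
        self.actions = np.asarray(demo_actions)      # (N, action_dim) demo actions

    def act(self, observation):
        query = encode(observation)                               # (d,) feature vector
        dists = np.linalg.norm(self.features - query, axis=1)     # L2 distance to each demo frame
        nearest = int(np.argmin(dists))                           # index of the closest example
        return self.actions[nearest]                              # replay its recorded action
```

For large demonstration sets, the brute-force search above could be swapped for a kd-tree or an approximate nearest-neighbor index without changing the policy's behavior.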

Nearest Neighbors observed in the Planar Rotation task

Nearest Neighbors observed in the Object Flipping task

Nearest Neighbors observed in the Can Spinning task

Our learned policies on six different tasks




Planar Rotation

Object Flipping

Can Spinning

Bottle Opening

Card Sliding

Post-it Note Sliding

Generalization Experiments

We observed that the imitation learning model was able to generalize to structurally and visually different objects in all of the in-hand manipulation tasks.


Generalization observed with unseen objects in Planar Rotation task



Generalization observed with unseen objects in Object Flipping Task



Generalization observed with unseen objects in Can Spinning Task

Failure Cases Observed

We also observe cases where the policy fails to generalize in the Planar Rotation and Can Spinning tasks, typically when the object is very thin, very small, or has a color different from the training objects.

Failure due to the object having a different color

Failure due to the object being too thin

Failure due to the object being too thin

Failure due to the object being small