A fundamental challenge in teaching robots is to provide an effective interface for human teachers to demonstrate useful skills to a robot. This challenge is exacerbated in dexterous manipulation, where teaching high-dimensional, contact-rich behaviors often require esoteric teleoperation tools. In this work, we present Holo-Dex, a framework for dexterous manipulation that places a teacher in an immersive mixed reality through commodity VR headsets. The high-fidelity hand pose estimator onboard the headset is used to teleoperate the robot and collect demonstrations for a variety of general purpose dexterous tasks. Given these demonstrations, we use powerful feature learning combined with non-parametric imitation to train dexterous skills. Our experiments on six common dexterous tasks, including in-hand rotation, spinning, and bottle opening, indicate that Holo-Dex can both collect high-quality demonstration data and train skills in a matter of hours. Finally, we find that our trained skills can exhibit generalization on objects not seen in training.
Holo-Dex consists of two phases: demonstration collection, which is performed in real-time with visual feedback through the Mixed Reality Interface, and demonstration-based policy learning, which can learn to solve dexterous tasks from a limited number of demonstrations.
We collect demonstrations in real-time from multiple RGB-D cameras. We use the VR Headset's hand detector and retarget the human fingertip positions for the Allegro Hand. The low-level controller uses Inverse Kinematics and Joint retargeting along with PD Control to reach the desired locations in 3D space.
We use non-parametric imitation learning (Nearest Neighbors) to optimize our robot policies. At each timestep, the closest example to the current observation is searched in the demonstration dataset. The action associated with that closest example is then executed on the robot.
We observed that imitation learning model was able to generalize on structurally and visually different objects in all the in-hand manipulation tasks.
We also observe cases where the policy fails to generalize due to the object being very thin or small in the Planar Rotation and Can Spinning tasks.