In-the-Wild Compliant Manipulation with UMI-FT

¹Stanford University      *Equal contribution.

UMI-FT enables multimodal imitation learning using a modified adaptive compliance policy.

In-the-wild multimodal data enables generalization to unseen scenes and clutter.


Abstract

Many manipulation tasks require careful force modulation. With insufficient force the task may fail, while excessive force could cause damage. The high cost, bulky size and fragility of commercial force/torque (F/T) sensors have limited large-scale, force-aware policy learning. We introduce UMI-FT, a handheld data-collection platform that mounts compact, six-axis force/torque sensors on each finger, enabling finger-level wrench measurements alongside RGB, depth, and pose. Using the multimodal data collected from this device, we train an adaptive compliance policy that predicts position targets, grasp force, and stiffness for execution on standard compliance controllers. In evaluations on three contact-rich, force-sensitive tasks (whiteboard wiping, skewering zucchini, and lightbulb insertion), UMI-FT enables policies that reliably regulate external contact forces and internal grasp forces, outperforming baselines that lack compliance or force sensing. UMI-FT offers a scalable path to learning compliant manipulation from in-the-wild demonstrations.


Video


UMI-FT System Design

UMI-FT uses an iPhone to provide RGB and ultrawide RGB images, depth, and pose via ARKit. Each finger is instrumented with a CoinFT sensor that captures per-finger wrench information during manipulation. The multimodal data is used to train a modified adaptive compliance policy, which combines a diffusion policy with a low-level compliance controller.
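To illustrate how a predicted stiffness shapes contact force, here is a minimal sketch of a per-axis virtual-spring law, F = K (x_target - x), of the kind a generic compliance controller might apply. It is not the authors' implementation: the `PolicyOutput` container, its field names, the units, and the numbers in the usage example are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class PolicyOutput:
    """Hypothetical container for one policy prediction (names are assumptions)."""
    target_position: Vec3  # commanded end-effector position [m]
    stiffness: Vec3        # per-axis translational stiffness [N/m]
    grasp_force: float     # desired internal grasp force [N]


def spring_force(out: PolicyOutput, measured_position: Vec3) -> Vec3:
    """Per-axis virtual-spring force F = K * (x_target - x)."""
    return tuple(
        k * (x_target - x)
        for k, x_target, x in zip(out.stiffness, out.target_position, measured_position)
    )


# Illustrative values only: stiff in x and y, soft along z (e.g. a surface normal).
out = PolicyOutput(
    target_position=(0.10, 0.00, 0.05),
    stiffness=(500.0, 500.0, 50.0),
    grasp_force=5.0,
)
# The target sits 1 cm below the measured position along z, so the soft
# axis produces a force of roughly -0.5 N instead of -5 N at 500 N/m.
force = spring_force(out, measured_position=(0.10, 0.00, 0.06))
```

With a low stiffness along the contact direction, the same position error produces a much smaller commanded force, which is the intuition behind letting a policy choose stiffness alongside position targets.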


Evaluations

Task 1: Whiteboard Wiping 🧽

Task The robot should wipe the whiteboard until it is clean: locate, approach, and grasp the eraser; retract and locate the markings on the whiteboard; then wipe until no markings remain.

Comparisons [No compliance] policy struggles to modulate contact force and often wipes incompletely or triggers a safety fault due to excessive contact force. [No force] policy also fails to grasp unseen, narrower erasers. [Contact mic] policy often triggers a safety fault due to excessive contact force.

Our Method

Baselines

Task 2: Skewer Zucchini 🥒

Task The robot should skewer a zucchini slice on a stick: grasp the slice firmly and push it onto the stick until it is punctured.

Comparisons [No compliance] policy performs similarly to our method because the zucchini itself is compliant. [No force] policy fails to resist the reaction force between the stick and the zucchini, so the zucchini often slips out of the grasp.

Our Method

Baselines

Task 2-2: Skewer Zucchini In-the-Wild 🥒

In-the-Wild data with large scene variation enables generalization to unseen environments with clutter.

Our Method with In-the-Wild Data

Our Method with In-Lab Data

Task 3: Lightbulb Insertion 💡

Task The robot should insert a lightbulb into a socket: grasp the bulb firmly, approach the socket, make gentle contact and rotate the bulb until the bayonet pin is aligned with the slit on the socket, insert while overcoming the reaction force of the spring-loaded electrode, and rotate to light up the bulb.

Comparisons [No compliance] policy may fail to keep the bulb in contact with the socket during haptic search, which can cause the bayonet pin to rotate past the slit. [No force] policy moves less accurately, resulting in misalignment between the bulb and the socket. [Contact mic] policy fails due to excessive rotation of the bulb during haptic search or slippage of the bulb during insertion.

Our Method

Baselines