In-the-Wild Compliant Manipulation with UMI-FT

¹Stanford University * Equal Contribution.

Abstract

Many manipulation tasks require careful force modulation. With insufficient force the task may fail, while excessive force could cause damage. The high cost, bulky size and fragility of commercial force/torque (F/T) sensors have limited large-scale, force-aware policy learning. We introduce UMI-FT, a handheld data-collection platform that mounts compact, six-axis force/torque sensors on each finger, enabling finger-level wrench measurements alongside RGB, depth, and pose. Using the multimodal data collected from this device, we train an adaptive compliance policy that predicts position targets, grasp force, and stiffness for execution on standard compliance controllers. In evaluations on three contact-rich, force-sensitive tasks (whiteboard wiping, skewering zucchini, and lightbulb insertion), UMI-FT enables policies that reliably regulate external contact forces and internal grasp forces, outperforming baselines that lack compliance or force sensing. UMI-FT offers a scalable path to learning compliant manipulation from in-the-wild demonstrations.

UMI-FT System Design

UMI-FT uses an iPhone to provide RGB, ultrawide RGB, depth, and pose through ARKit. Each finger is instrumented with a CoinFT sensor to capture per-finger wrench information during manipulation. The multimodal data is used to train a modified adaptive compliance policy, which combines a diffusion policy with a low-level compliance controller.
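To illustrate why per-finger wrench sensing is useful, the sketch below splits two fingertip force readings into an internal grasp force and a net external force. It is a minimal example under stated assumptions, not the authors' implementation: the function name `decompose_finger_forces`, the shared gripper frame, and the sign convention (each reading is the force the object applies to that finger) are all hypothetical.

```python
import numpy as np


def decompose_finger_forces(f_left, f_right, closing_axis):
    """Split two fingertip forces into (grasp_force, external_force).

    Assumptions (not from the source):
      - f_left, f_right are 3-vectors in a shared gripper frame and give the
        force the object applies to each finger.
      - closing_axis points from the left finger toward the right finger.
    """
    f_left = np.asarray(f_left, dtype=float)
    f_right = np.asarray(f_right, dtype=float)
    axis = np.asarray(closing_axis, dtype=float)
    axis = axis / np.linalg.norm(axis)

    # Net force transmitted to the gripper through the grasped object.
    f_external = f_left + f_right

    # A squeezed object pushes the fingers apart: the left finger along -axis
    # and the right finger along +axis. Half the difference along the axis
    # is the internal grasp force, which cancels out of the net force.
    grasp_force = 0.5 * float(np.dot(f_right - f_left, axis))
    return grasp_force, f_external


# Example: a 5 N squeeze plus a 4 N external load along z.
grasp, external = decompose_finger_forces([-5.0, 0.0, 2.0], [5.0, 0.0, 2.0], [1.0, 0.0, 0.0])
```

Under these conventions the example yields a grasp force of 5 N and an external force of 4 N along z, so the two quantities can be regulated separately.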

UMI-FT controller architecture. The same force/torque measurement is used in all three control loops. Proprioception is omitted from the figure for clarity. The learned policy runs at the slowest rate and generates reference targets for the other two controllers, which are model-based.
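The multi-rate structure described in the caption can be sketched as a single loop in which the policy refreshes its targets only every few ticks while the two model-based controllers run on every tick. This is an illustrative sketch only: the callback names, the `policy_period` parameter, and the simplification that both controllers share one rate are assumptions, and no actual loop rates are given in the source.

```python
from typing import Callable, Optional, Sequence

Wrench = Sequence[float]  # assumed layout: [fx, fy, fz, tx, ty, tz]


def run_control_loops(
    n_ticks: int,
    policy_period: int,
    read_wrench: Callable[[], Wrench],
    policy: Callable[[Wrench], dict],
    compliance_controller: Callable[[dict, Wrench], None],
    grasp_controller: Callable[[dict, Wrench], None],
) -> None:
    """Run a slow policy loop and two fast controller loops on one clock."""
    targets: Optional[dict] = None
    for tick in range(n_ticks):
        # The same force/torque measurement feeds all three loops.
        wrench = read_wrench()

        # Slow loop: the policy updates reference targets occasionally.
        if tick % policy_period == 0:
            targets = policy(wrench)

        # Fast loops: both controllers track the latest targets every tick.
        compliance_controller(targets, wrench)
        grasp_controller(targets, wrench)
```

A real system would typically run these loops in separate threads or processes with their own timing; the single loop here only shows the data flow.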

In particular, the CoinFT reading is used directly in standard compliance/force controllers to provide delicate 6D compliance control and real-time force modulation.
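As a rough illustration of how a measured force can drive a compliance controller, the sketch below implements one step of a textbook admittance law, M·a + D·v + K·(x − x_ref) = f_ext, applied per translational axis. It is not the controller used in UMI-FT: the function name `admittance_step`, the gains, and the restriction to three translational axes are assumptions made for brevity.

```python
import numpy as np


def admittance_step(x, v, x_ref, f_ext, stiffness, damping, mass, dt):
    """Advance M*a + D*v + K*(x - x_ref) = f_ext by one semi-implicit Euler step.

    x, v, x_ref, f_ext are 3-vectors; stiffness, damping, and mass may be
    scalars or per-axis 3-vectors. f_ext is the force the environment applies
    to the end effector (assumed sign convention).
    """
    a = (f_ext - damping * v - stiffness * (x - x_ref)) / mass
    v_next = v + a * dt
    x_next = x + v_next * dt
    return x_next, v_next


# Example with hypothetical gains: a constant 5 N push along z.
K, M, dt = 500.0, 1.0, 0.002
D = 2.0 * np.sqrt(K * M)  # critical damping
x_ref = np.array([0.3, 0.0, 0.2])
x, v = x_ref.copy(), np.zeros(3)
f_ext = np.array([0.0, 0.0, 5.0])
for _ in range(5000):
    x, v = admittance_step(x, v, x_ref, f_ext, K, D, M, dt)
```

At steady state the commanded position settles at an offset of f_ext / K from the reference (here 5 N / 500 N/m = 0.01 m along z), so a lower stiffness makes the end effector yield more to the same contact force.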

Task 1: Whiteboard Wiping 🧽

Task The robot should wipe the whiteboard until it is clean: locate the eraser, approach and grasp it, retract and locate the markings on the whiteboard, then wipe until no markings remain.

Comparisons The [No compliance] policy struggles to modulate contact force and often wipes incompletely or triggers a safety fault due to excessive contact force. The [No force] policy fails to grasp unseen, narrower erasers. The [Contact mic] policy often triggers a safety fault due to excessive contact force.

Task 2: Skewer Zucchini 🥒

Task The robot should skewer a zucchini on a stick; grasp the zucchini slice firmly, and push it onto a stick until punctured.

Comparisons The [No compliance] policy performs similarly to our method, owing to the compliance of the zucchini itself. The [No force] policy fails to resist the reaction force between the stick and the zucchini, so the zucchini often slips out of the grasp.

Task 3: Lightbulb Insertion 💡

Task The robot should insert a lightbulb in a socket; grasp the bulb firmly, approach the socket, make gentle contact and rotate the bulb until the bayonet pin is aligned with the slit on the socket, insert while overcoming the reaction force of the spring-loaded electrode, and rotate to light up the bulb.

Comparisons The [No compliance] policy may fail to keep the bulb in contact with the socket during haptic search, which can cause the bayonet pin to rotate past the slit. The [No force] policy moves less accurately, resulting in misalignment between the bulb and the socket. The [Contact mic] policy fails due to excessive rotation of the bulb during haptic search or slippage of the bulb during insertion.

BibTeX

@misc{choi2026inthewildcompliantmanipulationumift,
  title={In-the-Wild Compliant Manipulation with UMI-FT},
  author={Hojung Choi and Yifan Hou and Chuer Pan and Seongheon Hong and Austin Patel and Xiaomeng Xu and Mark R. Cutkosky and Shuran Song},
  year={2026},
  eprint={2601.09988},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2601.09988},
}


UMI-FT allows multi-modal imitation learning using a modified adaptive compliance policy.

In-the-wild multimodal data enables generalization to unseen scenes and clutter.
