Needle Lift
Behavior cloning can facilitate learning of dexterous manipulation skills, yet the complexity of surgical environments, the difficulty and expense of obtaining patient data, and robot calibration errors present unique challenges for surgical robot learning. We provide an enhanced surgical digital twin with photorealistic human anatomical organs, integrated into a comprehensive simulator designed to generate high-quality synthetic data for solving fundamental tasks in surgical autonomy. We present SuFIA-BC: visual Behavior Cloning policies for Surgical First Interactive Autonomy Assistants. We investigate visual observation spaces including multi-view cameras and 3D visual representations extracted from a single endoscopic camera view. Through systematic evaluation, we find that the diverse set of photorealistic surgical tasks introduced in this work enables nuanced evaluation of prospective behavior cloning models against the unique challenges posed by surgical environments. We observe that current state-of-the-art behavior cloning techniques struggle to solve the contact-rich and complex tasks evaluated in this work, regardless of their underlying perception or control architectures. These findings underscore the importance of tailoring perception pipelines, control architectures, and larger-scale synthetic datasets to the specific demands of surgical tasks.
This workflow illustrates the full pipeline for creating photorealistic anatomical models, from raw CT volume data to the final OpenUSD asset in NVIDIA Omniverse. The process includes organ segmentation, mesh conversion, mesh cleaning and refinement, and photorealistic texturing, culminating in a unified OpenUSD scene.
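The segmentation-to-mesh conversion step of this workflow can be sketched with an iso-surface extraction, as below. This is a minimal illustration, not the authors' actual pipeline: the synthetic sphere volume stands in for a real binary organ segmentation from CT, and `scikit-image`'s marching cubes stands in for the mesh-conversion tool used in practice.

```python
import numpy as np
from skimage import measure

# Synthetic stand-in for a binary organ segmentation from CT:
# a 64^3 voxel grid containing a sphere of radius 20 voxels.
grid = np.mgrid[:64, :64, :64]
dist = np.sqrt(((grid - 32.0) ** 2).sum(axis=0))
volume = (dist < 20).astype(np.float32)

# Marching cubes extracts a triangle mesh at the 0.5 iso-surface,
# analogous to the segmentation-to-mesh conversion step; the mesh
# would then be cleaned, refined, and textured before USD export.
verts, faces, normals, values = measure.marching_cubes(volume, level=0.5)

print(verts.shape, faces.shape)  # (N, 3) vertices and (M, 3) triangle indices
```

The resulting raw mesh is typically dense and noisy, which is why the workflow includes explicit cleaning and refinement stages before texturing.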
The photorealistic human organ models are available on GitHub.
3D Diffusion Policy rollout with a point cloud derived from the primary task camera.
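A camera-frame point cloud like the one consumed by this policy can be obtained by back-projecting the camera's depth image through a pinhole intrinsics model. The sketch below makes illustrative assumptions: the intrinsic values (`fx`, `fy`, `cx`, `cy`) and image size are placeholders, not the simulator's actual camera parameters, and the helper name `depth_to_pointcloud` is ours.

```python
import numpy as np

def depth_to_pointcloud(depth, fx, fy, cx, cy):
    """Back-project an (H, W) depth image into an (N, 3) point cloud
    in the camera frame, using a pinhole intrinsics model."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]  # drop invalid (zero-depth) pixels

# Illustrative intrinsics and a flat synthetic depth map 1 m away.
depth = np.ones((480, 640), dtype=np.float32)
cloud = depth_to_pointcloud(depth, fx=600.0, fy=600.0, cx=320.0, cy=240.0)
print(cloud.shape)  # (307200, 3)
```

In practice the cloud would also be cropped to the workspace and downsampled before being fed to the policy.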
Tissue Retraction
Needle Lift
Needle Handover
Suture Pad
Block Transfer
Examples of ACT Multi-Camera models: trained on the primary camera view (train) and evaluated under two viewpoint perturbations: a minor shift in camera position (view 1) and a significant change in viewpoint (view 2). Note that in the multi-camera visual input, the wrist cameras maintain a consistent viewpoint throughout the evaluation.
(train)
(view 1)
(view 2)
Examples of ACT Multi-Camera models for needle instance generalization: We assess the effectiveness of policies trained only on the primary suture needle (Needle N1) at lifting previously unseen, irregularly shaped suture needles (Needles N2 - N5) at test time.
(Needle N1)
(Needle N2)
(Needle N3)
(Needle N4)
(Needle N5)
We would like to thank Miguel Guerrero, Vanni Brighella, and Ernesto Pacheco for their assistance in creating photorealistic human organ models.
For any questions, please feel free to contact Masoud Moghani and Animesh Garg.