Open-H-Embodiment: a 35-institution consortium releases 778 hours of healthcare robotics data and two open-source VLA models.
A 35-organization consortium led by Johns Hopkins, TU Munich, and NVIDIA released Open-H-Embodiment: 778 hours of CC-BY-4.0 healthcare robotics training data covering surgical robotics, ultrasound, and colonoscopy. Alongside the dataset, they released GR00T-H, the first Vision-Language-Action policy model for surgical robotics tasks, built on NVIDIA's Isaac GR00T N series with Cosmos Reason 2 2B as its VLM backbone and trained on ~600 hours of this data. A second model targets ultrasound autonomy. Training required approximately 10,000 GPU-hours, and all assets are permissively open-source.
This is the first large-scale, permissively licensed surgical robotics dataset with synchronized vision, force, and kinematics streams, the kind of multimodal embodied data that simply didn't exist publicly before. GR00T-H introduces several concrete architectural choices (among them State Dropout, metadata injection, and a unified 44-dimensional action space) that address real imitation learning problems in cable-driven surgical hardware. Developers working on robotic manipulation, sim-to-real transfer, or VLA fine-tuning now have a credible open foundation to build on rather than starting from scratch.
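To make the State Dropout idea concrete, here is a minimal sketch of how such a layer could work, assuming the mechanism zeroes out the proprioceptive state with some probability during training so the policy cannot over-rely on joint readings that cable-driven robots report unreliably. The class name, signature, and behavior are illustrative, not the released implementation.

```python
import torch
import torch.nn as nn


class StateDropout(nn.Module):
    """Randomly masks the proprioceptive state vector during training.

    Hypothetical sketch: with probability `p`, the whole state input is
    zeroed, forcing the policy to rely on vision and language instead of
    kinematics it cannot trust on cable-driven hardware at deployment.
    """

    def __init__(self, p: float = 1.0):  # p=1.0 mirrors the "100%" setting quoted below
        super().__init__()
        self.p = p

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        if self.training and torch.rand(()) < self.p:
            return torch.zeros_like(state)  # drop the entire state vector
        return state
```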
Clone the Open-H-Embodiment GitHub repo this week and run GR00T-H inference on the provided surgical task demos. Benchmark its action prediction latency and accuracy against your current policy baseline to decide whether it's worth fine-tuning on your specific robotic platform; a sketch of that comparison follows.
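One way to run that comparison, assuming you wrap both GR00T-H and your baseline behind a simple callable that maps an observation dict to an action vector. The wrapper, the episode format, and the accuracy proxy are all yours to define; nothing here is the repo's API.

```python
import time
import statistics


def benchmark_policy(policy, episodes):
    """Time per-step action prediction for any callable policy.

    `policy` maps an observation to an action vector; `episodes` is a list of
    (observation, reference_action) pairs taken from the released demos.
    """
    # Warm up so CUDA kernels and caches don't distort the first timings.
    for obs, _ in episodes[:5]:
        policy(obs)

    latencies_ms, errors = [], []
    for obs, ref_action in episodes:
        start = time.perf_counter()
        action = policy(obs)
        latencies_ms.append((time.perf_counter() - start) * 1e3)
        # Crude accuracy proxy: L2 distance to the demonstrated action.
        errors.append(sum((a - r) ** 2 for a, r in zip(action, ref_action)) ** 0.5)

    return {
        "mean_latency_ms": statistics.mean(latencies_ms),
        "p95_latency_ms": statistics.quantiles(latencies_ms, n=20)[18],
        "mean_action_l2": statistics.mean(errors),
    }


# Run both policies over the same demo slice and compare the two dicts:
# print(benchmark_policy(groot_h_wrapper, demo_episodes))
# print(benchmark_policy(baseline_wrapper, demo_episodes))
```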
Pull the GR00T-H model card from the Open-H-Embodiment GitHub repo and paste it into Claude.ai or ChatGPT along with this prompt: 'Given this VLA architecture using State Dropout at 100% and a 44-dimensional action space, what are the top 3 failure modes I should test for when deploying on a cable-driven surgical robot?' You'll get a concrete test plan in under 2 minutes.
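If you'd rather script that check than paste it into the web UI, the same prompt can go through the Anthropic Python SDK. The model card filename and model id below are placeholders; substitute whatever you pulled from the repo and whichever model you have access to.

```python
from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

prompt = (
    "Given this VLA architecture using State Dropout at 100% and a "
    "44-dimensional action space, what are the top 3 failure modes I should "
    "test for when deploying on a cable-driven surgical robot?\n\n"
    + open("gr00t-h-model-card.md").read()  # placeholder: the model card you pulled
)

reply = client.messages.create(
    model="claude-sonnet-4-20250514",  # placeholder model id
    max_tokens=1024,
    messages=[{"role": "user", "content": prompt}],
)
print(reply.content[0].text)
```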