paper · arXiv · Trust 82 · Primary · Published 4d ago · Live · 3d ago

VLK: Learning Humanoid Loco-Manipulation from Synthetic Interactions in Reconstructed Scenes

Perception-based humanoid loco-manipulation requires connecting egocentric observations and task instructions to whole-body motion. Learning this mapping requires synchronized egocentric images, language commands, and robot-compatible kinematic trajectories, yet no existing data source provides this complete tuple at scale. We address this bottleneck by generating vision-language-kinematics (VLK) supervision synthetically in reconstructed scenes. Our pipeline leverages 3D Gaussian Splatting to reconstruct metric-scale indoor environments, synthesizes navigation and object-interaction trajector

Lineage graph

Paper → model → repo connections mined from source citations (Tier-1 exact match).

Covers

news: Video Friday: Robotic Motion Discovery Reveals Unusual Behaviors
news: Into the Omniverse: Three Workflows for Improving Vision AI Agent Accuracy With Synthetic Data and Fine-Tuning

Implements (incoming)

repo: roboflow/supervision

Topics

cs.AI