Read original ↗
paperarXivTrust 82 · PrimaryPublished 4d agoLive · 3d ago

VLK: Learning Humanoid Loco-Manipulation from Synthetic Interactions in Reconstructed Scenes

Perception-based humanoid loco-manipulation requires connecting egocentric observations and task instructions to whole-body motion. Learning this mapping requires synchronized egocentric images, language commands, and robot-compatible kinematic trajectories, yet no existing data source provides this complete tuple at scale. We address this bottleneck by generating vision-language-kinematics (VLK) supervision synthetically in reconstructed scenes. Our pipeline leverages 3D Gaussian Splatting to reconstruct metric-scale indoor environments, synthesizes navigation and object-interaction trajector

Lineage graph

Paper → model → repo connections mined from source citations (Tier-1 exact match).

Covers

Implements (incoming)

Related across the graph

Topics