Paper · arXiv · Trust 82 · Primary · Published 4d ago · Live · 3d ago
VLK: Learning Humanoid Loco-Manipulation from Synthetic Interactions in Reconstructed Scenes
Perception-based humanoid loco-manipulation requires connecting egocentric observations and task instructions to whole-body motion. Learning this mapping requires synchronized egocentric images, language commands, and robot-compatible kinematic trajectories, yet no existing data source provides this complete tuple at scale. We address this bottleneck by generating vision-language-kinematics (VLK) supervision synthetically in reconstructed scenes. Our pipeline leverages 3D Gaussian Splatting to reconstruct metric-scale indoor environments, synthesizes navigation and object-interaction trajector
