Paper · arXiv · Trust 82 · Primary · Published yesterday · Live · 19h ago

The Moving Eye: Enhancing VLA Spatial Generalization via Hybrid Dynamic Data Collection

Vision-Language-Action (VLA) models have shown remarkable promise in generalized robotic manipulation. However, their spatial generalization remains fragile. We argue that simply increasing the number of viewpoints is insufficient. Models often fall into the trap of Shortcut Learning, latching onto spurious correlations (e.g., fixed relative poses between objects or between the camera and robot base) rather than learning true spatial relationships. In this work, we propose a data-centric solution to enhance VLA spatial generalization. We utilize a dual-arm setup where one arm performs manipula

Lineage graph

Paper → model → repo connections mined from source citations (Tier-1 exact match).

Why these links exist

  • Linked via arXiv author Jincheng Tang

  • Linked via arXiv author Yilong Zhu

  • Linked via arXiv author Zhengyuan Xie

  • Linked via arXiv author Jiang-Jiang Liu

  • Linked via arXiv author Jiaxing Zhang
