The Moving Eye: Enhancing VLA Spatial Generalization via Hybrid Dynamic Data Collection
Vision-Language-Action (VLA) models have shown remarkable promise in generalized robotic manipulation. However, their spatial generalization remains fragile. We argue that simply increasing the number of viewpoints is insufficient. Models often fall into the trap of Shortcut Learning, latching onto spurious correlations (e.g., fixed relative poses between objects or between the camera and robot base) rather than learning true spatial relationships. In this work, we propose a data-centric solution to enhance VLA spatial generalization. We utilize a dual-arm setup where one arm performs manipula
Authors: Jincheng Tang, Yilong Zhu, Zhengyuan Xie, Jiang-Jiang Liu, Jiaxing Zhang
