paper · arXiv · Trust 82 · Primary · Published 7d ago · Live · 4d ago

AirGroundBench: Probing Spatial Intelligence in Multimodal Large Models under Heterogeneous Multi-View Embodied Collaboration

In recent years, multimodal large language models (MLLMs) have shown strong potential for embodied intelligence, yet their ability to maintain geometrically consistent spatial understanding across heterogeneous views remains under-evaluated. Existing benchmarks largely focus on single-agent, single-view perception, leaving a gap in the systematic assessment of collaborative air-ground settings, where multi-scale observations are complementary but introduce scale mismatch, asymmetric occlusion, and reference-frame inconsistencies. We present AirGroundBench, a diagnostic benchmark for evaluating

Lineage graph

Paper → model → repo connections mined from source citations (Tier-1 exact match).

Covers

news · Embed the world: Multimodal AI for searchable aerial imagery at scale

Covers (incoming)

news · Anyone looking into the new MARS2 Workshop/Competition @ ECCV 2026? I saw Tec-do posting it. [D]

Related across the graph

news · Embed the world: Multimodal AI for searchable aerial imagery at scale

news · Anyone looking into the new MARS2 Workshop/Competition @ ECCV 2026? I saw Tec-do posting it. [D]

Topics

cs.CV