Paper · arXiv · Published 2d ago

GeoSearcher: Anchor-Guided Progressive Reasoning for Remote Sensing Visual Grounding with Process Supervision

Recent multimodal large language models (MLLMs) have shown strong cross-modal understanding and coordinate generation abilities in visual grounding. However, transferring these abilities to remote sensing visual grounding (RSVG) remains challenging. High-resolution remote sensing images usually cover large-scale scenes, where targets are often extremely small and surrounded by numerous visually similar distractors. Meanwhile, queries often contain multiple clues, such as reference objects, spatial relations, and target attributes. Existing MLLM-based methods usually formulate RSVG as one-step

Covered in: Embed the world: Multimodal AI for searchable aerial imagery at scale

Authors: Dianyu Wang, Yidan Zhang, Peirong Zhang, Xuyang Li, Xiaoxuan Liu, Lei Wang

Topics

cs.CV