paper · arXiv · Trust 82 · Primary · Published 4d ago · Live · 3d ago

Beyond 2D Matching: A Unified Single-Stage Framework for Geometry-Aware Cross-View Object Geo-Localization

Cross-view object geo-localization (CVOGL) aims to locate a target object from a query view (e.g., ground or drone) within a geo-tagged reference image (e.g., satellite). Existing approaches heavily rely on 2D appearance matching and are constrained by limited datasets lacking geometric metadata, diverse prompts, and standard field-of-view imagery. To address these intertwined challenges, we first introduce \dataset, a large-scale, high-fidelity building dataset comprising over 220,000 ground-satellite and drone-satellite pairs. It provides multi-modal prompts (points, boxes, masks) and camera

Covers

news: Showcase: geolocating a dashcam video without GPS, only from the footage [P]
news: Embed the world: Multimodal AI for searchable aerial imagery at scale

Topics

cs.AI