Paper · arXiv · Published 3d ago
DPPE: Rethinking Camera-Based Positional Encoding for Scaling Multi-View Transformers
The remarkable scalability of Transformers has extended their use to 3D computer vision, where camera-aware positional encoding is crucial for supplying spatial cues about multi-view geometry. Recent work has established the practice of injecting camera parameters, such as extrinsics or projection matrices, into the query, key, and value vectors of the attention mechanism as a relative positional encoding. However, when scaling up the training recipe of novel view synthesis (NVS) models that use camera-based positional encoding, we observe a significant issue: model performance sta…
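The baseline practice the abstract describes, where camera matrices act as a relative positional encoding on queries, keys, and values, can be sketched as follows. This is a minimal illustration under stated assumptions, not the DPPE method itself, which the truncated abstract does not describe. It assumes 4x4 world-to-camera extrinsics per token, a single attention head whose dimension is divisible by 4, and that each 4x4 matrix is applied block-diagonally to 4-dim chunks of the feature vector. All function names (`se3_inverse`, `apply_blockwise`, `camera_relative_attention`, `random_rigid`) are hypothetical. Queries are multiplied by T_i^T, keys and values by T_j^{-1}, and the output by T_i. As a result, each attention logit becomes q_i^T (T_i T_j^{-1}) k_j and each value contribution becomes T_i T_j^{-1} v_j, so both depend only on relative camera poses.

```python
import numpy as np

# Minimal sketch (hypothetical helper names): camera extrinsics used as a
# relative positional encoding on queries, keys, and values. This shows the
# baseline idea the abstract describes, NOT the DPPE method itself.


def se3_inverse(T: np.ndarray) -> np.ndarray:
    """Invert a batch of 4x4 rigid transforms [R | t; 0 0 0 1]."""
    R, t = T[..., :3, :3], T[..., :3, 3]
    Rt = np.swapaxes(R, -1, -2)
    inv = np.zeros_like(T)
    inv[..., :3, :3] = Rt
    inv[..., :3, 3] = -np.einsum("...ij,...j->...i", Rt, t)
    inv[..., 3, 3] = 1.0
    return inv


def apply_blockwise(M: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Apply token i's 4x4 matrix M[i] to every 4-dim chunk of x[i].

    M: (n, 4, 4); x: (n, d) with d divisible by 4 (assumed).
    """
    n, d = x.shape
    chunks = x.reshape(n, d // 4, 4)
    return np.einsum("nij,ncj->nci", M, chunks).reshape(n, d)


def camera_relative_attention(q, k, v, T):
    """Single-head attention whose logits and value transforms depend only
    on the relative transforms T_i @ inv(T_j).

    q, k, v: (n, d) token features; T: (n, 4, 4) world-to-camera extrinsics
    of the view each token comes from.
    """
    T_inv = se3_inverse(T)
    q_t = apply_blockwise(np.swapaxes(T, -1, -2), q)  # T_i^T q_i
    k_t = apply_blockwise(T_inv, k)                    # T_j^{-1} k_j
    v_t = apply_blockwise(T_inv, v)                    # T_j^{-1} v_j
    # q_t[i] . k_t[j] = sum over chunks of q_i^T (T_i T_j^{-1}) k_j
    logits = q_t @ k_t.T / np.sqrt(q.shape[-1])
    logits -= logits.max(axis=-1, keepdims=True)       # numerically stable softmax
    weights = np.exp(logits)
    weights /= weights.sum(axis=-1, keepdims=True)
    # Map aggregated values back into token i's frame: sum_j w_ij T_i T_j^{-1} v_j
    return apply_blockwise(T, weights @ v_t)


def random_rigid(rng: np.random.Generator) -> np.ndarray:
    """Random 4x4 rigid transform (rotation from QR, Gaussian translation)."""
    Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(Q) < 0:
        Q[:, 0] *= -1.0                                # ensure a proper rotation
    T = np.eye(4)
    T[:3, :3] = Q
    T[:3, 3] = rng.normal(size=3)
    return T


rng = np.random.default_rng(0)
n, d = 6, 8                                            # 6 tokens, head dim 8
q, k, v = (rng.normal(size=(n, d)) for _ in range(3))
T = np.stack([random_rigid(rng) for _ in range(n)])
G = random_rigid(rng)                                  # arbitrary change of world frame

out = camera_relative_attention(q, k, v, T)
out_moved = camera_relative_attention(q, k, v, T @ G)  # extrinsics in the new world frame
print(np.allclose(out, out_moved))  # True: only relative poses matter
```

A global change of world frame maps every extrinsic T_i to T_i G, and this cancels in T_i G (T_j G)^{-1} = T_i T_j^{-1}. The final check therefore holds up to floating-point error. To use projection matrices instead of rigid extrinsics, as the abstract also mentions, `se3_inverse` would need to be replaced by a general matrix inverse, assuming the projection is lifted to an invertible 4x4 matrix.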
