Paper · arXiv · Trust 82 · Primary · Published 4d ago · Live · 3d ago

Learning from Reliable Latent Prompts for Visual Recognition with Missing Modalities

Large-scale multimodal models (LMMs) have achieved superior performance in visual recognition by synergizing information across diverse, massive-scale paired modalities. In real-world scenarios, however, missing-modality inputs are ubiquitous, causing models optimized for modality-complete data to exhibit precipitous performance degradation. Existing research has introduced prompt learning to mitigate this issue, typically by generating dynamic prompts from instance-level features, regardless of whether the input modalities are complete or partially absent. However, such input-conditioned stra
