paper · arXiv · Trust 82 · Primary · Published 4d ago · Live · 3d ago

Reweighting Framewise Attention in Video Transformers for Facial Expression Understanding

Understanding facial expressions in videos requires modeling subtle and localized facial dynamics under unconstrained conditions. Although recent Vision Transformer (ViT)-based video models have shown strong performance through large-scale self-supervised pretraining, their attention mechanisms often emphasize dominant global motions and coarse temporal dynamics, limiting sensitivity to fine-grained facial variations. To address this limitation, we propose MiRA (Marginal-induced Attention Redistribution), a plug-in frame-marginal attention redistribution framework for ViT backbones that enhanc

Lineage graph

Paper → model → repo connections mined from source citations (Tier-1 exact match).

Has model

model sentence-transformers/all-MiniLM-L6-v2 · model VioletVision-3B

Related across the graph

model sentence-transformers/all-MiniLM-L6-v2 · model VioletVision-3B

Topics

cs.CV