paper · arXiv · Trust 82 · Primary · Published 3d ago · Live · 2d ago

ERA: Entropy-Guided Visual Token Pruning with Rectified Attention for Efficient MLLMs

Multimodal Large Language Models (MLLMs) incur prohibitive inference costs due to long visual token sequences. Training-free visual token reduction provides an efficient solution. However, existing methods distort attention distributions, giving rise to a phenomenon we term Attention Logit Collapse. To address this issue, we propose ERA, an Entropy-guided visual token pruning framework with Rectified Attention for efficient MLLMs. Specifically, ERA comprises three crucial components: Dual-view Entropy Pruning (DEP), Bias-aware Token Recycling (BTR), and Logit-preserving Attention Rectification

Lineage graph

Paper → model → repo connections mined from source citations (Tier-1 exact match).

Covers

news · Breakthrough in long-context efficiency announced

Has model

model · VioletVision-3B

Related across the graph

model · VioletVision-3B

news · Breakthrough in long-context efficiency announced

Topics

cs.CV