paper · arXiv · Trust 82 · Primary · Published yesterday · Live · 19h ago

When Token Compression Breaks: Structural Pruning vs. Token Reduction for Robust ViT Segmentation under High Compression

Vision Transformers (ViTs) are strong backbones for semantic segmentation, but their computational cost limits deployment. Recent token compression methods for efficient transformer-based segmentation reduce this cost by decreasing the number of tokens. However, existing evaluations primarily focus on low-to-moderate compression, leaving their behavior under aggressive compression and corrupted inputs unclear. Meanwhile, structural pruning provides an orthogonal route to efficiency by removing redundant components in the ViT architecture, but is rarely compared to token compression under a uni
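To make the two efficiency routes named in the abstract concrete, here is a rough back-of-the-envelope sketch that counts multiply-accumulates (MACs) for one standard transformer block. It is not the paper's method and not its numbers. The block shape (1024 tokens, width 768, 12 heads of size 64, MLP hidden size 3072), the 50% keep ratios, and the helper name `block_macs` are all assumptions chosen for illustration.

```python
def block_macs(num_tokens, dim, num_heads, head_dim, mlp_hidden):
    """Approximate MACs for one ViT block (layer norms, biases, softmax ignored)."""
    inner = num_heads * head_dim                 # total attention width
    qkv = 3 * num_tokens * dim * inner           # query/key/value projections
    attn = 2 * num_tokens * num_tokens * inner   # QK^T scores plus weighted sum of V
    proj = num_tokens * inner * dim              # attention output projection
    mlp = 2 * num_tokens * dim * mlp_hidden      # two linear layers of the MLP
    return qkv + attn + proj + mlp


# Hypothetical ViT-B-like block on a 32x32 patch grid (assumed, not from the paper).
baseline = block_macs(num_tokens=1024, dim=768, num_heads=12, head_dim=64, mlp_hidden=3072)

# Token reduction: keep half of the tokens, leave the architecture untouched.
token_reduced = block_macs(num_tokens=512, dim=768, num_heads=12, head_dim=64, mlp_hidden=3072)

# Structural pruning: keep all tokens, remove half of the heads and MLP units.
pruned = block_macs(num_tokens=1024, dim=768, num_heads=6, head_dim=64, mlp_hidden=1536)

for name, macs in [("baseline", baseline), ("token reduction", token_reduced), ("structural pruning", pruned)]:
    print(f"{name:>18}: {macs / 1e9:.2f} GMACs ({macs / baseline:.2%} of baseline)")
```

Under these assumptions, halving the token count cuts the attention-matrix term quadratically and every other term linearly, while halving heads and MLP width cuts every term linearly. The count says nothing about accuracy or robustness, and a segmentation model that drops tokens still has to produce a prediction for every pixel, which this sketch does not model.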

Lineage graph

Paper → model → repo connections mined from source citations (Tier-1 exact match).

Why these links exist

Linked via arXiv author: Tien-Phat Nguyen →
When Token Compression Breaks: Structural Pruning vs. Token Reduction for Robust ViT Segmentation under High Compression
Linked via arXiv author: Ngai-Man Cheung →
When Token Compression Breaks: Structural Pruning vs. Token Reduction for Robust ViT Segmentation under High Compression

Covers

news: VideoFlexTok: Flexible-Length Coarse-to-Fine Video Tokenization - Apple Machine Learning Research

authored (incoming)

person: Tien-Phat Nguyen · person: Ngai-Man Cheung

Related across the graph

person: Ngai-Man Cheung · person: Tien-Phat Nguyen · news: VideoFlexTok: Flexible-Length Coarse-to-Fine Video Tokenization - Apple Machine Learning Research

Topics

cs.CV