arXiv paper · Published 5d ago

W4A4 Quantization for Inference on Wan2.2-I2V-A14B

We summarize our submission to Sub-Challenge 1, W4A4 Quantization for Inference (HiF4 / MXFP4), of the ICME 2026 Low-Bit-width Large-Model Quantization Challenge. The sub-challenge targets 4-bit weight and 4-bit activation inference on Wan-AI/Wan2.2-I2V-A14B under the HiF4 or MXFP4 numerical formats. We adapt two complementary ideas from LLM quantization (MixQ-style mixed precision for sparse activation outliers and SmoothQuant-style per-channel smoothing) together with block-wise HiF4 packing for Wan2.2 feed-forward linear layers. Calibration on representative OpenS2V-5M batches identifies heavy-t
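To make the two building blocks named in the abstract concrete, here is a minimal sketch in plain Python. It is not the submission's implementation. It assumes the standard SmoothQuant scale formula with a hypothetical `alpha=0.5`, and it uses MXFP4 (32-element blocks, a shared power-of-two scale, E2M1 elements) because the HiF4 layout is not described in this summary. The function names are illustrative, ties are rounded toward the smaller magnitude for simplicity, and the MixQ-style outlier path is omitted.

```python
import math

# E2M1 (FP4) representable magnitudes used by MXFP4 elements.
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK = 32  # MXFP4: 32 elements share one power-of-two scale


def smooth_scales(act_absmax, w_absmax, alpha=0.5):
    """SmoothQuant-style per-input-channel scales.

    s_j = max|X_j|^alpha / max|W_j|^(1 - alpha). Activations are divided
    by s_j and the matching weight rows multiplied by s_j, so the product
    X @ W is unchanged while activation outliers shrink.
    alpha=0.5 is an assumed default, not a value from the source.
    """
    eps = 1e-8  # guard against all-zero channels
    return [max(a, eps) ** alpha / max(w, eps) ** (1.0 - alpha)
            for a, w in zip(act_absmax, w_absmax)]


def mxfp4_fake_quant(values):
    """Quantize then dequantize a flat list, block by block (illustrative)."""
    out = []
    for start in range(0, len(values), BLOCK):
        block = values[start:start + BLOCK]
        amax = max(abs(v) for v in block)
        if amax == 0.0:
            out.extend([0.0] * len(block))
            continue
        # Shared exponent: floor(log2(amax)) minus the E2M1 max exponent (2).
        scale = 2.0 ** (math.floor(math.log2(amax)) - 2)
        for v in block:
            # Clamp to the largest FP4 magnitude, then snap to the nearest grid point.
            mag = min(abs(v) / scale, FP4_GRID[-1])
            q = min(FP4_GRID, key=lambda g: abs(g - mag))
            out.append(math.copysign(q * scale, v))
    return out
```

For example, `smooth_scales([4.0, 1.0], [1.0, 4.0])` balances a channel with large activations against one with large weights, and `mxfp4_fake_quant` shows how a single large value in a block sets the scale that every other element in that block must share.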
