Paper · arXiv · Published yesterday

Self-Gating Attention for Efficient Time Series Forecasting

Transformer architectures have shown strong potential in time series forecasting, where multi-head self-attention is widely used to capture temporal dependencies across historical timestamps. However, standard self-attention has quadratic time and memory complexity with respect to the look-back length. This cost may limit its use in resource-constrained or high-throughput forecasting systems, where fast and memory-efficient inference is important. Through qualitative and quantitative analyses, we observe that self-attention maps in time series forecasting often contain redundant patterns across …
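To make the quadratic cost concrete, here is a minimal sketch of standard scaled dot-product self-attention over a look-back window, written in plain Python so it needs no third-party packages. It is generic multi-head self-attention, not the paper's self-gating mechanism, whose details are not given on this page. The helper names (`self_attention`, `multi_head`, `random_matrix`) and the sizes used are hypothetical, chosen only for illustration; the output projection that normally merges the heads is omitted.

```python
import math
import random


def softmax(row):
    # Numerically stable softmax over one row of attention scores.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    # Plain-Python matrix product: a is (n x k), b is (k x m) -> (n x m).
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a):
    return [list(col) for col in zip(*a)]


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention (illustrative sketch).

    x is an L x d matrix with one embedding per historical timestamp.
    Returns (output, attention_map). The map is L x L, so both the score
    computation and its storage grow quadratically in the look-back length L.
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(w_q[0])
    scores = matmul(q, transpose(k))  # L x L pairwise scores
    attn = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(attn, v), attn


def multi_head(x, heads):
    # heads is a list of (w_q, w_k, w_v) tuples; yields one L x L map per head.
    # The usual concatenation + output projection is omitted in this sketch.
    return [self_attention(x, *w) for w in heads]


def random_matrix(rows, cols, rng):
    # Small random weights/inputs, for illustration only.
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    d, d_k, n_heads = 8, 4, 2
    heads = [tuple(random_matrix(d, d_k, rng) for _ in range(3)) for _ in range(n_heads)]
    for L in (24, 48, 96):
        x = random_matrix(L, d, rng)
        maps = [attn for _, attn in multi_head(x, heads)]
        entries = sum(len(m) * len(m[0]) for m in maps)
        print(f"L={L}: {entries} attention-map entries")
```

With two heads, this prints 1152, 4608 and 18432 attention-map entries for L = 24, 48 and 96: each doubling of the look-back length quadruples the memory held in attention maps, which is the cost the abstract refers to.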

Lineage graph

Paper → model → repo connections mined from source citations (Tier-1 exact match).

Why these links exist

Linked via arXiv authors Dezheng Wang, Tong Chen, Wei Yuan, Congyan Chen, Shihua Li and Hongzhi Yin → Self-Gating Attention for Efficient Time Series Forecasting

Implements

amazon-science/chronos-forecasting (repo), Nixtla/statsforecast (repo)

Authored by

Dezheng Wang, Tong Chen, Wei Yuan, Congyan Chen, Shihua Li, Hongzhi Yin


Topics

cs.LG