Self-Gating Attention for Efficient Time Series Forecasting
Transformer architectures have shown strong potential in time series forecasting, where multi-head self-attention is widely used to capture temporal dependencies across historical timestamps. However, standard self-attention has quadratic time and memory complexity with respect to the look-back length. This cost may limit its use in resource-constrained or high-throughput forecasting systems, where fast and memory-efficient inference is important. Through qualitative and quantitative analyses, we observe that self-attention maps in time series forecasting often contain redundant patterns acros
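The quadratic cost mentioned in the abstract can be illustrated with a minimal sketch of standard single-head scaled dot-product self-attention. This is not the paper's self-gating method, which the text above does not describe. The function names `self_attention` and `attention_map_entries` are hypothetical, and the query, key, and value projections are assumed to be the identity to keep the example short.

```python
import math


def self_attention(x):
    """Standard single-head scaled dot-product self-attention.

    Assumption for illustration: identity Q/K/V projections.
    x is a list of L vectors, each a list of d floats.
    Returns (output, attn_map), where attn_map is L x L.
    """
    L, d = len(x), len(x[0])
    scale = 1.0 / math.sqrt(d)
    attn = []
    for i in range(L):
        # One score per (query i, key j) pair: L scores per row, L rows.
        scores = [scale * sum(a * b for a, b in zip(x[i], x[j])) for j in range(L)]
        m = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        attn.append([e / z for e in exps])
    # Each output row is an attention-weighted sum of all L input vectors.
    out = [
        [sum(attn[i][j] * x[j][k] for j in range(L)) for k in range(d)]
        for i in range(L)
    ]
    return out, attn


def attention_map_entries(look_back):
    """Number of entries in the attention map for a given look-back length."""
    return look_back * look_back
```

Under these assumptions, doubling the look-back length quadruples the number of attention-map entries that must be computed and stored, which is the scaling behaviour the abstract refers to.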
Lineage graph
Paper → model → repo connections mined from source citations (Tier-1 exact match).
Why these links exist
- Linked via arXiv author Dezheng Wang → Self-Gating Attention for Efficient Time Series Forecasting
- Linked via arXiv author Tong Chen → Self-Gating Attention for Efficient Time Series Forecasting
- Linked via arXiv author Wei Yuan → Self-Gating Attention for Efficient Time Series Forecasting
- Linked via arXiv author Congyan Chen → Self-Gating Attention for Efficient Time Series Forecasting
- Linked via arXiv author Shihua Li → Self-Gating Attention for Efficient Time Series Forecasting
- Linked via arXiv author Hongzhi Yin → Self-Gating Attention for Efficient Time Series Forecasting
