Paper · arXiv · Trust 82 · Primary · Published 2d ago · Live · 21h ago

CausalMix: Data Mixture as Causal Inference for Language Model Training

In Large Language Model (LLM) training, data mixing plays a pivotal role in determining model performance. Recent methods optimize mixture weights via proxy models, but they rely on the assumption of static data distributions. As a result, when the underlying data pool shifts, these methods require costly retraining from scratch. This limitation restricts their ability to scale seamlessly from small settings to larger data pools and model sizes. In this paper, we propose CausalMix to address this limitation by casting data mixture optimization as a causal inference problem. We formulate the st

Lineage graph

Paper → model → repo connections mined from source citations (Tier-1 exact match).

Why these links exist

Linked via arXiv author → CausalMix: Data Mixture as Causal Inference for Language Model Training:
Zinan Tang, Yukun Zhang, Shaomian Zheng, Zhuoshi Pan, Qizhi Pei, Dingnan Jin, Jun Zhou, Yujun Wang, Biqing Huang

authored (incoming)

Person: Zinan Tang, Yukun Zhang, Shaomian Zheng, Zhuoshi Pan, Qizhi Pei, Dingnan Jin, Jun Zhou, Yujun Wang, Biqing Huang

Covers (incoming)

News: LLM Data Mixture Breaks When Training Pools Shift: Causal Inference Offers Fix - Tech Times

Topics

cs.CL