CausalMix: Data Mixture as Causal Inference for Language Model Training
In Large Language Model (LLM) training, data mixing plays a pivotal role in determining model performance. Recent methods optimize mixture weights via proxy models, but they rely on the assumption of static data distributions. As a result, when the underlying data pool shifts, these methods require costly retraining from scratch. This limitation restricts their ability to scale seamlessly from small settings to larger data pools and model sizes. In this paper, we propose CausalMix to address this limitation by casting data mixture optimization as a causal inference problem. We formulate the st
Authors: Zinan Tang, Yukun Zhang, Shaomian Zheng, Zhuoshi Pan, Qizhi Pei, Dingnan Jin, Jun Zhou, Yujun Wang, Biqing Huang
