paper · arXiv · Trust 82 · Primary · Published 2d ago · Live · 21h ago

CausalMix: Data Mixture as Causal Inference for Language Model Training

In Large Language Model (LLM) training, data mixing plays a pivotal role in determining model performance. Recent methods optimize mixture weights via proxy models, but they rely on the assumption of static data distributions. As a result, when the underlying data pool shifts, these methods require costly retraining from scratch. This limitation restricts their ability to scale seamlessly from small settings to larger data pools and model sizes. In this paper, we propose CausalMix to address this limitation by casting data mixture optimization as a causal inference problem. We formulate the st

Lineage graph

Paper → model → repo connections mined from source citations (Tier-1 exact match).

Why these links exist

  • Linked via arxiv author Zinan Tang

  • Linked via arxiv author Yukun Zhang

  • Linked via arxiv author Shaomian Zheng

  • Linked via arxiv author Zhuoshi Pan

  • Linked via arxiv author Qizhi Pei

  • Linked via arxiv author Dingnan Jin

  • Linked via arxiv author Jun Zhou

  • Linked via arxiv author Yujun Wang

  • Linked via arxiv author Biqing Huang
