Topic cluster · 1 item

scaling

paper

Scaling laws for mixture-of-experts models

How sparse expert routing changes the compute-optimal frontier for large models.

Related topics