HaloGuard 1.0: An Open Weights Constitutional Classifier for Multilingual AI Safety
We present HaloGuard 1.0, an open-weights implementation of the constitutional-classifier paradigm for input safety. It achieves state-of-the-art performance on English and multilingual prompt-safety benchmarks at roughly one-tenth the model size of current leading open guard models. The safety constitution is the organising structure of the corpus: a natural-language constitution of 46 policies and 2,940 subcategories drives synthetic data generation, with exhaustive one-to-one paired counterfactuals that hold topic and vocabulary fixed while flipping intent, a two-tier harmless design that s
Lineage graph
Paper → model → repo connections mined from source citations (Tier-1 exact match).
Why these links exist
- Linked via arXiv author Navaneeth Sangameswaran → HaloGuard 1.0: An Open Weights Constitutional Classifier for Multilingual AI Safety
- Linked via arXiv author Preetham S → HaloGuard 1.0: An Open Weights Constitutional Classifier for Multilingual AI Safety
- Linked via arXiv author Ashmiya Lenin → HaloGuard 1.0: An Open Weights Constitutional Classifier for Multilingual AI Safety
