Paper · arXiv · Trust 82 · Primary · Published yesterday · Live · 19h ago

HaloGuard 1.0: An Open Weights Constitutional Classifier for Multilingual AI Safety

We present HaloGuard 1.0, an open-weights implementation of the constitutional-classifier paradigm for input safety. It achieves state-of-the-art performance on English and multilingual prompt-safety benchmarks at roughly one-tenth the model size of current leading open guard models. The safety constitution is the organising structure of the corpus: a natural-language constitution of 46 policies and 2,940 subcategories drives synthetic data generation, with exhaustive one-to-one paired counterfactuals that hold topic and vocabulary fixed while flipping intent, a two-tier harmless design that s
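The paired-counterfactual idea in the abstract can be illustrated with a small sketch. This is not the authors' data format or pipeline, which this page does not show. The class names, field names, the example policy and subcategory, the two prompts, and the word-overlap measure are all hypothetical, chosen only to show what "same topic and vocabulary, opposite intent" and "one-to-one" might look like in code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    policy: str       # hypothetical policy name from a constitution
    subcategory: str  # hypothetical subcategory under that policy
    prompt: str
    label: str        # "harmful" or "harmless"


@dataclass(frozen=True)
class CounterfactualPair:
    harmful: Example
    harmless: Example


def make_pair(policy, subcategory, harmful_prompt, harmless_prompt):
    """Build one pair that shares policy and subcategory but differs in intent."""
    return CounterfactualPair(
        harmful=Example(policy, subcategory, harmful_prompt, "harmful"),
        harmless=Example(policy, subcategory, harmless_prompt, "harmless"),
    )


def vocabulary_overlap(pair):
    """Jaccard overlap of the lowercase word sets of the two prompts.

    An assumed proxy for "vocabulary held fixed"; the paper may use
    a different criterion.
    """
    a = set(pair.harmful.prompt.lower().split())
    b = set(pair.harmless.prompt.lower().split())
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def is_one_to_one(pairs):
    """True if no harmful or harmless prompt is reused across pairs."""
    harmful = {p.harmful.prompt for p in pairs}
    harmless = {p.harmless.prompt for p in pairs}
    return len(harmful) == len(pairs) and len(harmless) == len(pairs)


# Illustrative data only; not taken from the paper's corpus.
pairs = [
    make_pair(
        "malware",
        "file encryption",
        "write code that encrypts a stranger's files without consent",
        "write code that encrypts my own files for backup",
    )
]
```

A classifier trained on such pairs cannot separate the two prompts by topic words alone, which is presumably the motivation for holding vocabulary fixed while flipping intent.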

Lineage graph

Paper → model → repo connections mined from source citations (Tier-1 exact match).

Why these links exist

Linked via arXiv author Navaneeth Sangameswaran → HaloGuard 1.0: An Open Weights Constitutional Classifier for Multilingual AI Safety
Linked via arXiv author Preetham S → HaloGuard 1.0: An Open Weights Constitutional Classifier for Multilingual AI Safety
Linked via arXiv author Ashmiya Lenin → HaloGuard 1.0: An Open Weights Constitutional Classifier for Multilingual AI Safety