paper · arXiv · Trust 82 · Primary · Published yesterday · Live · 19h ago

HaloGuard 1.0: An Open Weights Constitutional Classifier for Multilingual AI Safety

We present HaloGuard 1.0, an open-weights implementation of the constitutional-classifier paradigm for input safety. It achieves state-of-the-art performance on English and multilingual prompt-safety benchmarks at roughly one-tenth the model size of current leading open guard models. The safety constitution is the organising structure of the corpus: a natural-language constitution of 46 policies and 2,940 subcategories drives synthetic data generation, with exhaustive one-to-one paired counterfactuals that hold topic and vocabulary fixed while flipping intent, a two-tier harmless design that s

Lineage graph

Paper → model → repo connections mined from source citations (Tier-1 exact match).

Why these links exist

  • Linked via arXiv author: Navaneeth Sangameswaran

  • Linked via arXiv author: Preetham S

  • Linked via arXiv author: Ashmiya Lenin
