paper · arXiv · Trust 82 · Primary · Published 3d ago · Live · 2d ago

TRIAGE: Role-Typed Credit Assignment for Agentic Reinforcement Learning

Agentic reinforcement learning requires assigning credit to environment-facing actions such as searches, clicks, edits, navigation commands, and object interactions. Standard GRPO uses the final verifier outcome as a uniform advantage over all action tokens. This outcome signal is useful but structurally incomplete: it punishes useful exploration in failed rollouts and reinforces redundant or regressive actions in successful rollouts. We propose TRIAGE, a role-typed credit assignment framework that adds a semantic role axis to outcome credit. A structured judge classifies each segment as decis
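
To make the baseline described above concrete, here is a minimal sketch of uniform outcome credit in the GRPO style: each rollout's verifier reward is normalized against its group, and that single scalar is copied onto every action token. The function name, the use of population standard deviation, and the `eps` term are assumptions made for illustration. This is not TRIAGE's role-typed scheme, only the uniform signal it is said to extend.

```python
from statistics import mean, pstdev


def grpo_uniform_advantages(rewards, action_token_counts, eps=1e-6):
    """Broadcast one group-normalized advantage over each rollout's action tokens.

    rewards: final verifier outcome per rollout in the group (e.g. 0.0 or 1.0).
    action_token_counts: number of action tokens in each rollout.
    Hypothetical helper; population std and eps are illustrative assumptions.
    """
    if len(rewards) != len(action_token_counts):
        raise ValueError("rewards and action_token_counts must align")

    mu = mean(rewards)
    sigma = pstdev(rewards)

    per_token = []
    for reward, n_tokens in zip(rewards, action_token_counts):
        # One scalar per rollout: every action token gets the same credit,
        # regardless of what the action actually contributed.
        advantage = (reward - mu) / (sigma + eps)
        per_token.append([advantage] * n_tokens)
    return per_token


# Group of four rollouts: two succeed, two fail.
advantages = grpo_uniform_advantages(
    rewards=[1.0, 0.0, 0.0, 1.0],
    action_token_counts=[3, 2, 4, 1],
)
```

In this sketch every token of a failed rollout receives the same negative advantage, including any useful exploration, and every token of a successful rollout receives the same positive advantage, including redundant steps. That is the structural gap the abstract points to.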
