Paper · arXiv · Trust 82 · Primary · Published 3d ago · Live · 2d ago

TRIAGE: Role-Typed Credit Assignment for Agentic Reinforcement Learning

Agentic reinforcement learning requires assigning credit to environment-facing actions such as searches, clicks, edits, navigation commands, and object interactions. Standard GRPO uses the final verifier outcome as a uniform advantage over all action tokens. This outcome signal is useful but structurally incomplete: it punishes useful exploration in failed rollouts and reinforces redundant or regressive actions in successful rollouts. We propose TRIAGE, a role-typed credit assignment framework that adds a semantic role axis to outcome credit. A structured judge classifies each segment as decis
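To make the baseline concrete, here is a minimal sketch of the uniform outcome advantage that the abstract attributes to standard GRPO: each rollout's verifier reward is normalized against its group, and the resulting scalar is copied onto every action token of that rollout. The function name, the `eps` term, and the use of the population standard deviation are illustrative assumptions, not details taken from the paper, and the sketch does not attempt to reproduce TRIAGE's role-typed adjustment.

```python
from statistics import mean, pstdev


def grpo_uniform_advantages(rewards, token_counts, eps=1e-6):
    """Hypothetical helper: group-normalize outcome rewards, then broadcast.

    rewards      -- one final verifier reward per rollout in the group
    token_counts -- number of action tokens in each rollout
    eps          -- small constant to avoid division by zero (assumption)
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use sample std
    per_rollout = [(r - mu) / (sigma + eps) for r in rewards]
    # Every action token in a rollout receives the same scalar advantage.
    return [[adv] * n for adv, n in zip(per_rollout, token_counts)]


# Toy group of four rollouts: two verified successes, two failures.
rewards = [1.0, 0.0, 0.0, 1.0]
token_counts = [3, 5, 2, 4]
advantages = grpo_uniform_advantages(rewards, token_counts)

for reward, adv in zip(rewards, advantages):
    print(reward, adv)
```

In this toy group, every token of a failed rollout gets the same negative value, including any tokens that belonged to a useful exploratory action, and every token of a successful rollout gets the same positive value, including redundant ones. That is the structural gap the abstract describes.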

Lineage graph

Paper → model → repo connections mined from source citations (Tier-1 exact match).

Covers

news · Production-grade AI agents for financial compliance: Lessons from Stripe

Has model

model · AgentCore-8B

Implements

repo · agent-tools

Related across the graph

model · AgentCore-8B
news · Production-grade AI agents for financial compliance: Lessons from Stripe
repo · agent-tools

Topics

cs.AI