Paper · arXiv · Published yesterday

Generalization in offline RL: The structure is more important than the amount of pessimism

While pessimism counteracts overestimation bias in offline reinforcement learning (RL), being overly conservative has been associated with hindering certain forms of generalization. In this paper, however, we demonstrate that being overly pessimistic does not inherently prevent optimal generalization in contextual MDPs (CMDPs). Instead, we argue that successful generalization depends not on the amount of pessimism but on whether the pessimistic structure respects the underlying symmetries of the optimal solution. We prove that a mildly pessimistic, non-symmetric value function can generalize worse than …
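The central claim can be illustrated with a deliberately tiny, hypothetical toy of our own; it is not the construction used in the paper. Take a contextual bandit with two contexts and two actions, where the reward is 1 exactly when the action equals the context, so the map g: (c, a) -> (1 - c, 1 - a) is a symmetry of the optimal solution. Pessimism is modeled by an assumed count-based penalty beta / sqrt(n + 1) subtracted from the empirical mean reward. The symmetry-respecting variant is obtained by augmenting the offline data with its image under g. The names `pessimistic_q`, `augment` and `greedy_value` are illustrative only.

```python
import numpy as np

# Hypothetical toy contextual bandit (NOT the paper's construction):
# contexts c in {0, 1}, actions a in {0, 1}, reward 1 iff a == c.
# The map g: (c, a) -> (1 - c, 1 - a) leaves the reward unchanged.
N_CONTEXTS, N_ACTIONS = 2, 2


def reward(c, a):
    return 1.0 if a == c else 0.0


def pessimistic_q(data, beta):
    """Assumed count-based pessimism: empirical mean minus beta / sqrt(n + 1).

    Unvisited (c, a) pairs get mean 0 and the largest penalty, beta.
    """
    sums = np.zeros((N_CONTEXTS, N_ACTIONS))
    counts = np.zeros((N_CONTEXTS, N_ACTIONS))
    for c, a, r in data:
        sums[c, a] += r
        counts[c, a] += 1
    means = np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)
    return means - beta / np.sqrt(counts + 1)


def augment(data):
    """Add the image of every sample under g, so the estimate becomes g-invariant."""
    return data + [(1 - c, 1 - a, r) for c, a, r in data]


def greedy_value(q):
    """True reward of the greedy policy averaged over contexts, ties broken uniformly."""
    total = 0.0
    for c in range(N_CONTEXTS):
        best = np.flatnonzero(np.isclose(q[c], q[c].max()))
        total += np.mean([reward(c, a) for a in best])
    return total / N_CONTEXTS


# Offline data covers context 0 only; context 1 must be handled by generalization.
data = [(0, a, reward(0, a)) for a in range(N_ACTIONS) for _ in range(5)]

for beta in (0.1, 10.0):  # mild vs. heavy pessimism
    print(beta,
          greedy_value(pessimistic_q(data, beta)),            # non-symmetric
          greedy_value(pessimistic_q(augment(data), beta)))   # symmetric
```

With data from context 0 only, the non-symmetric estimate is tied in the unseen context 1, so its greedy policy averages 0.75 over both contexts for both beta = 0.1 and beta = 10. The augmented estimate reaches 1.0 at both levels. In this toy, generalization is decided by whether the penalty's structure matches the symmetry, not by how large beta is.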

Authors: Max Weltevrede, Matthijs T. J. Spaan, Wendelin Böhmer
