paper · arXiv · Trust 82 · Primary · Published yesterday · Live · 19h ago

Generalization in offline RL: The structure is more important than the amount of pessimism

While pessimism counteracts overestimation bias in offline reinforcement learning (RL), being overly conservative has been associated with hindering certain forms of generalization. However, in this paper we demonstrate that being overly pessimistic does not inherently prevent optimal generalization in contextual MDPs (CMDPs). Instead, we argue that successful generalization depends not on the amount of pessimism, but on whether the pessimistic structure respects the underlying symmetries of the optimal solution. We prove that a mildly pessimistic, non-symmetric value function can generalize worse tha

Lineage graph

Paper → model → repo connections mined from source citations (Tier-1 exact match).

Why these links exist

Linked via arxiv author: Max Weltevrede →
Generalization in offline RL: The structure is more important than the amount of pessimism
Linked via arxiv author: Matthijs T. J. Spaan →
Generalization in offline RL: The structure is more important than the amount of pessimism
Linked via arxiv author: Wendelin Böhmer →
Generalization in offline RL: The structure is more important than the amount of pessimism

Implements

repo: rllm-org/rllm

Related to

glossary_term: RLHF

Covers

news: RL without TD learning

authored (incoming)

person: Max Weltevrede · person: Matthijs T. J. Spaan · person: Wendelin Böhmer

Related across the graph

news: RL without TD learning · person: Max Weltevrede · glossary_term: RLHF · repo: rllm-org/rllm · person: Wendelin Böhmer · person: Matthijs T. J. Spaan

Topics

cs.AI