paper · arXiv

Automating Potential-based Reward Shaping with Vision Language Model Guidance

Sparse rewards are inherently challenging for reinforcement learning agents as they lack intermediate feedback to guide exploration and to correctly attribute the sparse success rewards to relevant parts of the trajectory. Naive reward shaping can induce reward hacking, yielding policies that exploit auxiliary signals instead of solving the intended task. Potential-based reward shaping (PBRS) guarantees preservation of the optimal policy set, but requires the definition of a heuristic potential function over the state space. In this work, we introduce the VLM-guided PBRS framework VLM-PBRS tha
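To make the PBRS term mentioned in the abstract concrete, the following is a minimal sketch of the standard shaping form r' = r + γΦ(s') − Φ(s). It is not the paper's implementation: the grid-world states, the `GOAL` cell, and the `manhattan_potential` function are hypothetical stand-ins for whatever potential VLM-PBRS derives, chosen only to illustrate how a heuristic potential over states turns a sparse reward into a dense one.

```python
from typing import Callable, Sequence, Tuple

State = Tuple[int, int]  # hypothetical grid-world state: (row, col)
GOAL: State = (3, 3)     # hypothetical goal cell, assumed for illustration


def manhattan_potential(state: State) -> float:
    """Hand-written heuristic potential: higher (less negative) nearer the goal."""
    return -float(abs(GOAL[0] - state[0]) + abs(GOAL[1] - state[1]))


def shaped_reward(
    reward: float,
    state: State,
    next_state: State,
    potential: Callable[[State], float],
    gamma: float = 0.99,
) -> float:
    """Environment reward plus the potential-based term gamma*Phi(s') - Phi(s)."""
    return reward + gamma * potential(next_state) - potential(state)


def shaping_sum(
    states: Sequence[State],
    potential: Callable[[State], float],
    gamma: float,
) -> float:
    """Sum of the shaping terms along a trajectory of visited states."""
    return sum(
        gamma * potential(s_next) - potential(s)
        for s, s_next in zip(states, states[1:])
    )


# One step toward the goal with zero environment reward still yields feedback.
step_bonus = shaped_reward(0.0, (0, 0), (0, 1), manhattan_potential, gamma=0.9)

# With gamma = 1 the shaping terms telescope to Phi(s_T) - Phi(s_0), so the
# total bonus depends only on the endpoints, not on the route taken.
path = [(0, 0), (0, 1), (1, 1), (2, 1)]
total_bonus = shaping_sum(path, manhattan_potential, gamma=1.0)
```

Because the undiscounted shaping terms telescope to a difference of endpoint potentials, the agent cannot accumulate bonus by looping through intermediate states, which is the intuition behind the policy-preservation guarantee the abstract refers to.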


repo · vlm-starter

news · Gradient-based Planning for World Models at Longer Horizons

glossary term · RLHF


cs.AI