news · BAIR (Berkeley)

RL without TD learning

In this post, I’ll introduce a reinforcement learning (RL) algorithm based on an “alternative” paradigm: divide and conquer. Unlike traditional methods, this algorithm is not based on temporal difference (TD) learning, which has <a href="https://seohong.me/blog/q-learning-is-not-yet-scalable/">scalability challenges</a>, and it scales well to long-horizon tasks.


paper: Reinforcement Learning without Ground-Truth Solutions can Improve LLMs
paper: Joint Learning of Experiential Rules and Policies for Large Language Model Agents
paper: Learning to Fold: prizewinning solution at LeHome Challenge 2026 (1st place online, 2nd offline)

