news · BAIR (Berkeley)

RL without TD learning

<p>In this post, I’ll introduce a reinforcement learning (RL) algorithm based on an “alternative” paradigm: <strong>divide and conquer</strong>. Unlike traditional methods, this algorithm is <em>not</em> based on temporal difference (TD) learning (which has <a href="https://seohong.me/blog/q-learning-is-not-yet-scalable/">scalability challenges</a>), and scales well to long-horizon tasks.</p> <p style="text-align: center;"> <img alt="" src="https://bair.berkeley.edu
