paper · arXiv

Joint Learning of Experiential Rules and Policies for Large Language Model Agents

For LLM agents in multi-step interactive environments, a key challenge is to make effective use of accumulated interaction experience. Existing work has typically separated two uses of such experience: keeping it outside the model as natural-language rules for later prompting, or using trajectories and feedback to update the model parameters. The former is easy to interpret but can fall out of sync with the evolving policy; the latter improves the policy more broadly but provides only limited correction for local mistakes in sparse-reward settings. We present Joint Learning of Experiential Rul

glossary_termRLHF

modelAgentCore-8B

repoagent-tools

newsRL without TD learning

cs.AI