person profile

Greg Durrett

Greg Durrett is a researcher or builder tracked in the Angestrom contributor network.

3 Connections

1 Papers

0 Models

0 Repos

0 News

Papers · 1

Visually Grounded Self-Reflection for Vision-Language Models via Reinforcement Learning

Large vision-language models can reason over multimodal inputs by generating textual chains of thought (CoT). A key capability exhibited in CoT reasoning is self-reflection: revisiting earlier decisions and correcting previous errors. However, existing LVLMs often fail to properly attend to visual inputs during reflection, limiting their ability to translate feedback into grounded corrections, especially for out-of-distribution images. To address this issue, we propose a novel reinforcement learning training framework VRRL, with two components explicitly designed to elicit visually grounded se