Paper · arXiv · Trust 82 · Primary · Published 4d ago · Live 3d ago

REAR: Test-time Preference Realignment through Reward Decomposition

Aligning large language models (LLMs) with diverse user preferences is a critical yet challenging task. While post-training methods can adapt models to specific needs, they often require costly data curation and additional training. Test-time scaling (TTS) presents an efficient, training-free alternative, but its application has been largely limited to verifiable domains like mathematics and coding, where response correctness is easily judged. To extend TTS to preference alignment, we introduce a novel framework that models the task as a realignment problem, since the base model often fails to
