paper · arXiv · Trust 82 · Primary · Published 4d ago · Live · 3d ago
REAR: Test-time Preference Realignment through Reward Decomposition
Aligning large language models (LLMs) with diverse user preferences is a critical yet challenging task. While post-training methods can adapt models to specific needs, they often require costly data curation and additional training. Test-time scaling (TTS) presents an efficient, training-free alternative, but its application has been largely limited to verifiable domains like mathematics and coding, where response correctness is easily judged. To extend TTS to preference alignment, we introduce a novel framework that models the task as a realignment problem, since the base model often fails to
