The research frontier

Papers

Every paper, read once and connected for good — summarized, then linked to the models it became, the code that runs it, and the ideas it builds on.

Featured · cs.CL · yesterday

DanceOPD: On-Policy Generative Field Distillation

Modern image generation demands a single model that unifies diverse capabilities, including text-to-image (T2I), local editing, and global editing. However, these capabilities are rarely naturally aligned and often conflict. For instance, editing tends to degrade T2I performance, while global and local editing interfere with each other. Consequently, effectively composing these capabilities has become a central challenge for image generation model training. To tackle this, we introduce DanceOPD, an on-policy generative field distillation framework for flow-matching models that routes each samp…


Ask, Solve, Generate: Self-Evolving Unified Multimodal Understanding and Generation via Self-Consistency Rewards

Most unified large multimodal models (LMMs) that support both visual understanding and image generation still rely on curated post-training supervision, such as human annotations, preference labels, or external reward models. We ask whether a unified LMM can improve both abilities autonomously using only unlabeled images. We propose a self-evolving training framework with three internal roles: a Proposer that generates visual questions, a Solver that answers and evaluates them, and a Generator that synthesizes images. Training uses only self-derived consistency signals, without human annotatio…

World Action Models Enable Continual Imitation Learning with Recurrent Generative Replays

Paying More Attention to Visual Tokens in Self-Evolving Large Multimodal Models

DnA: Denoising Attention for Visual Tasks

Don't Settle at the Mode! Mitigating Diversity Collapse in Pretrained Flow Models via Feature Self-Guidance

Reinforcement Learning without Ground-Truth Solutions can Improve LLMs

PhysiFormer: Learning to Simulate Mechanics in World Space

Autoregressive Boltzmann Generators

When are likely answers right? On Sequence Probability and Correctness in LLMs

Error-Conditioned Neural Solvers

Mapping Political-Elite Networks in Europe with a Multilingual Joint Entity-Relation Extraction Pipeline

RayPE: Ray-Space Positional Encoding for 3D-Aware Video Generation

Understanding Domain-Aware Distribution Alignment in Budgeted Entity Matching

SAM2Matting: Generalized Image and Video Matting

Language-Based Digital Twins for Elderly Cognitive Assistance

RoPEMover: Depth-Aware Object Relocation via Positional Embeddings

Empowering GUI Agents via Autonomous Experience Exploration and Hindsight Experience Utilization for Task Planning

Hallucination in World Models is Predictable and Preventable

Not All Actions Are Equal: Rethinking Conditioning for Dexterous World Model

Beyond the Hard Budget: Sparsity Regularizers for More Interpretable Top-k Sparse Autoencoders

OctoSense: Self-Supervised Learning for Multimodal Robot Perception

LLM-Based Examination of Eligibility Criteria from Securities Prospectuses at the German Central Bank

Blackwell Approachability and Gradient Equilibrium are Equivalent

Beyond Surface Forms: A Comprehensive, Mechanism-Oriented Taxonomy of Indirect Linguistic Encoding for LLM-Based Coded Language Detection

ViQ: Text-Aligned Visual Quantized Representations at Any Resolution

See & Sniff: Learning Visuo-Olfactory Representations

Multilingual Reasoning Cascades Need More Context

Sculpting NeRF Geometry: Human-Preference Fine-Tuning of a 3D-Aware Face GAN

A Multi-Fidelity Convolutional Autoencoder-Transfer Learning Framework for Guided-Wave-Based Damage Diagnosis Using Large Simulated and Limited Experimental Datasets

AI Healthcare Chatbots as Information Infrastructure: A Large-Scale Study of User-Reported Breakdowns

Fast algorithms for learning a Gaussian under halfspace truncation with optimal sample complexity

Generative Models on Analog Hardware with Dynamics

Designing Reward Signals for Portable Query Generation: A Case Study in Industrial Semantic Job Search

When Does Combining Language Models Help? A Co-Failure Ceiling on Routing, Voting, and Mixture-of-Agents Across 67 Frontier Models

Prompt Injection in Automated Résumé Screening with Large Language Models: Single and Multi-Injection Settings

Simulation-based inference for rapid Bayesian parameter estimation in epidemiological models: a comparison with MCMC

Recovering Governing Equations from Solution Data: Identifiability Bounds for Linear and Nonlinear ODEs

How Good Can Linear Models Be for Time-Series Forecasting?

Exact and Deterministic Patch Descriptor Retrieval via Hierarchical Normalization