paper · arXiv · Trust 82 · Primary · Published 4d ago · Live · 3d ago

Latent Actions from Factorized Transition Effects under Agent Ambiguity

Latent Action Models (LAMs) learn action-like proxies from observation transitions. However, in multi-object or distractor-rich scenes, these visual effects mix agent motion with distractors, camera dynamics, and background changes, making the underlying action source ambiguous without supervision. Structuring this mixture as reusable transition effects provides an intermediate representation from which action-like latents can be more robustly formed. We introduce Observed Transition Factorization (OTF), which decomposes each transition into a sparse set of observed transition primitives. Usin

Topics