Read original ↗
paperarXivTrust 82 · PrimaryPublished yesterdayLive · 5h ago

Object-centric LeJEPA

Image encoders trained with LeJEPA can deliver strong features for downstream tasks, but, like other image-level self-supervised methods, typically require large training datasets. Aligning representations at the level of objects rather than whole scenes promises greater data efficiency, but doing this in a completely self-supervised way, effectively jointly partitioning a scene and representing its objects, is unstable: the two are locked in a cyclic dependency, partitioning requires meaningful representations, while meaningful representations require consistent partitioning. We sidestep this

Lineage graph

Paper → model → repo connections mined from source citations (Tier-1 exact match).

Why these links exist

Implements

authored (incoming)

Implements (incoming)

Related across the graph

Topics