Alignment Is All You Need For X-to-4D Generation
Generative diffusion models excel at synthesizing high-quality images, videos, and 3D content under multimodal control. However, arbitrary user-defined modality-to-4D (X-to-4D) generation remains challenging due to the high cost of constructing diverse datasets and the limited scalability of existing methods. This paper presents Align4D, a flexible framework that translates any-modal input into coherent video-3D pairs, using video to guide 4D motion and 3D data to shape 4D geometry. Align4D introduces three key techniques: (1) Object Distance Alignment, which searches Video-Aligned and Multivi
Lineage graph
Paper → model → repo connections mined from source citations (Tier-1 exact match).
Why these links exist
- Linked via arXiv author Qiaowei Miao → Alignment Is All You Need For X-to-4D Generation
- Linked via arXiv author Kehan Li → Alignment Is All You Need For X-to-4D Generation
- Linked via arXiv author Yawei Luo → Alignment Is All You Need For X-to-4D Generation
- Linked via arXiv author Yi Yang → Alignment Is All You Need For X-to-4D Generation
