Topic cluster · 4 items

vision

model

VioletVision-3B

An open vision-language model for captioning and VQA.

model

Diffuse-XL

A text-to-image diffusion model with photographic fidelity.

paper

Vision-language pretraining at scale

Joint training recipes that align images and text in one embedding space.

repo

vlm-starter

A starter kit for training vision-language models.
