Topic cluster · 4 items
vision
model
VioletVision-3B
An open vision-language model for captioning and VQA.
model
Diffuse-XL
A text-to-image diffusion model with photographic fidelity.
paper
Vision-language pretraining at scale
Joint training recipes that align images and text in one embedding space.
repo
vlm-starter
A starter kit for training vision-language models.