paper · arXiv
Vision-language pretraining at scale
Joint training recipes that align images and text in one embedding space.
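This entry does not spell out the training objective. One widely used way to align images and text in a single embedding space is a CLIP-style symmetric contrastive (InfoNCE) loss. The sketch below assumes that objective. The function names and the default temperature of 0.07 are illustrative choices, not taken from the paper, VioletVision-3B, or vlm-starter. Row i of `img` and row i of `txt` are treated as a matching pair, and every other row in the batch serves as a negative.

```python
import math

# Assumed objective: CLIP-style symmetric InfoNCE. Not confirmed as this paper's recipe.

def l2_normalize(v):
    # Project a vector onto the unit sphere so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def similarity_logits(img, txt, temperature):
    # logits[i][j] = cos(image_i, text_j) / temperature
    img = [l2_normalize(v) for v in img]
    txt = [l2_normalize(v) for v in txt]
    return [[sum(a * b for a, b in zip(i, t)) / temperature for t in txt] for i in img]

def diagonal_cross_entropy(logits):
    # Mean over rows of -log softmax(row)[i]; the correct match sits on the diagonal.
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)

def transpose(matrix):
    return [list(col) for col in zip(*matrix)]

def contrastive_loss(img, txt, temperature=0.07):
    # Average the image-to-text and text-to-image directions.
    logits = similarity_logits(img, txt, temperature)
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(transpose(logits)))

# Toy 2-D embeddings (hypothetical): correctly paired captions yield a lower loss.
img = [[1.0, 0.0], [0.0, 1.0]]
txt = [[0.9, 0.1], [0.1, 0.9]]
aligned = contrastive_loss(img, txt)
shuffled = contrastive_loss(img, txt[::-1])
```

Minimizing this loss pulls each image toward its own caption and pushes it away from the other captions in the batch. Larger batches therefore supply more negatives per step, which is one reason such recipes are usually run at scale.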
Has model: VioletVision-3B
Implemented by repo: vlm-starter
Topics: vision, multimodal