Reddit r/LocalLLaMA · Published 2d ago

Open benchmark: how well can multimodal LLMs read a calendar week-view from a screenshot? Humans ~99%, Q4 local models.....

Some backstory: I've been working on my local agent (openclaw), and I wanted to give it the skill to reconstruct calendar entries from a photo of the screen. I couldn't get at the calendar through an API (long story), so a photo was the only low-friction way to export the data. What should have been an easy "skill building exercise" ended up as a frustrating problem hunt. My agent went wrong more often than I expected: times