News · Hacker News · Trust 72 · Community · Published yesterday · Live · yesterday

Senior SWE-Bench: open-source benchmark that assesses agents as senior engineers

110 points · 84 comments
