News · Reddit r/LocalLLaMA · Trust 58 · Community · Published yesterday

I benchmarked PrismML's 1-bit Bonsai-8B against IBM's Granite on CPU tool calling. The 1-bit model won, but only with grammar-constrained decoding

Everyone keeps asking whether the 1-bit models are actually usable for agents, so I ran the numbers myself. I couldn't find a single independent tool-calling eval of Bonsai-8B anywhere: nothing on the BFCL leaderboard, nothing on BenchLM. So as far as I can tell, this is the first one.

Setup: 30 deterministic tool-call cases (single, parallel, sequential, abstention, format), temp 0, mainline llama.cpp on CPU. Each model runs twice: once raw, once with a GBNF grammar.
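For anyone who wants to reproduce something similar, here is a rough sketch of how a deterministic tool-call case could be scored. This is not the harness used for the numbers in this post. The function names (`parse_calls`, `score_case`, `pass_rate`), the `{"name": ..., "arguments": ...}` output shape, and the rule that an abstention case passes whenever no valid call is parsed are all assumptions made for illustration.

```python
import json


def parse_calls(text):
    """Parse model output into a list of tool calls, or None if malformed.

    Assumed (hypothetical) format: one JSON object, or a JSON list of
    objects, each with a string "name" and a dict "arguments".
    """
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return None
    if isinstance(data, dict):
        data = [data]
    if not isinstance(data, list):
        return None
    for call in data:
        if not isinstance(call, dict):
            return None
        if not isinstance(call.get("name"), str):
            return None
        if not isinstance(call.get("arguments"), dict):
            return None
    return data


def canonical(call):
    """Serialize a call with sorted keys so argument order does not matter."""
    return json.dumps(
        {"name": call["name"], "arguments": call["arguments"]},
        sort_keys=True,
    )


def score_case(output, expected, ordered=True):
    """Return True if the output matches the expected calls exactly.

    expected == [] marks an abstention case: it passes when no valid
    tool call is parsed (assumption). Use ordered=False for parallel
    cases, where call order is irrelevant.
    """
    calls = parse_calls(output.strip())
    if not expected:
        return not calls  # None (prose answer) or [] both count as abstaining
    if calls is None:
        return False
    got = [canonical(c) for c in calls]
    want = [canonical(c) for c in expected]
    if not ordered:
        got, want = sorted(got), sorted(want)
    return got == want


def pass_rate(results):
    """Fraction of cases that passed."""
    return sum(results) / len(results)
```

Under this scheme you would run each case twice per model, once on the raw output and once on the grammar-constrained output, and compare the two pass rates. A strict parser like this one fails any raw output that wraps the JSON in prose, which is exactly the kind of failure a grammar rules out.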
