I benchmarked PrismML's 1-bit Bonsai-8B against IBM's Granite on CPU tool calling. The 1-bit model won, but only with grammar-constrained decoding.
Everyone keeps asking whether the 1-bit models are actually usable for agents, so I ran the numbers myself. I couldn't find a single independent tool-calling eval of Bonsai-8B anywhere: not on the BFCL leaderboard, nothing on BenchLM. As far as I can tell, this is the first one.
Why it matters
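This story comes from Reddit r/LocalLLaMA and sits in the Open Source branch of the AI ecosystem. The poster describes it as, to their knowledge, the first independent tool-calling eval of Bonsai-8B. The headline result is that the 1-bit model beat IBM's Granite on CPU tool calling, but only when decoding was grammar-constrained.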
Technical breakdown
Setup: 30 deterministic tool-call cases (single, parallel, sequential, abstention, format), temp 0, mainline llama.cpp on CPU.
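To make the setup concrete, here is a minimal sketch of how deterministic tool-call cases in those five categories could be scored. It is not the poster's harness. The `Case` fields, the JSON output shape (`name` plus `arguments`), and the conventions that an empty list means "abstain" and that sequential cases are checked as one ordered list are all assumptions made for illustration. The model call itself is left out, so the scorer works on raw output strings from any backend.

```python
import json
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Case:
    """One hypothetical test case; field names are illustrative only."""
    name: str
    category: str          # "single", "parallel", "sequential", "abstention", "format"
    expected: list         # expected calls; an empty list means "call nothing"
    ordered: bool = False  # True when call order matters (e.g. sequential cases)


def parse_calls(raw):
    """Parse raw model output into a list of calls, or None if malformed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if isinstance(data, dict):   # accept a bare object as a one-call list
        data = [data]
    if not isinstance(data, list):
        return None
    calls = []
    for item in data:
        if not isinstance(item, dict):
            return None
        if not isinstance(item.get("name"), str):
            return None
        if not isinstance(item.get("arguments"), dict):
            return None
        calls.append({"name": item["name"], "arguments": item["arguments"]})
    return calls


def canonical(call):
    """Serialize a call with sorted keys so key order does not affect equality."""
    return json.dumps(call, sort_keys=True)


def passes(case, raw):
    """Exact-match check: same calls, same arguments, order only if required."""
    calls = parse_calls(raw)
    if calls is None:
        return False
    got = [canonical(c) for c in calls]
    want = [canonical(c) for c in case.expected]
    if case.ordered:
        return got == want
    return sorted(got) == sorted(want)


def pass_rates(cases, outputs):
    """Map category -> (passed, total) for outputs keyed by case name."""
    tally = defaultdict(lambda: [0, 0])
    for case in cases:
        tally[case.category][1] += 1
        if passes(case, outputs.get(case.name, "")):
            tally[case.category][0] += 1
    return {cat: tuple(counts) for cat, counts in tally.items()}


if __name__ == "__main__":
    demo_cases = [
        Case("weather", "single",
             [{"name": "get_weather", "arguments": {"city": "Paris"}}]),
        Case("smalltalk", "abstention", []),
    ]
    demo_outputs = {
        "weather": '{"name": "get_weather", "arguments": {"city": "Paris"}}',
        "smalltalk": "Happy to chat!",  # free text instead of "[]" counts as a failure here
    }
    print(pass_rates(demo_cases, demo_outputs))
```

Exact matching on canonicalized JSON keeps the scoring deterministic, which pairs naturally with temp 0. It also shows why unconstrained output can fail a case on format alone: any text that does not parse as the expected JSON shape counts as a miss, even if the intended call is recognizable to a human reader.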
