repoGitHubTrust 82 · PrimaryPublished yesterdayLive · 21h ago
kekzl/imp
From-scratch C++/CUDA inference engine for the NVIDIA RTX 5090 (sm_120a) — the best single-GPU backend for agentic AI: tool calling, long-context loops, reasoning and concurrent sub-agents on top of the fastest single-stream decode on the 5090 (beats llama.cpp, at-or-ahead of vLLM on NVFP4). 100% written by Claude Code.
Lineage graph
Paper → model → repo connections mined from source citations (Tier-1 exact match).
Covers
newsNVIDIA Vera CPU Opens the Way for Agentic Scientific AI at Los Alamos National LaboratorynewsNvidia’s AI Hardware Comes to Windows in RTX Spark PCsnewsClaude Meets Blackwell Ultra: Anthropic’s Models Now Run on NVIDIA GB300 in AzurenewsNVIDIA BioNeMo Agent Toolkit Brings Accelerated AI to Life Sciences Researchers in Claude SciencenewsBuild real agentic apps using CUGA: two dozen working examples on a lightweight harness
Related across the graph
newsBuild real agentic apps using CUGA: two dozen working examples on a lightweight harnessnewsClaude Meets Blackwell Ultra: Anthropic’s Models Now Run on NVIDIA GB300 in AzurenewsNVIDIA BioNeMo Agent Toolkit Brings Accelerated AI to Life Sciences Researchers in Claude SciencenewsNvidia’s AI Hardware Comes to Windows in RTX Spark PCsnewsNVIDIA Vera CPU Opens the Way for Agentic Scientific AI at Los Alamos National Laboratory
