repo · GitHub · Trust 82 · Primary · Published yesterday · Live · yesterday
jmaczan/tiny-vllm
Build your own high performance LLM inference engine in C++ and CUDA - a smaller version of vLLM
Lineage graph
Paper → model → repo connections mined from source citations (Tier-1 exact match).
Covers
news · OpenAI and Broadcom announce chip designed for LLM inference at scale
news · OpenAI and Broadcom unveil LLM-optimized inference chip
news · Hardware startup unveils inference accelerator
news · A barebones CPU-only inference engine for Qwen 3, written from scratch in pure C
news · I mapped which local LLMs actually fit each RAM tier, 8 to 128GB (open dataset)
Related across the graph
news · OpenAI and Broadcom announce chip designed for LLM inference at scale
news · OpenAI and Broadcom unveil LLM-optimized inference chip
news · I mapped which local LLMs actually fit each RAM tier, 8 to 128GB (open dataset)
news · Hardware startup unveils inference accelerator
news · A barebones CPU-only inference engine for Qwen 3, written from scratch in pure C
