repo · GitHub · Trust 82 · Primary · Published 17h ago · Live · 16h ago
novitalabs/pegaflow
High-performance KV cache storage for LLM inference — GPU offloading, SSD caching, and cross-node sharing via RDMA. Works with vLLM and SGLang.
Lineage graph
Paper → model → repo connections mined from source citations (Tier-1 exact match).
Related across the graph
news: OpenAI and Broadcom announce chip designed for LLM inference at scale
news: OpenAI and Broadcom unveil LLM-optimized inference chip
news: I mapped which local LLMs actually fit each RAM tier, 8 to 128GB (open dataset)
paper: WattGPU: Predicting Inference Power and Latency on Unseen GPUs and LLMs
news: Tesla V100 16GB local LLMs, single and dual NVLink benchmarks
