2 items across the graph — tagged with Cuda Kernels.
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
Pure Rust + CUDA LLM inference engine — no PyTorch, OpenAI-compatible, serves Qwen3 to Kimi-K2