LLM Serving
8 items across the graph, tagged with LLM Serving.
From the graph · 8
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
Run, manage, and scale AI workloads on any AI infrastructure. Use one system to access & manage all AI compute (Kubernetes, Slurm, 20+ clouds, on-prem).
High-performance Inference and Deployment Toolkit for LLMs and VLMs based on PaddlePaddle
High-performance inference framework for large language models, focusing on efficiency, flexibility, and availability.
Community maintained hardware plugin for vLLM on Ascend
RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.
A high-performance ML model serving framework that offers dynamic batching and CPU/GPU pipelines to fully exploit your compute machine
Native Windows build of vLLM 0.24.0 - no WSL, no Docker. Python 3.13 + CUDA 12.8 + PyTorch 2.11 cu128 for RTX 30/40/50-series, pre-built wheel, Windows patchset…
