Serving
15 items across the graph, tagged with Serving.
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
Run, manage, and scale AI workloads on any AI infrastructure. Use one system to access & manage all AI compute (Kubernetes, Slurm, 20+ clouds, on-prem).
The AI search platform
A flexible, high-performance serving system for machine learning models
A Cloud Native Batch System (Project under CNCF)
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed perf…
High-performance Inference and Deployment Toolkit for LLMs and VLMs based on PaddlePaddle
High-performance inference framework for large language models, focusing on efficiency, flexibility, and availability.
Community maintained hardware plugin for vLLM on Ascend
RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.
The simplest way to serve AI/ML models in production
A high-performance ML model serving framework that offers dynamic batching and CPU/GPU pipelines to fully exploit your compute machine
A scalable inference server for models optimized with OpenVINO™
DISCO is a code-free and installation-free browser platform that allows any non-technical user to collaboratively train machine learning models without sharing…
Native Windows build of vLLM 0.24.0 - no WSL, no Docker. Python 3.13 + CUDA 12.8 + PyTorch 2.11 cu128 for RTX 30/40/50-series, pre-built wheel, Windows patchset…
