Topic

Serving

15 items across the graph are tagged with Serving.

From the graph · 15

repo
ray-project/ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

repo
skypilot-org/skypilot

Run, manage, and scale AI workloads on any AI infrastructure. Use one system to access & manage all AI compute (Kubernetes, Slurm, 20+ clouds, on-prem).

repo
vespa-engine/vespa

The AI search platform

repo
tensorflow/serving

A flexible, high-performance serving system for machine learning models

repo
volcano-sh/volcano

A Cloud Native Batch System (Project under CNCF)

repo
ModelTC/LightLLM

LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed perf…

repo
PaddlePaddle/FastDeploy

High-performance Inference and Deployment Toolkit for LLMs and VLMs based on PaddlePaddle

repo
thu-pacman/chitu

High-performance inference framework for large language models, focusing on efficiency, flexibility, and availability.

repo
vllm-project/vllm-ascend

Community-maintained hardware plugin for vLLM on Ascend

repo
alibaba/rtp-llm

RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.

repo
basetenlabs/truss

The simplest way to serve AI/ML models in production

repo
mosecorg/mosec

A high-performance ML model serving framework that offers dynamic batching and CPU/GPU pipelines to fully exploit your compute machine

repo
openvinotoolkit/model_server

A scalable inference server for models optimized with OpenVINO™

repo
epfml/disco

DISCO is a code-free and installation-free browser platform that allows any non-technical user to collaboratively train machine learning models without sharing…

repo
aivrar/vllm-windows-build

Native Windows build of vLLM 0.24.0 - no WSL, no Docker. Python 3.13 + CUDA 12.8 + PyTorch 2.11 cu128 for RTX 30/40/50-series, pre-built wheel, Windows patchset…
