Topic

Serving

15 items across the graph are tagged with Serving.

From the graph · 15

ray-project/ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

→repo

skypilot-org/skypilot

Run, manage, and scale AI workloads on any AI infrastructure. Use one system to access & manage all AI compute (Kubernetes, Slurm, 20+ clouds, on-prem).

→repo

vespa-engine/vespa

The AI search platform

→repo

tensorflow/serving

A flexible, high-performance serving system for machine learning models

→repo

volcano-sh/volcano

A Cloud Native Batch System (Project under CNCF)

→repo

ModelTC/LightLLM

LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed perf…

→repo

PaddlePaddle/FastDeploy

High-performance Inference and Deployment Toolkit for LLMs and VLMs based on PaddlePaddle

→repo

thu-pacman/chitu

High-performance inference framework for large language models, focusing on efficiency, flexibility, and availability.

→repo

vllm-project/vllm-ascend

Community maintained hardware plugin for vLLM on Ascend

→repo

alibaba/rtp-llm

RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.

→repo

basetenlabs/truss

The simplest way to serve AI/ML models in production

→repo

mosecorg/mosec

A high-performance ML model serving framework that offers dynamic batching and CPU/GPU pipelines to fully exploit your compute machine

→repo

openvinotoolkit/model_server

A scalable inference server for models optimized with OpenVINO™

→repo

epfml/disco

DISCO is a code-free and installation-free browser platform that allows any non-technical user to collaboratively train machine learning models without sharing…

→repo

aivrar/vllm-windows-build

Native Windows build of vLLM 0.24.0 - no WSL, no Docker. Python 3.13 + CUDA 12.8 + PyTorch 2.11 cu128 for RTX 30/40/50-series, pre-built wheel, Windows patchset…

→repo
