Topic

Llm Serving

8 items across the graph — tagged with Llm Serving.

From the graph · 8

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

→repo

skypilot-org/skypilot

Run, manage, and scale AI workloads on any AI infrastructure. Use one system to access & manage all AI compute (Kubernetes, Slurm, 20+ clouds, on-prem).

→repo

PaddlePaddle/FastDeploy

High-performance Inference and Deployment Toolkit for LLMs and VLMs based on PaddlePaddle

→repo

thu-pacman/chitu

High-performance inference framework for large language models, focusing on efficiency, flexibility, and availability.

→repo

vllm-project/vllm-ascend

Community maintained hardware plugin for vLLM on Ascend

→repo

alibaba/rtp-llm

RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.

→repo

mosecorg/mosec

A high-performance ML model serving framework, offers dynamic batching and CPU/GPU pipelines to fully exploit your compute machine

→repo

aivrar/vllm-windows-build

Native Windows build of vLLM 0.24.0 - no WSL, no Docker. Python 3.13 + CUDA 12.8 + PyTorch 2.11 cu128 for RTX 30/40/50-series, pre-built wheel, Windows patchset…

→

From the graph · 8

Related topics