Topic

LLM Inference

19 items across the graph, tagged with LLM Inference.

From the graph · 19

ray-project/ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI libraries for accelerating ML workloads.

→repo

InternLM/lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

→repo

lemonade-sdk/lemonade

Lemonade helps users discover and run local AI apps by serving optimized LLMs right from their own GPUs and NPUs. Join our discord: https://discord.gg/5xXzkMu8Z…

→repo

spiceai/spiceai

A portable accelerated SQL query, search, and LLM-inference engine, written in Rust, for data-grounded AI apps and agents.

→repo

Nano-Collective/nanocoder

An open coding agent for your terminal, built by a community collective rather than a company. Bring your own model, keep your code on your machine, and owe not…

→repo

neuron-core/neuron-ai

The agentic framework of the PHP ecosystem for building production-ready, AI-driven applications. Connect components (LLMs, Tools, vector DBs, memory) to agents that…

→repo

jmaczan/tiny-vllm

Build your own high-performance LLM inference engine in C++ and CUDA: a smaller version of vLLM.

→repo

felladrin/MiniSearch

Minimalist web-searching platform with an AI assistant that runs directly from your browser. Demo: https://felladrin-minisearch.hf.space

→repo

expectedparrot/edsl

Design, conduct and analyze results of AI-powered surveys and experiments. Simulate social science and market research with large numbers of AI agents and LLMs.

→repo

Kaden-Schutt/hipfire

RDNA-native LLM inference engine in Rust.

→repo

NPC-Worldwide/incognide

Explore the unknown, build the future, own your data.

→repo

jaylfc/taOS

Self-hosted AI agent OS. Your memory, chat, agents, and files stay on hardware you own, offline by default, cloud by choice. Offline AI memory (taOSmd), self-ho…

→repo

matrixhub-ai/matrixhub

An open-source, self-hosted AI model hub with Hugging Face compatibility, accelerating vLLM/SGLang performance.

→repo

Mobile-Artificial-Intelligence/llama_sdk

lcpp is a Dart implementation of llama.cpp used by the Mobile Artificial Intelligence Distribution (Maid).

→repo

NikolasEnt/ollama-webui-intel

Ollama with Intel (i)GPU acceleration in Docker, plus benchmarks.

→repo

AdamBien/lightmetal

Apple Silicon MLX with zero-dependency Java.

→repo

john-rocky/apple-silicon-llm-bench

Neutral, reproducible benchmark for local LLMs on Apple Silicon (Mac · iPhone · iPad) — MLX, llama.cpp, CoreML, Apple Foundation Models

→repo

kekzl/imp

From-scratch C++/CUDA inference engine for the NVIDIA RTX 5090 (sm_120a) — the best single-GPU backend for agentic AI: tool calling, long-context loops, reasoni…

→repo

aivrar/vllm-windows-build

Native Windows build of vLLM 0.24.0 - no WSL, no Docker. Python 3.13 + CUDA 12.8 + PyTorch 2.11 cu128 for RTX 30/40/50-series, pre-built wheel, Windows patchset…

→
