Llm Inference
19 items across the graph — tagged with Llm Inference.
From the graph · 19
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
Lemonade helps users discover and run local AI apps by serving optimized LLMs right from their own GPUs and NPUs. Join our discord: https://discord.gg/5xXzkMu8Z…
A portable accelerated SQL query, search, and LLM-inference engine, written in Rust, for data-grounded AI apps and agents.
An open coding agent for your terminal, built by a community collective rather than a company. Bring your own model, keep your code on your machine, and owe not…
The Agentic Framework of the PHP ecosystem to build production-ready AI driven applications. Connect components (LLMs, Tools, vector DBs, memory) to agents that…
Build your own high performance LLM inference engine in C++ and CUDA - a smaller version of vLLM
Minimalist web-searching platform with an AI assistant that runs directly from your browser. Demo: https://felladrin-minisearch.hf.space
Design, conduct and analyze results of AI-powered surveys and experiments. Simulate social science and market research with large numbers of AI agents and LLMs.
RDNA-native LLM inference engine in Rust.
Explore the unknown, build the future, own your data.
Self-hosted AI agent OS. Your memory, chat, agents, and files stay on hardware you own, offline by default, cloud by choice. Offline AI memory (taOSmd), self-ho…
An Open-source, self-hosted AI model hub with Hugging Face compatibility, accelerating vLLM/SGLang performance.
lcpp is a dart implementation of llama.cpp used by the mobile artificial intelligence distribution (maid)
Ollama with intel (i)GPU acceleration in docker and benchmark
Apple Silicon mlx with Zero Dependency Java
Neutral, reproducible benchmark for local LLMs on Apple Silicon (Mac · iPhone · iPad) — MLX, llama.cpp, CoreML, Apple Foundation Models
From-scratch C++/CUDA inference engine for the NVIDIA RTX 5090 (sm_120a) — the best single-GPU backend for agentic AI: tool calling, long-context loops, reasoni…
Native Windows build of vLLM 0.24.0 - no WSL, no Docker. Python 3.13 + CUDA 12.8 + PyTorch 2.11 cu128 for RTX 30/40/50-series, pre-built wheel, Windows patchset…
