Topic

LLM Inference

19 items across the graph, tagged with LLM Inference.

From the graph · 19

repo
ray-project/ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

repo
InternLM/lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

repo
lemonade-sdk/lemonade

Lemonade helps users discover and run local AI apps by serving optimized LLMs right from their own GPUs and NPUs. Join our discord: https://discord.gg/5xXzkMu8Z…

repo
spiceai/spiceai

A portable accelerated SQL query, search, and LLM-inference engine, written in Rust, for data-grounded AI apps and agents.

repo
Nano-Collective/nanocoder

An open coding agent for your terminal, built by a community collective rather than a company. Bring your own model, keep your code on your machine, and owe not…

repo
neuron-core/neuron-ai

The agentic framework of the PHP ecosystem for building production-ready, AI-driven applications. Connect components (LLMs, Tools, vector DBs, memory) to agents that…

repo
jmaczan/tiny-vllm

Build your own high-performance LLM inference engine in C++ and CUDA: a smaller version of vLLM.

repo
felladrin/MiniSearch

Minimalist web-searching platform with an AI assistant that runs directly from your browser. Demo: https://felladrin-minisearch.hf.space

repo
expectedparrot/edsl

Design, conduct and analyze results of AI-powered surveys and experiments. Simulate social science and market research with large numbers of AI agents and LLMs.

repo
Kaden-Schutt/hipfire

RDNA-native LLM inference engine in Rust.

repo
NPC-Worldwide/incognide

Explore the unknown, build the future, own your data.

repo
jaylfc/taOS

Self-hosted AI agent OS. Your memory, chat, agents, and files stay on hardware you own, offline by default, cloud by choice. Offline AI memory (taOSmd), self-ho…

repo
matrixhub-ai/matrixhub

An open-source, self-hosted AI model hub with Hugging Face compatibility, accelerating vLLM/SGLang performance.

repo
Mobile-Artificial-Intelligence/llama_sdk

lcpp is a Dart implementation of llama.cpp used by the Mobile Artificial Intelligence Distribution (Maid).

repo
NikolasEnt/ollama-webui-intel

Ollama with Intel (i)GPU acceleration in Docker, plus a benchmark.

repo
AdamBien/lightmetal

Apple Silicon MLX with zero-dependency Java.

repo
john-rocky/apple-silicon-llm-bench

Neutral, reproducible benchmark for local LLMs on Apple Silicon (Mac · iPhone · iPad) — MLX, llama.cpp, CoreML, Apple Foundation Models

repo
kekzl/imp

From-scratch C++/CUDA inference engine for the NVIDIA RTX 5090 (sm_120a) — the best single-GPU backend for agentic AI: tool calling, long-context loops, reasoni…

repo
aivrar/vllm-windows-build

Native Windows build of vLLM 0.24.0 - no WSL, no Docker. Python 3.13 + CUDA 12.8 + PyTorch 2.11 cu128 for RTX 30/40/50-series, pre-built wheel, Windows patchset…
