Topic

Llama Cpp

16 items across the graph tagged with Llama Cpp.

repo

antoinezambelli/forge

A Python framework for self-hosted LLM tool-calling and multi-step agentic workflows

kennss/SiliconScope

Sudoless Apple Silicon system monitor (native SwiftUI GUI) with ANE / Media Engine / memory-bandwidth tracking

AtomicBot-ai/atomic-agent

Local-first AI agent, optimized for local AI models. Long context window and proper tool calling. Runs privately on your device.

raketenkater/ggrun

Auto-tuned launcher for GGUF models on llama.cpp / ik_llama.cpp — OpenAI-compatible server with multi-GPU tensor-split, MoE expert placement, measured flag tuni…

mohitsoni48/TurboLLM

Run any local LLM engine, auto-tuned to your GPU — polished web UI + OpenAI/Anthropic-compatible API. Point Claude Code at your own machine in one command. No E…

GaeaRuiW/kube-llmops

engeldlgado/toshllm

Run large language models locally on Intel Macs with AMD GPUs — native macOS app with Metal acceleration

john-rocky/apple-silicon-llm-bench

Neutral, reproducible benchmark for local LLMs on Apple Silicon (Mac · iPhone · iPad) — MLX, llama.cpp, CoreML, Apple Foundation Models

aaronnat23/disp8ch

Self-hosted AI workspace where chat becomes visual workflows, multi-agent operations, and reviewable automations. Local memory; local or cloud models

Etamus/NeveAI

Neve AI is a privacy-first local AI platform, built to deliver a high-performance experience for running LLMs while reducing dependence…

mnemosyne-systems/orangu

Advanced code editor using local AI

kekzl/imp

From-scratch C++/CUDA inference engine for the NVIDIA RTX 5090 (sm_120a) — the best single-GPU backend for agentic AI: tool calling, long-context loops, reasoni…

off-grid-ai/off-grid-ai-desktop

Off Grid AI — private, on-device AI. Run open models (text, vision, image, voice) locally through one OpenAI-compatible gateway. No cloud, no accounts, no API k…

aivrar/multi-turboquant

Unified KV cache compression for LLM inference — TurboQuant, IsoQuant, PlanarQuant, TriAttention. 10 methods, GPU-validated, multi-GPU planner. Compress KV cach…

Hal0ai/hal0

Open-source self-hosted home AI inference platform for AMD Strix Halo — multi-backend slots, OpenAI-compatible gateway, Vue 3 + FastAPI + systemd.

notwitcheer/llm-bench-rig

Dual-engine (llama.cpp + vLLM) LLM benchmarking pipeline for GGUF & safetensors on NVIDIA GPUs — speed, quality, live dashboard, publishable cards.