GPU
50 items across the graph, tagged with GPU.
Tensors and Dynamic neural networks in Python with strong GPU acceleration
Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for Keras, TensorFlow, and ONNX/PyTorch, a modular…
Open3D: A Modern Library for 3D Data Processing
Open Machine Learning Compiler Framework
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
Run, manage, and scale AI workloads on any AI infrastructure. Use one system to access & manage all AI compute (Kubernetes, Slurm, 20+ clouds, on-prem).
A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and infer…
Find the local LLM that actually runs and performs best on your hardware. Ranked by real, recency-aware benchmarks, not parameter count. One command, run it ins…
cuML - RAPIDS Machine Learning Library
On-device AI across mobile, embedded and edge for PyTorch
Lemonade helps users discover and run local AI apps by serving optimized LLMs right from their own GPUs and NPUs. Join our discord: https://discord.gg/5xXzkMu8Z…
One delightful Ruby framework for every major AI provider. Build AI agents, chatbots, RAG apps, and multimodal workflows in beautiful, expressive code.
Achieve state of the art inference performance with modern accelerators on Kubernetes
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on Hopper, Ada and Blackwel…
High-performance inference framework for large language models, focusing on efficiency, flexibility, and availability.
Open-source deep-learning framework for building, training, and fine-tuning deep learning models using state-of-the-art Physics-ML methods
Vendor-agnostic orchestration for training, inference and agentic workloads across NVIDIA, AMD, TPU, and Tenstorrent on clouds, Kubernetes, and bare metal.
Distributed AI Model Training and LLM Fine-Tuning on Kubernetes
Ultrafast serverless GPU inference, sandboxes, and background jobs
🤘 TT-NN operator library, and TT-Metalium low level kernel programming model.
NVIDIA GPU exporter for Prometheus using the nvidia-smi binary
UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g., GPU-driven)
Extension for Scikit-learn is a seamless way to speed up your Scikit-learn application
🌊 Julia software for fast, friendly, flexible, ocean-flavored fluid dynamics on CPUs and GPUs
RAFT contains fundamental widely-used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-accelerated and form bui…
Kubernetes AI Toolchain Operator
A high-performance ML model serving framework that offers dynamic batching and CPU/GPU pipelines to fully exploit your compute machine
BioNeMo Recipes: For building and adapting AI models in drug discovery at scale
Graphics Processing Units Molecular Dynamics
cuVS - a library for vector search and clustering on the GPU
Sudoless Apple Silicon system monitor (native SwiftUI GUI) with ANE / Media Engine / memory-bandwidth tracking
Minimalist web-searching platform with an AI assistant that runs directly from your browser. Demo: https://felladrin-minisearch.hf.space
Pure Rust + CUDA LLM inference engine — no PyTorch, OpenAI-compatible, serves Qwen3 to Kimi-K2
RDNA-native LLM inference engine in Rust.
Tooling for optimized, validated, and reproducible GPU-accelerated AI runtime in Kubernetes
Persist and reuse KV cache to speed up your LLM.
Open-source agent runtime — SSH-native isolation, eBPF egress policy, Kubernetes + LXC backends, GPU passthrough, MCP-native CLI
Insanely fast open-source computer vision library for ARM and x86 devices (up to 50 times faster than OpenCV)
Kafka-ML: connecting the data stream with ML/AI frameworks (now TensorFlow and PyTorch!)
A lightweight runtime health check for PyTorch training runs.
RapidFire AI: Rapid AI Customization from RAG to Fine-Tuning
Tensor Fusion is a state-of-the-art GPU virtualization and pooling solution designed to optimize GPU cluster utilization to its fullest potential.
Kubernetes operator for self-hosted LLM inference across a heterogeneous GPU fleet: NVIDIA CUDA, AMD Vulkan, and Apple Silicon Metal. Runtimes: llama.cpp, vLLM,…
Run any local LLM engine, auto-tuned to your GPU — polished web UI + OpenAI/Anthropic-compatible API. Point Claude Code at your own machine in one command. No E…
Plug-and-play homelab dashboard in one container — GPU, local-AI VRAM, Docker, systemd, host health. Built-in read-only MCP server so AI agents can explore it t…
Sleek, mobile-friendly web UI for NVIDIA LocateAnything-3B — open-vocabulary object detection & grounding on your own GPU, via one docker compose up.
Self-hosted AI-powered transcription platform with speaker diarization, search, and collaboration features. Built with Svelte, FastAPI, and Docker for easy depl…
Run large language models locally on Intel Macs with AMD GPUs — native macOS app with Metal acceleration
Ollama with Intel (i)GPU acceleration in Docker, plus benchmarks
(GPU accelerated) Multi-arch (linux/amd64, linux/arm64/v8) JupyterLab Python docker images. Please submit Pull Requests to the GitLab repository. Mirror of
