Topic

GPU

50 items across the graph, tagged with GPU.

From the graph · 50

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for keras, tensorflow, and onnx/pytorch, a modular…

→repo

isl-org/Open3D

Open3D: A Modern Library for 3D Data Processing

→repo

apache/tvm

Open Machine Learning Compiler Framework

→repo

triton-inference-server/server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.

→repo

skypilot-org/skypilot

Run, manage, and scale AI workloads on any AI infrastructure. Use one system to access & manage all AI compute (Kubernetes, Slurm, 20+ clouds, on-prem).

→repo

NVIDIA/DALI

A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and infer…

→repo

Andyyyy64/whichllm

Find the local LLM that actually runs and performs best on your hardware. Ranked by real, recency-aware benchmarks, not parameter count. One command, run it ins…

→repo

rapidsai/cuml

cuML - RAPIDS Machine Learning Library

→repo

pytorch/executorch

On-device AI across mobile, embedded and edge for PyTorch

→repo

lemonade-sdk/lemonade

Lemonade helps users discover and run local AI apps by serving optimized LLMs right from their own GPUs and NPUs. Join our discord: https://discord.gg/5xXzkMu8Z…

→repo

crmne/ruby_llm

One delightful Ruby framework for every major AI provider. Build AI agents, chatbots, RAG apps, and multimodal workflows in beautiful, expressive code.

→repo

llm-d/llm-d

Achieve state of the art inference performance with modern accelerators on Kubernetes

→repo

NVIDIA/TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on Hopper, Ada and Blackwel…

→repo

thu-pacman/chitu

High-performance inference framework for large language models, focusing on efficiency, flexibility, and availability.

→repo

NVIDIA/physicsnemo

Open-source deep-learning framework for building, training, and fine-tuning deep learning models using state-of-the-art Physics-ML methods

→repo

dstackai/dstack

Vendor-agnostic orchestration for training, inference and agentic workloads across NVIDIA, AMD, TPU, and Tenstorrent on clouds, Kubernetes, and bare metal.

→repo

kubeflow/trainer

Distributed AI Model Training and LLM Fine-Tuning on Kubernetes

→repo

beam-cloud/beta9

Ultrafast serverless GPU inference, sandboxes, and background jobs

→repo

tenstorrent/tt-metal

TT-NN operator library, and TT-Metalium low level kernel programming model.

→repo

utkuozdemir/nvidia_gpu_exporter

Nvidia GPU exporter for prometheus using nvidia-smi binary

→repo

uccl-project/uccl

UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g., GPU-driven)

→repo

uxlfoundation/scikit-learn-intelex

Extension for Scikit-learn is a seamless way to speed up your Scikit-learn application

→repo

CliMA/Oceananigans.jl

🌊 Julia software for fast, friendly, flexible, ocean-flavored fluid dynamics on CPUs and GPUs

→repo

NVIDIA/raft

RAFT contains fundamental widely-used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-accelerated and form bui…

→repo

kaito-project/kaito

Kubernetes AI Toolchain Operator

→repo

mosecorg/mosec

A high-performance ML model serving framework, offers dynamic batching and CPU/GPU pipelines to fully exploit your compute machine

→repo

NVIDIA-BioNeMo/bionemo-recipes

BioNeMo Recipes: For building and adapting AI models in drug discovery at scale

→repo

brucefan1983/GPUMD

Graphics Processing Units Molecular Dynamics

→repo

NVIDIA/cuvs

cuVS - a library for vector search and clustering on the GPU

→repo

kennss/SiliconScope

Sudoless Apple Silicon system monitor (native SwiftUI GUI) with ANE / Media Engine / memory-bandwidth tracking

→repo

felladrin/MiniSearch

Minimalist web-searching platform with an AI assistant that runs directly from your browser. Demo: https://felladrin-minisearch.hf.space

→repo

openinfer-project/openinfer

Pure Rust + CUDA LLM inference engine — no PyTorch, OpenAI-compatible, serves Qwen3 to Kimi-K2

→repo

Kaden-Schutt/hipfire

RDNA-native LLM inference engine in Rust.

→repo

NVIDIA/aicr

Tooling for optimized, validated, and reproducible GPU-accelerated AI runtime in Kubernetes

→repo

ModelEngine-Group/unified-cache-management

Persist and reuse KV Cache to speedup your LLM.

→repo

FootprintAI/Containarium

Open-source agent runtime — SSH-native isolation, eBPF egress policy, Kubernetes + LXC backends, GPU passthrough, MCP-native CLI

→repo

DoubangoTelecom/compv

Insanely fast Open Source Computer Vision library for ARM and x86 devices (Up to #50 times faster than OpenCV)

→repo

ertis-research/kafka-ml

Kafka-ML: connecting the data stream with ML/AI frameworks (now TensorFlow and PyTorch!)

→repo

traceopt-ai/traceml

A lightweight runtime health check for PyTorch training runs.

→repo

RapidFireAI/rapidfireai

RapidFire AI: Rapid AI Customization from RAG to Fine-Tuning

→repo

NexusGPU/tensor-fusion

Tensor Fusion is a state-of-the-art GPU virtualization and pooling solution designed to optimize GPU cluster utilization to its fullest potential.

→repo

defilantech/LLMKube

Kubernetes operator for self-hosted LLM inference across a heterogeneous GPU fleet: NVIDIA CUDA, AMD Vulkan, and Apple Silicon Metal. Runtimes: llama.cpp, vLLM,…

→repo

mohitsoni48/TurboLLM

Run any local LLM engine, auto-tuned to your GPU — polished web UI + OpenAI/Anthropic-compatible API. Point Claude Code at your own machine in one command. No E…

→repo

SikamikanikoBG/homelab-monitor

Plug-and-play homelab dashboard in one container — GPU, local-AI VRAM, Docker, systemd, host health. Built-in read-only MCP server so AI agents can explore it t…

→repo

gammahazard/locate-anything

Sleek, mobile-friendly web UI for NVIDIA LocateAnything-3B — open-vocabulary object detection & grounding on your own GPU, via one docker compose up.

→repo

attevon-llc/OpenTranscribe

Self-hosted AI-powered transcription platform with speaker diarization, search, and collaboration features. Built with Svelte, FastAPI, and Docker for easy depl…

→repo

engeldlgado/toshllm

Run large language models locally on Intel Macs with AMD GPUs — native macOS app with Metal acceleration

→repo

NikolasEnt/ollama-webui-intel

Ollama with intel (i)GPU acceleration in docker and benchmark

→repo

b-data/jupyterlab-python-docker-stack

(GPU accelerated) Multi-arch (linux/amd64, linux/arm64/v8) JupyterLab Python docker images. Please submit Pull Requests to the GitLab repository. Mirror of

→
