Angestrom home
© 2026 Angestrom Intelligence Private Limited. All rights reserved.
repo · GitHub · Trust 82 · Primary · Published yesterday · Live · 9h ago

defilantech/LLMKube

Kubernetes operator for self-hosted LLM inference across a heterogeneous GPU fleet: NVIDIA (CUDA), AMD (Vulkan), and Apple Silicon (Metal). Supported runtimes: llama.cpp, vLLM, TGI, and mlx-server. Features: multi-GPU sharding, model caching, and OpenAI-compatible endpoints. Apache-2.0 licensed, runs across homelab and on-prem fleets, and is actively developed.

Lineage graph

Paper → model → repo connections mined from source citations (Tier-1 exact match).

Covers

  • news: Understanding dynamic resource allocation in Kubernetes
  • news: OpenAI and Broadcom announce chip designed for LLM inference at scale
  • news: OpenAI and Broadcom unveil LLM-optimized inference chip
  • news: How're you deploying LLMs in production now-a-days? What's the best and most affordable way? [D]
  • news: Run NVIDIA Nemotron and OpenAI GPT OSS models on Amazon Bedrock in AWS GovCloud (US)

Covers (incoming)

  • news: Self-hosted GitHub Actions runners on Lambda MicroVMs

Related across the graph

  • news: Understanding dynamic resource allocation in Kubernetes
  • news: OpenAI and Broadcom announce chip designed for LLM inference at scale
  • news: Run NVIDIA Nemotron and OpenAI GPT OSS models on Amazon Bedrock in AWS GovCloud (US)
  • news: OpenAI and Broadcom unveil LLM-optimized inference chip
  • news: Self-hosted GitHub Actions runners on Lambda MicroVMs
  • news: How're you deploying LLMs in production now-a-days? What's the best and most affordable way? [D]

Knowledge path

news: Understanding dynamic resource allocation in Kubernetes → news: OpenAI and Broadcom announce chip designed for LLM inference at scale → news: Run NVIDIA Nemotron and OpenAI GPT OSS models on Amazon Bedrock in AWS GovCloud (US) → repo: defilantech/LLMKube

Topics

ai, apple-silicon, autoscaling, edge-computing, gguf, gpu, homelab, inference, kubernetes, kubernetes-operator

Graph trust: 82 (Primary)
Graph score: 157