paper · arXiv · Trust 82 · Primary · Published yesterday · Live · 2h ago

WattGPU: Predicting Inference Power and Latency on Unseen GPUs and LLMs

Large Language Model (LLM) inference workloads are a rapidly growing contributor to data center energy consumption. Optimizing these deployments requires matching specific LLMs to the most efficient GPUs, but operators currently lack the tools to do so without exhaustively profiling each combination. While some predictive models exist, they still require profiling data and struggle to generalize to hardware unseen during training. To address this, we introduce WattGPU, featuring two predictive models for mean GPU power draw and Inter-Token Latency (ITL). Our approach leverages only pu

Lineage graph

Paper → model → repo connections mined from source citations (Tier-1 exact match).

Why these links exist

Linked via arXiv author: Mauricio Fadel Argerich → WattGPU: Predicting Inference Power and Latency on Unseen GPUs and LLMs
Linked via arXiv author: Jonathan Fürst → WattGPU: Predicting Inference Power and Latency on Unseen GPUs and LLMs
Linked via arXiv author: Marta Patiño-Martínez → WattGPU: Predicting Inference Power and Latency on Unseen GPUs and LLMs

Covers

news: Hardware startup unveils inference accelerator
news: How NVIDIA’s Inference Software Stack Powers the Lowest Token Cost

Implements

repo: mosecorg/mosec
repo: beam-cloud/beta9
repo: kekzl/imp

Implements (incoming)

repo: thejollydev/bezaforge-infrastructure
repo: novitalabs/pegaflow
repo: uccl-project/uccl
repo: glab-forks/nvidia/TensorRT-LLM

Authored (incoming)

person: Mauricio Fadel Argerich
person: Jonathan Fürst
person: Marta Patiño-Martínez

Related across the graph

repo: uccl-project/uccl
person: Marta Patiño-Martínez
repo: novitalabs/pegaflow
repo: thejollydev/bezaforge-infrastructure
news: How NVIDIA’s Inference Software Stack Powers the Lowest Token Cost
person: Mauricio Fadel Argerich
person: Jonathan Fürst
repo: beam-cloud/beta9
repo: glab-forks/nvidia/TensorRT-LLM
news: Hardware startup unveils inference accelerator
repo: kekzl/imp
repo: mosecorg/mosec

Topics

cs.LG