Reddit r/LocalLLaMA · Published 23h ago

llamacpp patch - DeepSeek V4 Flash running with full 1M token context locally on RTX 5090

Wanted to try running DeepSeek V4 Flash locally but found it asking for absurd amounts of VRAM at higher context lengths (~256GB at 1M). Turned out the DSA lightning indexer lacks proper llamacpp support. Did a bit of digging and there's an upstream PR to address the issue (shoutout u/fairydreaming, PR #24231), but even there it's not wired int