Topic

All

50 items across the graph · 21 news stories — tagged with All.

Latest news

Longcat 2 model weights have been published

https://huggingface.co/meituan-longcat/LongCat-2.0-INT8 https://huggingface.co/meituan-longcat/LongCat-2.0-FP8 submitted by /u/RhubarbSimilar1683


More news · 20

News · Reddit r/LocalLLaMA · Live · 11m ago

Portugal just released their own LLM Amalia (9B)!

I didn't see any mention here. Source:

News · Reddit r/LocalLLaMA · Live · 37m ago

My DeepSeek V4 Pro at home got faster again

You may remember my earlier posts about DeepSeek V4 Pro at home. Today I checked the performance in my llama.cpp branch that contains various fixes and optimizations not yet included in mai

News · Reddit r/LocalLLaMA · Live · 3h ago

Micro-World - Action-controlled Interactive world model - AMD

News · Reddit r/LocalLLaMA · Live · 7h ago

Pay attention: a few chats waiting in tray reserve 1GB VRAM for themselves.

If an application uses a Web-based interface and "hardware acceleration", it constructs its frame in VRAM and sometimes keeps it reserved even if the app is minimised. On my Linux machine, Discord is the worst offender, reserving 450 MB VRAM. Steam takes 200 MB, Telegram 150 MB,…

News · Reddit r/LocalLLaMA · Live · 18h ago

Software developers appreciation post

I'm on the bus to work and just felt like I don't see enough gratitude for the men, women, children, and people who contribute their time and effort on open projects. Just last night I saw I've been sleeping while vllm developers are releasing 3 new major releases, and not only tha…

News · Reddit r/LocalLLaMA · Live · yesterday

Gemma 4 WebGPU Kernels 255 tok/s by x/@xenovacom

We need more of this, 100+ T/s

News · Reddit r/LocalLLaMA · Live · yesterday

They fit! Mostly.... 2x 3090, Thermaltake Core p3

Got another 3090, had to print a bracket to angle t

News · Reddit r/LocalLLaMA · Live · yesterday

Making LLMs Better at Creative Writing using Entropy


News · Reddit r/LocalLLaMA · Live · 2d ago

Why can I never stop the looping?

I constantly see people here saying Qwen3.6 35B is amazing, Ornith V1 is amazing, but I cannot use these models at all without severe looping problems. What the hell am I doing wrong? Settings: temp 0.6, top_p 0.95, top_k 20, min_p 0.05, rep_penalty 1.1. Using Q6 of both models with K/V at Q8,…

News · Reddit r/LocalLLaMA · Live · 2d ago

README_EN.md · openpangu/openPangu-2.0-Flash at main

1. Introduction

News · Reddit r/LocalLLaMA · Live · 3d ago

Krea-2-Turbo Image Model - Easy to be fully uncensored, but it can also EDIT Images!

I've been super impressed with Krea-2-Turbo. It can generate high-quality images in ~3 seconds. The quality is quite good compared to other local AI image gen models. Now, I don't want to make you watch or click a YouTube video, so I'll just give these clear instructions on how…

News · Reddit r/LocalLLaMA · Live · 3d ago

DeepSeek V4 PR merged into llama.cpp!

The PR: https://github.com/ggml-org/llama.cpp/pull/24162 Time to git pull, run cmake, and download GGUFs! On your marks, get set, go! submitted by /u/Squik67

News · Reddit r/LocalLLaMA · Live · 3d ago

InternScience/Agents-A1 · Hugging Face

Unbelievable benchmarks for a 35B MoE, somebody verify.

News · Reddit r/LocalLLaMA · Live · 4d ago

CPU-only GLM 5.2: Epyc and 512GB RAM

This is just a preview of some content I'm putting together to share with you all. I have a server I've p

News · Reddit r/LocalLLaMA · Live · 4d ago

Apparently you can skip entire transformer blocks at load time with minimal performance impact

The benefit: this is another trick for fitting a model that wouldn't otherwise fit on your hardware. People currently rely on quantization, and this is another tool for the same purpose (the two can also be combined). Following recent (very cool) papers,…

News · Reddit r/LocalLLaMA · Live · 4d ago

GLM 5.2 Q1_S vs Qwen 27B Q8

TL;DR: GLM-5.2 Q1_S beats Qwen 3.6 27B Q8, both run at KV Q8. Edit: GLM run a K & V Q8,

News · Reddit r/LocalLLaMA · Live · 4d ago

Script to monitor llama cpp and analyze memory usage

My goal has always been to be productive with commodity hard

News · Reddit r/LocalLLaMA · Live · 5d ago

Ornith 35B is great so far

Tried creating a quick 3D game with it; after 3 prompts, it got me this (check video). By comparison, qwen3.5-35b-a3b was not able to generate this successfully and kept failing even after multiple prompts. Harness: Claude Code. How is your experience so far? https://red…

News · Reddit r/LocalLLaMA · Live · 5d ago

Mythos was the first, now GPT-5.6

https://techcrunch.com/2026/06/26/openai-limits-gpt-5-6-rollout-after-government-request-says-restrictions-shouldnt-be-the-norm/ Either this is hype before an IPO, or they have just shot themselves in the foot. This is pretty much it for more advanced online models. Local LLM is one of the…

News · Reddit r/LocalLLaMA · Live · 5d ago

Finally.. my rig is maxed out

Got all the parts before the crazy price increase except for the rtx pro 5k! Was saving up to order rtx pro 6000 in US and i


From the graph · 29

repo

ComposioHQ/composio

Composio powers 1000+ toolkits, tool search, context management, authentication, and a sandboxed workbench to help you build AI agents that turn intent into act…
