Topic

All

50 items across the graph · 21 news stories — tagged with All.

Latest news

News · Reddit r/LocalLLaMA · Live · 11m ago

Longcat 2 model weights have been published

https://huggingface.co/meituan-longcat/LongCat-2.0-INT8 https://huggingface.co/meituan-longcat/LongCat-2.0-FP8 submitted by /u/RhubarbSimilar1683


More news · 20

News · Reddit r/LocalLLaMA · Live · 11m ago

Portugal just released their own LLM Amalia (9B)!

I didn't see any mention here. Source:

News · Reddit r/LocalLLaMA · Live · 37m ago

My DeepSeek V4 Pro at home got faster again

You may remember my earlier posts about DeepSeek V4 Pro at home. Today I checked the performance in my llama.cpp branch that contains various fixes and optimizations not yet included in mai

News · Reddit r/LocalLLaMA · Live · 3h ago

Micro-World - Action-controlled Interactive world model - AMD

News · Reddit r/LocalLLaMA · Live · 7h ago

Pay attention: a few chats waiting in tray reserve 1GB VRAM for themselves.

If an application uses a Web-based interface and "hardware acceleration", it constructs its frame in VRAM and sometimes keeps it reserved even if the app is minimised. On my Linux machine, Discord is the worst offender, reserving 450 MB VRAM. Steam takes 200 MB, Telegram 150 MB,…

News · Reddit r/LocalLLaMA · Live · 18h ago

Software developers appreciation post

I'm on the bus to work and just felt like I don't see enough gratitude for the men, women, children, and people who contribute their time and effort to open projects. Just last night I saw I've been sleeping while vLLM developers are releasing 3 new major releases, and not only tha…

News · Reddit r/LocalLLaMA · Live · yesterday

Gemma 4 WebGPU Kernels 255 tok/s by x/@xenovacom

We need more of this, 100+ T/s

News · Reddit r/LocalLLaMA · Live · yesterday

They fit! Mostly.... 2x 3090, Thermaltake Core p3

Got another 3090 had to print a bracket to angle t

News · Reddit r/LocalLLaMA · Live · yesterday

Making LLMs Better at Creative Writing using Entropy


News · Reddit r/LocalLLaMA · Live · 2d ago

Why can i never stop the looping?

I constantly see people here saying Qwen3.6 35B is amazing, Ornith V1 is amazing, but I cannot use these models at all without severe looping problems. What the hell am I doing wrong? Settings: temp 0.6, top_p 0.95, top_k 20, min_p 0.05, rep_penalty 1.1. Using Q6 of both models with K/V at Q8,…

News · Reddit r/LocalLLaMA · Live · 2d ago

README_EN.md · openpangu/openPangu-2.0-Flash at main

1. Introduction

News · Reddit r/LocalLLaMA · Live · 3d ago

Krea-2-Turbo Image Model - Easy to be fully uncensored, but it can also EDIT Images!

I've been super impressed with Krea-2-Turbo. It can generate high-quality images in ~3 seconds. The quality is quite good compared to other local AI image gen models. Now, I don't want to make you watch or click a YouTube video, so I'll just give these clear instructions on how…

News · Reddit r/LocalLLaMA · Live · 3d ago

DeepSeek V4, PR merged into llama.cpp !

The PR: https://github.com/ggml-org/llama.cpp/pull/24162 Everyone go git pull, run cmake, and download GGUFs! On your marks, get set, go! submitted by /u/Squik67

News · Reddit r/LocalLLaMA · Live · 3d ago

InternScience/Agents-A1 · Hugging Face

Unbelievable benchmarks for a 35B MoE, somebody verify.

News · Reddit r/LocalLLaMA · Live · 4d ago

CPU-only GLM 5.2: Epyc and 512GB RAM

This is just a preview of some content I'm putting together to share with you all. I have a server I've p

News · Reddit r/LocalLLaMA · Live · 4d ago

Apparently you can skip entire transformer blocks at load time with minimal performance impact

The benefit is that it is another trick for fitting a model that wouldn't otherwise fit on your hardware. People currently rely on quantization, and this is just another tool for that purpose (the two can be used together as well). Following recent (very cool) papers,…

News · Reddit r/LocalLLaMA · Live · 4d ago

GLM 5.2 Q1_S vs Qwen 27B Q8

TL;DR: GLM-5.2 Q1_S beats Qwen 3.6 27B Q8, both run at KV Q8. Edit: GLM run a K & V Q8,

News · Reddit r/LocalLLaMA · Live · 4d ago

Script to monitor llama cpp and analyze memory usage

My goal has always been to be productive with commodity hard

News · Reddit r/LocalLLaMA · Live · 5d ago

Ornith 35B is great so far

I tried creating a quick 3D game with it; after 3 prompts, it got me this (check video). By comparison, qwen3.5-35b-a3b was not able to generate this and kept failing even after multiple prompts. Harness: Claude Code. How is your experience so far? https://red…

News · Reddit r/LocalLLaMA · Live · 5d ago

Mythos was the first, now GPT-5.6

https://techcrunch.com/2026/06/26/openai-limits-gpt-5-6-rollout-after-government-request-says-restrictions-shouldnt-be-the-norm/ Either it is hype before the IPO, or they have just shot themselves in the foot. This is pretty much it for more advanced online models. Local LLM is one of the…

News · Reddit r/LocalLLaMA · Live · 5d ago

Finally.. my rig is maxed out

Got all the parts before the crazy price increase except for the rtx pro 5k! Was saving up to order rtx pro 6000 in US and i


From the graph · 29

repo
ComposioHQ/composio

Composio powers 1000+ toolkits, tool search, context management, authentication, and a sandboxed workbench to help you build AI agents that turn intent into act…

repo
kyegomez/swarms

The Enterprise-Grade Production-Ready Multi-Agent Orchestration Framework. Website: https://swarms.ai

repo
root-project/root

The official repository for ROOT: analyzing, storing and visualizing big data, scientifically

repo
TanStack/ai

🤖 Type-safe, provider-agnostic TypeScript AI SDK for streaming chat, tool calling, agents, and multimodal apps across OpenAI, Anthropic, Gemini, React, Vue, Sv…

repo
lioensky/VCPToolBox

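VCP is deployed between AI model APIs and front-end applications, and is an industrial-grade infrastructure demonstration project for AGI OS development and exploration. Through a unified instruction protocol, multi-level persistent memory, a distributed plugin engine, and a multi-agent collaboration framework, it transforms large language models that were originally "stateless, memoryless, and without tool-calling ability" into a complete agent system with permanent self-awareness, the ability to operate on the physical world, and collaborative group intelligence.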

repo
antoinezambelli/forge

A Python framework for self-hosted LLM tool-calling and multi-step agentic workflows

repo
Cloud-CV/EvalAI

☁️ 🚀 📊 📈 Evaluating state of the art in AI

repo
onestardao/WFGY

WFGY is heading toward WFGY 5.0 Polaris Protocol, a major open-source release for AI reasoning, RAG, agents, and real-world workflows. Includes Problem Map, Glo…

repo
uccl-project/uccl

UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g., GPU-driven)

repo
juyterman1000/entroly

Cut your Claude / OpenAI / Gemini bill 70–95% on AI coding. Local proxy that compresses context, keeps provider caches hot, and verifies LLM output ($0 hallucin…

repo
bodo-ai/Bodo

High Performance Data Processing in Python

repo
raketenkater/ggrun

Auto-tuned launcher for GGUF models on llama.cpp / ik_llama.cpp — OpenAI-compatible server with multi-GPU tensor-split, MoE expert placement, measured flag tuni…

repo
dilolabs/nosia

Self-hosted AI RAG + MCP Platform

repo
KRLabsOrg/verbatim-rag

Hallucination-prevention RAG system with verbatim span extraction. Ensures all generated content is grounded in source documents with exact citations.

repo
DIAGNijmegen/rse-grand-challenge

A platform for end-to-end development of machine learning solutions in biomedical imaging

repo
liushuangls/go-anthropic

Anthropic Claude API wrapper for Go

repo
codecentric/c4-genai-suite

c4 GenAI Suite

repo
NexusGPU/tensor-fusion

Tensor Fusion is a state-of-the-art GPU virtualization and pooling solution designed to optimize GPU cluster utilization to its fullest potential.

repo
saccofrancesco/deepshot

AI-powered NBA game outcome predictor that uses advanced team stats and trend-based features to forecast winners and track model performance

repo
Ganador1/FenixAI_tradingBot

Fenix AI Trading Bot with LangGraph, Ollama, and multiple providers

repo
bagidea/bagidea-office

A living AI-agent office on your desktop wallpaper — Claude Code agents that walk, work, delegate, learn & hold meetings. Per-agent swappable models (Claude/GLM…

repo
adrianliechti/wingman

Inference Hub for AI at Scale

repo
Oaklight/ToolRegistry

ToolRegistry: A Protocol-Agnostic Tool Management Library for Function-Calling LLMs (OpenAI, Anthropic, Gemini, LangChain, MCP)

repo
robotlearning123/gpt2agent

Your codex login → a full ChatGPT Plus/Pro account (every model, deep research, image gen, code exec) inside Claude Code, Codex & any MCP client. One-line insta…

repo
zoharbabin/web-researcher-mcp

The AI research assistant that cites real sources, stays honest, and searches the web. Works wi…

repo
WiseriaAI/pie-ai-agent

Browser-automation agent for Chrome — natural-language tasks executed through native tool calling, scoped Skills, CDP keyboard control, and a confirm-before-act…

repo
majiayu000/vibeguard

Native rules, hooks, and guards that prevent Claude Code and Codex from hallucinating code, duplicating files, or shipping unverified changes.

repo
MiiFlow/miiflow-agent

Lightweight Python SDK for LLMs with unified API across 9 providers. Built-in ReAct & Plan-Execute agents, streaming, native tool calling, context injection, st…

repo
networking-incubator/coraza-kubernetes-operator

Web Application Firewall (WAF) for Kubernetes Gateways

Related topics