All
50 items across the graph · 21 news stories — tagged with All.
Latest news
Longcat 2 model weights have been published
https://huggingface.co/meituan-longcat/LongCat-2.0-INT8 https://huggingface.co/meituan-longcat/LongCat-2.0-FP8 submitted by /u/RhubarbSimilar1683
More news · 20
Portugal just released their own LLM Amalia (9B)!
I didn't see any mention here. Source:
My DeepSeek V4 Pro at home got faster again
You may remember my earlier posts about DeepSeek V4 Pro at home. Today I checked the performance in my llama.cpp branch that contains various fixes and optimizations not yet included in mai
Micro-World - Action-controlled Interactive world model - AMD
Pay attention: a few chats waiting in tray reserve 1GB VRAM for themselves.
If an application uses a Web-based interface and "hardware acceleration", it constructs its frame in VRAM and sometimes keeps it reserved even if the app is minimised. On my Linux machine, Discord is the worst offender, reserving 450 MB VRAM. Steam takes 200 MB, Telegram 150 MB,…
Software developers appreciation post
I'm on the bus to work and just felt like I don't see enough gratitude for the men, women, children, and people who contribute their time and effort on open projects. Just last night I saw I've been sleeping while vLLM developers are releasing 3 new major releases, and not only tha…
Gemma 4 WebGPU Kernels 255 tok/s by x/@xenovacom
We need more of this, 100+ T/s
They fit! Mostly.... 2x 3090, Thermaltake Core p3
Got another 3090 had to print a bracket to angle t
Making LLMs Better at Creative Writing using Entropy
Why can I never stop the looping?
I constantly see people here saying Qwen3.6 35B is amazing, Ornith V1 is amazing, but I cannot use these models at all without severe looping problems. What the hell am I doing wrong?? Temp 0.6, top_p 0.95, top_k 20, min_p 0.05, rep_penalty 1.1. Using Q6 of both models with K/V at Q8,…
README_EN.md · openpangu/openPangu-2.0-Flash at main
1. Introduction
Krea-2-Turbo Image Model - Easy to be fully uncensored, but it can also EDIT Images!
I've been super impressed with Krea-2-Turbo. It can generate high quality images in ~3 seconds. The quality is quite good compared to other local AI image gen models. Now, I don't want to make you watch or click a YouTube video, so I'll just give these clear instructions on how…
DeepSeek V4, PR merged into llama.cpp!
The PR: https://github.com/ggml-org/llama.cpp/pull/24162 Time for everyone to git pull, run cmake, and download GGUFs! On your marks, get set, go! submitted by /u/Squik67
InternScience/Agents-A1 · Hugging Face
Unbelievable benchmarks for a 35B MoE, somebody verify.
CPU-only GLM 5.2: Epyc and 512GB RAM
This is just a preview of some content I'm putting together to share with you all. I have a server I've p
Apparently you can skip entire transformer blocks at load time with minimal performance impact
The benefit: it is another trick for fitting a model that wouldn’t otherwise fit on your hardware. People currently rely on quantization, and this is just another tool that can be used for that purpose (and the two can be used together as well). Following recent (very cool) papers,…
GLM 5.2 Q1_S vs Qwen 27B Q8
TL;DR: GLM-5.2 Q1_S beats Qwen 3.6 27B Q8, both run at KV Q8. Edit: GLM run a K & V Q8,
Script to monitor llama.cpp and analyze memory usage
My goal has always been to be productive with commodity hard
Ornith 35B is great so far
Tried creating a quick 3D game with it; after 3 prompts, it got me this (check video). By comparison, qwen3.5-35b-a3b was not able to generate this successfully and kept failing even after multiple prompts. Harness: Claude Code. How is your experience so far? https://red…
Mythos was the first, now GPT-5.6
https://techcrunch.com/2026/06/26/openai-limits-gpt-5-6-rollout-after-government-request-says-restrictions-shouldnt-be-the-norm/ Either it's hype before the IPO, or they have just shot themselves in the foot. This is pretty much it for more advanced online models. Local LLM is one of the…
Finally.. my rig is maxed out
Got all the parts before the crazy price increase except for the rtx pro 5k! Was saving up to order rtx pro 6000 in US and i
From the graph · 29
Composio powers 1000+ toolkits, tool search, context management, authentication, and a sandboxed workbench to help you build AI agents that turn intent into act…
The Enterprise-Grade Production-Ready Multi-Agent Orchestration Framework. Website: https://swarms.ai
The official repository for ROOT: analyzing, storing and visualizing big data, scientifically
🤖 Type-safe, provider-agnostic TypeScript AI SDK for streaming chat, tool calling, agents, and multimodal apps across OpenAI, Anthropic, Gemini, React, Vue, Sv…
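VCP sits between AI model APIs and front-end applications and is an industrial-grade infrastructure demonstration project for AGI OS development and exploration. Through a unified instruction protocol, multi-level persistent memory, a distributed plugin engine, and a multi-agent collaboration framework, it turns large language models that were originally "stateless, memoryless, and incapable of tool calling" into a complete agent system with permanent self-awareness, the ability to operate on the physical world, and collective collaborative intelligence.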
A Python framework for self-hosted LLM tool-calling and multi-step agentic workflows
☁️ 🚀 📊 📈 Evaluating state of the art in AI
WFGY is heading toward WFGY 5.0 Polaris Protocol, a major open-source release for AI reasoning, RAG, agents, and real-world workflows. Includes Problem Map, Glo…
UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g., GPU-driven)
Cut your Claude / OpenAI / Gemini bill 70–95% on AI coding. Local proxy that compresses context, keeps provider caches hot, and verifies LLM output ($0 hallucin…
High Performance Data Processing in Python
Auto-tuned launcher for GGUF models on llama.cpp / ik_llama.cpp — OpenAI-compatible server with multi-GPU tensor-split, MoE expert placement, measured flag tuni…
Self-hosted AI RAG + MCP Platform
Hallucination-prevention RAG system with verbatim span extraction. Ensures all generated content is grounded in source documents with exact citations.
A platform for end-to-end development of machine learning solutions in biomedical imaging
Anthropic Claude API wrapper for Go
c4 GenAI Suite
Tensor Fusion is a state-of-the-art GPU virtualization and pooling solution designed to optimize GPU cluster utilization to its fullest potential.
AI-powered NBA game outcome predictor that uses advanced team stats and trend-based features to forecast winners and track model performance
Fenix AI Trading Bot with LangGraph, Ollama, and multiple providers
A living AI-agent office on your desktop wallpaper — Claude Code agents that walk, work, delegate, learn & hold meetings. Per-agent swappable models (Claude/GLM…
Inference Hub for AI at Scale
ToolRegistry: A Protocol-Agnostic Tool Management Library for Function-Calling LLMs (OpenAI, Anthropic, Gemini, LangChain, MCP)
Your codex login → a full ChatGPT Plus/Pro account (every model, deep research, image gen, code exec) inside Claude Code, Codex & any MCP client. One-line insta…
The AI research assistant that cites real sources honestly — and searches the web. Your AI research assistant that cites real sources and stays honest. Works wi…
Browser-automation agent for Chrome — natural-language tasks executed through native tool calling, scoped Skills, CDP keyboard control, and a confirm-before-act…
Native rules, hooks, and guards that prevent Claude Code and Codex from hallucinating code, duplicating files, or shipping unverified changes.
Lightweight Python SDK for LLMs with unified API across 9 providers. Built-in ReAct & Plan-Execute agents, streaming, native tool calling, context injection, st…
Web Application Firewall (WAF) for Kubernetes Gateways
