Topic

GRPO

7 items across the graph, tagged with GRPO.

From the graph · 7

Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3.6, DeepSeek-V4, GLM-5.1, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3…

→repo

walkinglabs/hands-on-modern-rl

🚀 An open-source, hands-on curriculum bridging the gap from basic RL concepts to LLM alignment, RLVR, and advanced Agentic systems.

→repo

redai-infra/Relax

An Asynchronous Reinforcement Learning Engine for Omni-Modal Post-Training at Scale

→repo

hud-evals/hud-python

RL environments + evals for AI agents. Define once, train anything.

→repo

Haozhe-Xing/agent_learning

A systematic AI Agent development tutorial covering LLM agents, RAG, tool use, memory systems, multi-agent systems, LangChain, LangGraph, MCP, and agentic RL. | From…

→repo

Enping-Hu/minimind-deep-dive

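Chinese-language study notes that read the MiniMind source code line by line and extend to the broader large-model technology stack: pretraining / SFT / DPO / PPO / GRPO, training mechanics, a MiniMind2→3 version comparison, and real experimental evidence.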

→repo

sileod/reasoning-core

Procedural data generators suite for synthetic pretraining and formal reasoning

→