repo · GitHub · Trust 82 · Primary · Published yesterday · Live · yesterday

patrick-toulme/harnessgym

Iterative agent harness improvement: run a coding agent on a hard task, generate the reusable tooling it was missing, qualify it, and replay fresh sessions with it activated. Works with Codex and Claude Code.
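The loop described above (run, generate missing tooling, qualify, replay) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not harnessgym's actual API: `Tool`, `SessionResult`, `improve_harness`, and the three callables are hypothetical names introduced here, and the real project may structure each step differently.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Tool:
    """Hypothetical record for one piece of reusable tooling."""
    name: str
    qualified: bool = False


@dataclass
class SessionResult:
    """Hypothetical outcome of one fresh agent session."""
    solved: bool
    missing_capabilities: List[str] = field(default_factory=list)


def improve_harness(
    task: str,
    run_agent: Callable[[str, List[Tool]], SessionResult],
    build_tool: Callable[[str], Tool],
    qualify: Callable[[Tool], bool],
    max_rounds: int = 3,
) -> List[Tool]:
    """Run, generate missing tooling, qualify it, then replay with it active."""
    active: List[Tool] = []
    for _ in range(max_rounds):
        # Each round is a fresh session that sees only qualified tools.
        result = run_agent(task, active)
        if result.solved:
            break
        for capability in result.missing_capabilities:
            if any(t.name == capability for t in active):
                continue  # already activated in an earlier round
            tool = build_tool(capability)
            # Only tooling that passes qualification is activated.
            if qualify(tool):
                tool.qualified = True
                active.append(tool)
    return active


# Stub agent for illustration: it fails until a "log_parser" tool is active.
def fake_agent(task: str, tools: List[Tool]) -> SessionResult:
    if any(t.name == "log_parser" for t in tools):
        return SessionResult(solved=True)
    return SessionResult(solved=False, missing_capabilities=["log_parser"])


tools = improve_harness(
    "fix flaky test",
    run_agent=fake_agent,
    build_tool=lambda capability: Tool(capability),
    qualify=lambda tool: True,
)
```

In this sketch the qualification step is a gate: a generated tool that fails it is never activated, so a later replay cannot depend on it.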

Lineage graph

Paper → model → repo connections mined from source citations (Tier-1 exact match).

Related to

model: AgentCore-8B

Covers

news: I built an agent Harness for Small Models. I got Qwen 3.5 4b managing servers.

Implements

paper: Agentic Hardware Design as Repository-Level Code Evolution
paper: AutoTrainess: Teaching Language Models to Improve Language Models Autonomously
paper: Learning from Failure: Inference-Time Self-Improvement for Computer-Use Agents

Implements (incoming)

paper: Reasoning effort, not tool access, buys first-try reliability in agentic code generation: an observational study

Related across the graph

news: I built an agent Harness for Small Models. I got Qwen 3.5 4b managing servers.
paper: Learning from Failure: Inference-Time Self-Improvement for Computer-Use Agents
paper: Reasoning effort, not tool access, buys first-try reliability in agentic code generation: an observational study
model: AgentCore-8B
paper: AutoTrainess: Teaching Language Models to Improve Language Models Autonomously
paper: Agentic Hardware Design as Repository-Level Code Evolution

Topics