Harness engineering: leveraging Codex in an agent-first world

Source: openai.com/index/harness-engineering. Published: May 19, 2026.

A small engineering team built a production software product with zero manually-written code, using Codex agents to write every line, achieving roughly 10x velocity by focusing human effort on design, feedback loops, and scaffolding rather than coding.

The team built a production software product entirely through Codex-generated code, with humans never manually writing any code.
Engineers shifted from writing code to designing environments, specifying intent, and building feedback loops that enable Codex agents to work reliably.
Agent-first development requires making the repository and tools legible to agents, using structured documentation and progressive disclosure.
To maintain coherence in an agent-generated codebase, the team enforces invariants through custom linters, strict architectural layers, and garbage collection processes.
Through investment in agent tooling and feedback loops, Codex can now drive new features end to end, including bug reproduction, fix implementation, and auto-merge.

The experiment: building with zero human-written code

Over five months, a small team built an internal beta product with 0 lines of manually-written code. Every line—including application logic, tests, CI, documentation, and tooling—was written by Codex. The team estimates this took about 1/10 the time of hand-coding. Humans steered, agents executed.

The constraint forced the team to understand what changes when engineers no longer write code: they design environments, specify intent, and build feedback loops that allow Codex agents to do reliable work.

Redefining the engineer's role

Early progress was slower than expected because the environment was underspecified. The primary job became enabling agents: breaking goals down into smaller building blocks, prompting agents to construct those blocks, and asking "what capability is missing?" when something failed.

Humans interact almost entirely through prompts. To drive a PR to completion, Codex reviews its own changes and iterates until all agent reviewers are satisfied, a pattern the team calls a Ralph Wiggum Loop. Over time, most review effort shifted from human-to-agent to agent-to-agent.
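
The iterate-until-satisfied loop described above can be sketched in a few lines. This is a minimal illustration rather than the team's implementation: the names `drive_to_approval`, `Reviewer`, and `revise` are hypothetical, each reviewer is modelled as a plain function that returns a list of outstanding comments, and a round cap is assumed so the loop cannot run forever.

```python
from typing import Callable

# Hypothetical types: a reviewer returns its outstanding comments (empty = satisfied).
Reviewer = Callable[[str], list[str]]
Reviser = Callable[[str, list[str]], str]


def drive_to_approval(
    draft: str,
    reviewers: list[Reviewer],
    revise: Reviser,
    max_rounds: int = 5,
) -> tuple[str, int]:
    """Review and revise until every reviewer is satisfied.

    Returns the approved draft and the number of review rounds it took.
    Raises RuntimeError if max_rounds review passes all produce comments.
    """
    for round_no in range(1, max_rounds + 1):
        # Collect comments from every reviewer for the current draft.
        comments = [c for review in reviewers for c in review(draft)]
        if not comments:
            return draft, round_no
        # Feed all comments back into the next revision.
        draft = revise(draft, comments)
    raise RuntimeError(f"not approved after {max_rounds} review rounds")
```

In a real setup the reviewers and the reviser would be agent invocations; keeping them as injected callables makes the control flow easy to test in isolation.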

Making code and context legible to agents

As throughput increased, human QA became the bottleneck. The team made the app, logs, and metrics directly legible to Codex—for example, using Chrome DevTools Protocol so agents could reproduce bugs and validate fixes. Agents query logs with LogQL and metrics with PromQL, enabling prompts like "ensure service startup completes in under 800ms."
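
A prompt such as "ensure service startup completes in under 800ms" ultimately reduces to a check an agent can run against observed data. The sketch below assumes the startup durations have already been fetched (for example from a metrics query) as a list of millisecond values; the fetch itself is left out, and `check_startup_budget` and `BudgetResult` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class BudgetResult:
    ok: bool                 # True when every sample is strictly under budget
    worst_ms: float          # slowest observed startup
    violations: list[float]  # samples at or over the budget


def check_startup_budget(samples_ms: list[float], budget_ms: float = 800.0) -> BudgetResult:
    """Check that every startup sample completes in under budget_ms."""
    if not samples_ms:
        # An empty result set is treated as a failure to measure, not a pass.
        raise ValueError("no startup samples to evaluate")
    violations = [s for s in samples_ms if s >= budget_ms]
    return BudgetResult(ok=not violations, worst_ms=max(samples_ms), violations=violations)
```

Treating an empty sample list as an error is a deliberate choice in this sketch: a missing metric should not be mistaken for a passing one.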

Context management is key. Instead of a giant instruction manual, the repository uses a short AGENTS.md as a map with pointers to a structured docs/ directory. This enables progressive disclosure: agents start with a small entry point and learn where to look next. Linters and CI validate that the knowledge base is up to date.
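
One simple form of the knowledge-base validation mentioned above is a CI check that every pointer in AGENTS.md resolves to a real file. The sketch below assumes the pointers are written as Markdown-style links with repository-relative paths; the source does not specify the link format, and `find_broken_pointers` is a hypothetical helper.

```python
import re
from pathlib import Path

# Matches [label](target) and captures the target; assumes no spaces in targets.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def find_broken_pointers(agents_md_text: str, repo_root: Path) -> list[str]:
    """Return link targets in the map file that do not exist under repo_root."""
    broken = []
    for target in LINK_RE.findall(agents_md_text):
        # Skip external URLs, mail links, and in-page anchors.
        if "://" in target or target.startswith(("#", "mailto:")):
            continue
        # Drop any "#section" suffix before checking the filesystem.
        path_part = target.split("#", 1)[0]
        if not (repo_root / path_part).exists():
            broken.append(target)
    return broken
```

A CI job could fail the build whenever the returned list is non-empty, which keeps the map from drifting away from the docs/ directory it points into.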

Enforcing architecture and managing entropy

To keep the codebase coherent, the team enforces strict architectural layers with custom linters—"golden principles" that are encoded once and apply everywhere. Human taste is fed back into the system continuously via review comments and refactoring PRs.
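
A layering rule of this kind can be enforced mechanically. The sketch below is illustrative only: the layer names in `LAYERS` are hypothetical, each layer is assumed to be a top-level package, and the rule checked is that a module may import from its own layer or lower layers but never from a higher one.

```python
import ast

# Hypothetical layer order, lowest first. A layer may import only itself or lower layers.
LAYERS = ["types", "config", "service", "ui"]
RANK = {name: i for i, name in enumerate(LAYERS)}


def layer_violations(module_layer: str, source: str) -> list[str]:
    """Return one message per import that reaches into a higher layer."""
    own_rank = RANK[module_layer]
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Absolute "from x.y import z"; relative imports stay within a package.
            names = [node.module]
        else:
            continue
        for name in names:
            top = name.split(".")[0]
            # Imports of packages outside the layer list (stdlib, third party) are ignored.
            if top in RANK and RANK[top] > own_rank:
                violations.append(
                    f"{module_layer} must not import {name} (layer '{top}' is higher)"
                )
    return violations
```

Because the rule lives in one table, changing the architecture means editing `LAYERS` once rather than re-explaining the rule in every review.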

Even with agents, entropy creeps in. The team introduced a recurring cleanup process—garbage collection—where background tasks scan for deviations, update quality grades, and open targeted refactoring PRs. Technical debt is paid down continuously in small increments.
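
The scan, grade, and target steps of such a cleanup pass might look like the sketch below. Everything specific here is an assumption made for illustration: the letter grades, the thresholds (deviations per 1,000 lines), and the function names `grade` and `refactor_targets` do not come from the source.

```python
def grade(deviations: int, lines: int) -> str:
    """Assign a letter grade from deviation density (hypothetical thresholds)."""
    density = deviations / max(lines, 1) * 1000  # deviations per 1,000 lines
    if density == 0:
        return "A"
    if density < 2:
        return "B"
    if density < 5:
        return "C"
    return "D"


def refactor_targets(
    report: dict[str, tuple[int, int]], limit: int = 3
) -> list[tuple[str, str]]:
    """Pick the worst-graded files as candidates for small refactoring PRs.

    report maps a file path to (deviation count, line count).
    """
    graded = [(path, grade(devs, lines)) for path, (devs, lines) in report.items()]
    needs_work = [g for g in graded if g[1] != "A"]
    # Worst grade first ("D" before "C" before "B"), then path for a stable order.
    needs_work.sort(key=lambda g: (-ord(g[1]), g[0]))
    return needs_work[:limit]
```

Capping the number of targets per run matches the idea of paying down debt in small increments: each pass opens a few focused PRs instead of one sweeping rewrite.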
