When Thibault Sottiaux and the Codex team at OpenAI set out to build a coding agent, they flipped the traditional product development script. Instead of starting with a roadmap or a specific use case, they started with a radical question: what happens if we build a capable agent first, and then figure out where to put it to work?
That shift in mindset, prioritizing agent capability over product endpoints, has shaped everything about how Codex approaches autonomy. Sottiaux, who works at the intersection of research and engineering, is helping to solve one of the hardest problems in tech: creating agents that act independently, perform economically valuable work, and scale predictably as models improve.
For engineering leaders navigating the rapid evolution of AI-assisted development, the lessons from Codex offer a blueprint for building systems that remain simple, scalable, and adaptable, even as the underlying models shift beneath their feet.
Unlocking true autonomy at scale
The core objective for the Codex team extends far beyond simple code generation; they are focused on building agents capable of true, generalized autonomy. As Sottiaux explains, this means looking at the entire developer workflow, not just the IDE.
"It's not just about code generation, it's about solving many other parts of the day-to-day that are actually the bottlenecks. Building that general agent is what we're after."
This framing shifts the focus from narrow use cases to broad applicability. The agent is not designed to solve one specific problem; it is designed to identify and tackle economically valuable work across diverse contexts. That means addressing bottlenecks beyond writing code, such as reviewing pull requests, analyzing logs, running performance tests, and even interviewing users to gather requirements.
This "agent-first" mindset drives a different kind of product thinking. Sottiaux notes that when you shift your mindset to building an agent first, you discover a remarkable number of places where it can perform economically valuable work—opportunities that are often invisible when you start with a narrow product definition.
This approach demands simplicity. The "harness," or the software scaffolding that supports the agent, must be lightweight enough to adapt as model capabilities grow. Sottiaux describes the harness as temporary infrastructure meant to be removed over time. The goal is to minimize artificial constraints that could create "capability overhang," where the harness limits the model's true potential.
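To give a sense of how thin such a harness can be, here is a minimal sketch of a bare agent loop: send the conversation to a model, execute whatever tool call it requests, append the result, and repeat. The `model_call` and `run_tool` functions are assumed placeholders for a model API and a tool executor, not Codex internals.

```python
# Minimal sketch of a lightweight agent harness loop (illustrative only).
# model_call and run_tool are assumed placeholders, not the Codex implementation.
from typing import Callable


def agent_loop(
    task: str,
    model_call: Callable[[list[dict]], dict],
    run_tool: Callable[[str, dict], str],
    max_steps: int = 50,
) -> str:
    """Drive the agent until the model returns a final answer or the step budget runs out."""
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = model_call(messages)        # the model decides the next action
        messages.append(reply)
        if "tool" not in reply:             # no tool requested: the model is done
            return reply["content"]
        output = run_tool(reply["tool"], reply.get("args", {}))
        messages.append({"role": "tool", "content": output})
    return "Step budget exhausted"
```

The point of keeping the loop this small is that almost all of the intelligence lives in the model; as the model improves, the harness has little of its own logic standing in the way.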
Accelerating innovation through open source
Building the agent before the product requires discipline. It means resisting the urge to optimize for a single use case and instead focusing on flexibility. Sottiaux emphasizes that focusing on the agent first opens the door for "great ideas that you don't necessarily have ahead of time."
Open-sourcing the agent framework was a strategic decision to demystify these systems. The team wanted to demonstrate that building effective agents does not require mysticism or complex scaffolding; it requires well-chosen primitives and simplicity. The open-source repo serves as both a reference implementation and a platform for experimentation.
The community response has been remarkable, with over a thousand forks emerging for tasks ranging from spreadsheet editing to browser automation. However, maintaining an open-source project while preserving a clear vision requires balance. The team’s migration to Rust was a deliberate choice to prioritize scalability and reliability, reinforcing their commitment to building for millions of concurrent agents.
Interestingly, developers newer to the field often adapt more readily to this paradigm. Without legacy practices to unlearn, they embrace new workflows and innovate within the system. Sottiaux highlights one new grad on the team, Ahmed, who has become a trusted contributor by bringing fresh ideas that challenge established norms.
Scalable primitives and the risk of 'capability overhang'
The search for scalable primitives is both a research problem and an engineering challenge. Primitives must be simple, proven, and capable of scaling with model improvements. Complexity introduces fragility and bias, limiting the agent's ability to express new capabilities as models advance.
"Once you discover those primitives, they often seem delightfully simple. But the search for those primitives is a complex matter and you might not find those primitives immediately."
The concept of "capability overhang" is central to this philosophy. If the harness introduces constraints that prevent the model from expressing its full potential, then improvements in model capability won't translate to improvements in agent performance. The team mitigates this risk by aligning primitives with scaling laws and minimizing artificial constraints.
Primitives evolve through continuous refinement. For example, long-running sessions previously required heuristic-based compaction to summarize completed work and reset the context window. That approach was fragile and introduced bias. By training the model to handle compaction end-to-end, the team eliminated that complexity: the agent can now work across 20 context windows without losing track of prior work, a problem solved by the model rather than the harness.
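To make the older, harness-side approach concrete, here is a minimal sketch of heuristic compaction: when the transcript nears the context limit, the harness asks the model to summarize the session and restarts with that summary. The token threshold, `count_tokens` approximation, and `summarize_with_model` callback are illustrative assumptions, not the Codex implementation.

```python
# Hypothetical sketch of heuristic, harness-side compaction (not the Codex implementation).
# MAX_CONTEXT_TOKENS, the threshold, and summarize_with_model are illustrative stand-ins.

MAX_CONTEXT_TOKENS = 200_000      # assumed context window size
COMPACTION_THRESHOLD = 0.8        # compact once 80% of the window is used


def count_tokens(messages: list[dict]) -> int:
    # Crude approximation: ~4 characters per token. A real harness would use a tokenizer.
    return sum(len(m["content"]) for m in messages) // 4


def maybe_compact(messages: list[dict], summarize_with_model) -> list[dict]:
    """If the transcript nears the context limit, replace it with a model-written summary."""
    if count_tokens(messages) < MAX_CONTEXT_TOKENS * COMPACTION_THRESHOLD:
        return messages  # plenty of room left; keep the full transcript

    summary = summarize_with_model(
        "Summarize the work completed so far, including open tasks and key decisions:",
        messages,
    )
    # Reset the window: keep the system prompt, then carry forward only the summary.
    system = [m for m in messages if m["role"] == "system"]
    return system + [{"role": "user", "content": f"Summary of prior session:\n{summary}"}]
```

The fragility lives in choices like the threshold and the summary prompt; moving compaction into the model removes this kind of logic from the harness entirely.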
The strategic advantage of vertical integration
Vertical integration gives the Codex team unique leverage. Because they control both the model and the harness, they can choose exactly where to solve a problem. Sottiaux notes that they don't have to fix everything in the harness; they can often decide to fix issues downstream by training new models.
This flexibility enables rapid iteration. When the team identified issues with long context windows, they didn't just build more complex harness logic; they trained the model to handle the problem natively. That approach improved end-to-end system performance and removed obsolete scaffolding.
However, vertical integration comes with trade-offs. Building a tightly coupled agent optimized for a specific model series delivers maximum performance, but it sacrifices portability. For teams working with multiple foundation models, finding common ground often means accepting performance penalties. Sottiaux advises that while primitives should remain roughly similar across models, some degree of adaptation is always necessary to unlock the best results.
Preserving the joy of programming
Building agentic autonomy is not just a technical challenge; it is a mindset shift. The Codex team's journey from complexity to simplicity, from product-first to agent-first, offers a roadmap for engineering leaders navigating the AI-assisted landscape.
The key takeaways are deceptively simple: build the agent first, discover scalable primitives, minimize scaffolding, and let the model do the work. But as Sottiaux reminds us, the search for those primitives is complex. The reward is a system that scales with model improvements and adapts to new use cases.
For developers, the advice is straightforward: make the agent your own. Build skills, automate the parts of your workflow you don't want to do, and preserve the parts that bring you joy. As Sottiaux puts it, it's like training a Pokémon: every week you teach it something new, and it gets a little better every time.
To dive deeper into the future of AI-accelerated development, listen to Thibault Sottiaux discuss these ideas in depth on the Dev Interrupted podcast.