When James Everingham returned to Meta to lead Dev Infra, a thousand-person organization supporting 40,000 engineers, he discovered something counterintuitive. The real productivity gains from AI were not coming from better autocomplete. They were emerging from treating AI as an intelligent fabric woven through the entire software development lifecycle.
The result was DevMate, an agent platform that grew so organically within Meta that its agents eventually submitted 50% of all code changes. Now, as CEO of guild.ai, Everingham is bringing those lessons to engineering organizations everywhere.
Embedding agents across the software lifecycle
The initial instinct at Meta mirrored the rest of the industry: build better in-editor assistance. Through its internal CodeCompose product, the team achieved near feature parity with tools like Cursor. But the results were uneven: junior engineers and some very senior engineers saw gains, while the broad middle of the organization experienced minimal impact.
The breakthrough came from repositioning AI from an authoring tool to an infrastructure layer. By moving agentic behavior closer to source control, where teams, tools, and processes already converge, Meta unlocked higher-leverage use cases across the development lifecycle.
The most effective adoption pattern was not mandate-driven. Instead, ambitious business challenges pushed teams to discover where agents genuinely solved problems. For example, leadership did not tell the team to build a "diff risk score" tool. Instead, they issued a challenge to the company: how do we eliminate code freeze?
This challenge-driven approach surfaced unexpected solutions. Engineers built diff risk scoring using LLMs to assess production crash risk, which enabled Meta to eliminate traditional December code freezes. Another team created self-healing infrastructure that could automatically detect crashes, write fixes, generate test cases, and land changes responsibly while minimizing the impact radius of an outage.
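The article does not describe how diff risk scoring was implemented, but the idea can be sketched as combining cheap structural signals about a change with a model's judgment of crash risk. Everything below is illustrative: the `Diff` fields, the weighting, the threshold, and the stubbed-out LLM call are assumptions, not Meta's actual system.

```python
from dataclasses import dataclass, field

@dataclass
class Diff:
    files_touched: list = field(default_factory=list)
    lines_changed: int = 0
    touches_startup_path: bool = False

def llm_risk_estimate(diff_summary: str) -> float:
    """Stub for an LLM call that rates crash risk from a textual diff
    summary on a 0..1 scale. A real system would prompt a model and
    parse its answer; here we return a fixed mid value for illustration."""
    return 0.5

def diff_risk_score(diff: Diff) -> float:
    # Cheap structural signal: bigger diffs are riskier, capped at 1.0.
    structural = min(1.0, diff.lines_changed / 500)
    if diff.touches_startup_path:
        structural = max(structural, 0.8)  # startup-path crashes hit every user
    model = llm_risk_estimate(
        f"{len(diff.files_touched)} files, {diff.lines_changed} lines"
    )
    # Blend structural and model signals with equal (assumed) weights.
    return round(0.5 * structural + 0.5 * model, 2)

def can_land_during_freeze(diff: Diff, threshold: float = 0.6) -> bool:
    """A release gate might hold only high-risk diffs during peak periods,
    replacing a blanket code freeze with a per-change decision."""
    return diff_risk_score(diff) < threshold
```

The point of the design is that the gate is per-diff: low-risk changes keep landing through December, and only changes scoring above the threshold get held for extra review.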
Testing workflows also evolved beyond scripted checks. Instead of trying to achieve 100% code coverage through traditional unit tests, teams built agents that simulated realistic user behavior. These agents would virtually walk down the street taking photos and searching for restaurants, exercising code paths that humans would actually trigger in production.
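A behavior-simulation test of this kind can be sketched as driving the app through a sequence of high-level user actions rather than isolated unit calls. The `FakeApp` stand-in, the action names, and the session length below are all hypothetical, chosen only to mirror the photo-and-restaurant scenario in the text.

```python
import random

class FakeApp:
    """Minimal stand-in for an app under test; a real harness would
    drive the actual application or a device emulator."""
    def perform(self, action: str) -> str:
        return f"ok:{action}"

def simulated_user_session(app, seed: int = 0) -> list:
    """Exercise the app the way a user would: a randomized but
    reproducible sequence of realistic actions, so the test hits the
    code paths production traffic actually triggers."""
    rng = random.Random(seed)  # seeded for reproducible failures
    actions = ["open_camera", "take_photo", "search_restaurants", "view_result"]
    return [app.perform(rng.choice(actions)) for _ in range(10)]
```

Seeding the random generator keeps each simulated session reproducible, so a crash found by the agent can be replayed exactly.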
Scaling agents safely with an AI control plane
As agent usage expanded from individual experiments to organization-wide systems, Meta confronted a new challenge: governance at scale. Running agents in "single player mode," where engineers spin up Claude or other tools on their local laptops, created unacceptable security, compliance, and operational risks.
"You need a control plane to be able to manage these," Everingham explains. "You need to be able to understand what these agents are doing in your infrastructure instead of just give them access to everything and let them yolo it."
The required capabilities extended far beyond simple deployment. A production-grade AI control plane needed several critical features:
- Access management and credentials: Scoped permissions preventing agents from accessing everything.
- Rollback and versioning: The ability to quickly revert problematic agent behavior.
- Observability and debugging: Session history and execution logs for troubleshooting.
- Cost tracking: Token spend visibility across concurrent agents.
- Security guardrails: Preventing credential leakage and malware installation.
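One way to picture these capabilities together is a per-agent policy record that the control plane enforces. The field names and the deny-by-default check below are a minimal sketch of the idea, not any particular product's schema.

```python
# Hypothetical policy record covering the capabilities listed above:
# pinned version (rollback), scoped access, no stored secrets,
# a spend budget (cost tracking), and session logging (observability).
AGENT_POLICY = {
    "name": "test-generator",
    "version": "1.4.2",                            # pinned for quick rollback
    "allowed_scopes": ["repo:read", "ci:trigger"],  # no production access
    "secrets": [],                                  # injected at runtime, never stored
    "budget_usd_per_day": 25.0,                     # token-spend kill switch
    "log_sessions": True,                           # execution logs for debugging
}

def authorize(policy: dict, requested_scope: str) -> bool:
    """Deny-by-default scope check: an agent gets only what its policy
    grants, never ambient access to everything."""
    return requested_scope in policy["allowed_scopes"]
```

The deny-by-default shape matters: instead of asking "what should this agent be blocked from?", the control plane asks "what has this agent been explicitly granted?"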
The recent Open Claude incident, where users inadvertently exposed passwords and security keys, illustrated these risks vividly. What might be tolerable on a personal laptop becomes catastrophic at enterprise scale. Everingham likens the future of agentic infrastructure to a managed software center in an enterprise that dictates what software users are allowed to install.
Ownership of this control plane necessarily spans multiple functions. Infrastructure teams handle setup and management, security teams configure guardrails, and finance teams audit token spend. The shift from single-player to multiplayer mode requires cross-functional visibility and control.
Driving gains with source control-native agents
The strongest productivity gains emerged when agents operated at the source control layer rather than in individual editors. Source control serves as the canonical source of truth where code, history, team structures, and tool integrations converge.
This architectural choice enabled agents to intercept tool connections and invoke workflows at critical junctures in the development process. Rather than assisting with individual coding tasks, agents could participate in code review, testing, deployment, and operational feedback loops.
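The interception pattern can be sketched as a dispatcher that fans source-control events out to whichever agents registered for that hook. The event shape, hook names, and the two example agents below are illustrative assumptions, not DevMate's actual interface.

```python
def on_pull_request(event: dict, agents: list) -> list:
    """Fan a new pull request out to every agent registered for the
    'pull_request' hook and collect their findings for the review."""
    findings = []
    for agent in agents:
        if "pull_request" in agent["hooks"]:
            findings.extend(agent["run"](event))
    return findings

# Two toy agents: one scores risk from diff size, one proposes tests.
risk_agent = {
    "hooks": ["pull_request"],
    "run": lambda ev: [f"risk: {min(1.0, ev['lines_changed'] / 500):.2f}"],
}
test_agent = {
    "hooks": ["pull_request"],
    "run": lambda ev: [f"proposed tests for {len(ev['files'])} file(s)"],
}
```

Because every change flows through source control, a single hook point like this lets review, testing, and deployment agents participate without any per-editor integration.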
DevMate provided a centralized platform where engineers could discover, inspect, fork, and extend agents. This created compounding leverage: developers could see which agents were available, dig into how they worked, branch them, and build their own on top of existing agents rather than starting from scratch.
An onboarding agent, built by an engineer getting up to speed on a new codebase, exemplified this leverage. It turned source control into an interactive interface: new engineers at Instagram could ask where the camera filter code lived and receive not just file paths but system diagrams and historical context, essentially holding a conversation with the codebase itself.
At Meta's scale, reducing "time to first diff" by even a few days translated to hundreds of reclaimed person-years annually. This pattern repeated across use cases, from diff risk scoring to production change analysis.
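The person-years arithmetic is straightforward to make concrete. The hiring volume and days-saved figures below are hypothetical inputs chosen only to show how a few days per engineer compounds at scale; the article does not give Meta's actual numbers.

```python
WORKDAYS_PER_YEAR = 260  # rough count of engineering workdays in a year

def person_years_reclaimed(new_engineers_per_year: int, days_saved: float) -> float:
    """Back-of-envelope: onboarding days saved, summed across every
    new hire in a year, expressed as person-years of capacity."""
    return new_engineers_per_year * days_saved / WORKDAYS_PER_YEAR

# With a hypothetical 10,000 new engineers a year each reaching their
# first diff 5 days sooner, roughly 192 person-years come back annually.
saved = person_years_reclaimed(10_000, 5)
```

At that assumed scale, even a single reclaimed day per new hire is worth dozens of person-years, which is why "time to first diff" works as a leading metric.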
Measuring AI by onboarding speed and release reliability
Measuring AI's impact on developer productivity proved more complex than anticipated. Promising demos did not automatically translate into organizational gains, and traditional metrics failed to capture meaningful outcomes.
"It's not even clear what the holy grail of metrics is, and it isn't lines of code," Everingham notes. "Some of the most productive developers delete more code than they create."
The measurement challenge stemmed from productivity being context-specific rather than universal. What mattered at Meta included onboarding speed, release reliability, and the ability to ship during constrained periods. Those priorities might differ substantially at other organizations.
Lines of code proved insufficient as a proxy. The most productive developers often simplified architectures and improved maintainability in ways that raw output metrics missed entirely. Feature velocity carried similar limitations because it failed to account for quality, sustainability, or strategic value.
Meta's approach centered on defining local success criteria first and then instrumenting accordingly. "Time to first diff" provided a concrete onboarding metric. Eliminating code freezes demonstrated operational impact. Broadening who could contribute beyond traditional engineering roles indicated the democratization of development capability.
These outcome-based metrics proved more meaningful than adoption counts or AI usage statistics. The question was not how many engineers used AI tools, but whether the organization could accomplish previously impossible things, like shipping safely during peak traffic periods or onboarding engineers in days instead of weeks.
Setting ambitious challenges to drive transformation
Everingham's advice for organizations without Meta's measurement infrastructure is to start by defining what productivity means locally. Challenge teams with ambitious goals that force creative thinking about tools and processes.
These goals are not achieved by asking for incremental improvements. Leaders have to put audacious challenges in front of the organization and push for order-of-magnitude change; problems that ambitious demand new approaches and naturally surface where AI provides genuine utility.
The transition from AI as a coding assistant to AI as development infrastructure represents a fundamental shift in how engineering organizations operate. Meta's experience with DevMate demonstrates that the highest-value outcomes come from embedding agents throughout the software lifecycle.
Success requires more than deploying tools. It demands centralized governance through control planes, source control-native architectures that leverage existing context, and measurement frameworks grounded in organizational outcomes. Most critically, it requires leaders willing to set ambitious challenges that push teams to discover where agents genuinely create value.
To hear more about the future of agentic infrastructure, listen to James Everingham discuss these ideas in depth on the Dev Interrupted podcast.