For a decade, DORA gave engineering leaders a shared language for software delivery, built on four metrics, a clear bar for elite performance, and a research base deep enough to settle most arguments. Then AI changed the questions faster than most measurement programs could keep up. DORA’s AI research answered the new ones. The 2025 State of AI-assisted Software Development report introduced the DORA AI Capabilities Model, and the 2026 ROI study put a price on it.
DORA’s central finding is that AI is an amplifier. It magnifies whatever you already have, and it can’t make a struggling engineering organization healthy on its own. So the target of measurement changes. You measure the system AI amplifies, not the AI activity on top of it.
AI amplifies the system you already have
Strong engineering systems get stronger when you add AI, and weak systems get faster at being weak. Even strong systems pay an adjustment cost. The research frames the early instability as expected tuition, a trough every adopter passes through on the way to value. Healthy systems climb out of it faster, and that recovery speed is what measurement has to track.
AI collapses the cost of producing code while everything downstream, review, testing, security, and deployment, keeps running at the speed it always has. The result is a lopsided system, fast at the front and unchanged at the back.
Picture the most common version. A team adopts an assistant, coding time drops, and developers feel faster. But the pull requests are larger and more numerous, so pickup and review time climb to absorb them. Cycle time, the end-to-end measure of how fast work reaches production, barely moves or slips backward. An adoption dashboard wouldn’t show you any of this. The amplifier relocated the constraint; it didn’t make the team faster. The only way to see that is to measure where work waits, not how much of it starts.
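To make that scenario concrete, here is a minimal sketch that compares per-stage medians before and after adoption. The stage durations are invented for illustration and are not DORA data, and the names `stage_medians` and `constraint` are hypothetical helpers, not part of any tool.

```python
from statistics import median

STAGES = ("coding", "pickup", "review", "deploy")

# Hypothetical per-PR stage durations in hours: (coding, pickup, review, deploy).
# These numbers are made up to illustrate the pattern described above.
before = [(10, 4, 6, 2), (12, 5, 7, 2), (9, 3, 5, 2)]
after = [(5, 8, 11, 2), (6, 9, 12, 2), (4, 7, 10, 2)]


def stage_medians(prs):
    """Median hours spent in each stage across a set of pull requests."""
    return {stage: median(pr[i] for pr in prs) for i, stage in enumerate(STAGES)}


def constraint(medians):
    """The stage where work spends the most time."""
    return max(medians, key=medians.get)


m_before = stage_medians(before)
m_after = stage_medians(after)
cycle_before = median(sum(pr) for pr in before)
cycle_after = median(sum(pr) for pr in after)

print("coding hours:", m_before["coding"], "->", m_after["coding"])
print("constraint:", constraint(m_before), "->", constraint(m_after))
print("cycle time:", cycle_before, "->", cycle_after)
```

In this made-up data, coding time halves while the constraint moves from coding to review and total cycle time gets slightly worse, which is the pattern an adoption count alone would hide.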
The 2026 DORA ROI report names the cost of that lopsidedness and places it on the J-Curve of AI value realization, the trough teams pass through before adoption pays off. The first cost is a verification tax: developers spend the time they saved writing code on reviewing AI output that looks plausible but still has to be checked, and that load scales with the volume of generated code. The second is an instability tax. DORA found that AI adoption’s effect on delivery instability was its second-largest effect, behind only individual effectiveness. That’s an effect-size ranking, not a dollar amount, and the net still comes out positive: developers lose less time to instability than they gain in individual effectiveness. The gain is real, and it’s smaller than the headline suggests.
This is what makes measuring engineering efficiency harder, and more important, than it was a year ago. If AI amplifies your system, the job of measurement is to make the system visible, to show where the amplification helps and where it compounds a weakness. Activity metrics can’t do that. Adoption rates, tokens consumed, seats activated, and lines generated all measure effort, not outcome, and none of them tell you whether the work moved through the pipeline or stalled in a queue. To manage AI’s impact, you measure the system it’s amplifying, not the activity at the top of it.
The DORA AI Capabilities Model, and what it asks of you
DORA’s AI Capabilities Model identifies seven organizational capabilities that determine whether AI adoption turns into real value. Read them as seven things every measurement program now has to account for. Each one is a leading indicator of whether your AI investment compounds or corrodes.
A clear and communicated AI stance. Teams need to know which tools are approved, what’s allowed, and where the experimentation boundaries sit. Ambiguity here surfaces as friction and stalled adoption, and it shows in how unevenly AI appears across teams.
Healthy data ecosystems. This is the quality, accessibility, and integration of internal data. Teams that can’t locate or trust their data inherit compounding inefficiencies that no amount of model capability will fix.
AI-accessible internal data. This is whether your AI tools can reach your documentation, knowledge bases, and project history. Without that context, AI output is syntactically correct and operationally irrelevant.
Strong version control practices. Frequent, small commits and fast rollback reduce the risk of every change, and they make faster delivery safe rather than reckless.
Working in small batches. Fewer lines per change and shorter tasks keep review tractable and predictability intact, which matters more when AI inflates volume.
A user-centric focus. Connecting work to user outcomes aims velocity at value. It separates shipping faster from shipping the right things faster.
Quality internal platforms. Reliable, well-documented internal tooling is the context layer agents depend on. A weak platform produces confident, wrong code at scale.
Read together, these capabilities describe the health of the system AI is amplifying. They’re leading indicators, the signals that move before the lagging numbers an executive eventually asks about. A measurement program that tracks only the lagging numbers is always explaining the past. A program that tracks these capabilities can change the future.
From research to practice
DORA gives you a great deal. It provides the amplifier finding, the seven capabilities, and the software delivery metrics. The 2026 ROI work adds a value model that turns all of it into a financial return, plus an investment roadmap that sequences where to put the money first.
DORA is research, and research is meant to be built on. What it leaves to each engineering organization is the practice. That means deciding how AI shows up as a measured, first-class part of your day-to-day delivery, which number anchors each decision, and what you do when one of them moves the wrong way. The risk is that the work goes undone and teams fall back to the metrics that are easiest to produce. Adoption percentages and token counts fill the vacuum because they need no decision, and you end up with a measurement program that’s busy but inert, rich in dashboards and poor in changed outcomes. Building on DORA is how you keep the research from ending at a dashboard.
APEX: Building on DORA for the AI era
APEX is a framework that builds on DORA for the AI-driven era. It treats AI as a first-class contributor to your SDLC, so you can see exactly how that contribution impacts predictability, efficiency, and experience instead of inferring it from adoption counts. It’s organized around four pillars, each anchored by a single north-star metric, so every decision has one number to reason from.
AI leverage measures whether AI is moving delivery. The north star is the share of pull requests with AI contribution, measured in the delivery critical path rather than by seat count or token spend. That’s what makes AI a first-class, measured part of the system instead of an activity off to the side.
Predictability measures whether the business can rely on what engineering commits. The north star is planning accuracy, read together with capacity accuracy, and rework and defects serve as the diagnostics that explain misses.
Flow efficiency measures whether work moves smoothly from start to production. The north star is cycle time, decomposed into coding, pickup, review, and deploy time so you can locate the constraint instead of guessing at it. Change failure rate sits alongside it as the quality guardrail, so speed gains don’t arrive as instability.
Developer experience measures whether the gains are sustainable. The north star is developer satisfaction, the signal that tells you whether faster delivery is being bought with burnout. DORA’s seven AI capabilities sit here too, as diagnostics that explain why satisfaction and readiness move.
DORA’s research carries straight through. Its delivery metrics become flow efficiency and predictability, AI in the critical path becomes AI leverage, and the seven capabilities become developer-experience diagnostics. APEX doesn’t replace any of that; it gives each signal a decision to drive.
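As a rough sketch of how three of the north-star numbers and the quality guardrail could be computed from delivery data, consider the following. The `PullRequest` fields, the sample values, and the planning-accuracy definition (planned items completed divided by planned items) are assumptions made for illustration, not definitions taken from DORA or APEX. Developer satisfaction comes from surveys and is left out.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class PullRequest:
    ai_assisted: bool      # assumed flag: the PR contains AI-contributed changes
    cycle_hours: float     # assumed span: start of work to production
    caused_failure: bool   # assumed flag: the change led to a failure in production


def ai_share(prs):
    """Share of pull requests with AI contribution (AI leverage)."""
    return sum(pr.ai_assisted for pr in prs) / len(prs) if prs else 0.0


def change_failure_rate(prs):
    """Share of changes that caused a failure (flow-efficiency guardrail)."""
    return sum(pr.caused_failure for pr in prs) / len(prs) if prs else 0.0


def planning_accuracy(planned, completed):
    """Assumed definition: planned items that were completed / planned items."""
    return len(planned & completed) / len(planned) if planned else 0.0


# Illustrative data only.
prs = [
    PullRequest(ai_assisted=True, cycle_hours=30, caused_failure=False),
    PullRequest(ai_assisted=True, cycle_hours=52, caused_failure=True),
    PullRequest(ai_assisted=False, cycle_hours=41, caused_failure=False),
    PullRequest(ai_assisted=True, cycle_hours=28, caused_failure=False),
]
planned = {"A", "B", "C", "D", "E"}
completed = {"A", "B", "D", "X"}  # "X" was unplanned work

print("AI leverage:", ai_share(prs))
print("Cycle time (median hours):", median(pr.cycle_hours for pr in prs))
print("Change failure rate:", change_failure_rate(prs))
print("Planning accuracy:", planning_accuracy(planned, completed))
```

Keeping each number as a small, separate function mirrors the idea that every pillar has one value to reason from; how an organization actually detects AI contribution or links failures to changes is outside this sketch.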
Each pillar gets its own review, and the four reviews run on a recommended rhythm rather than a fixed prescription, tuned to how a typical organization already operates. AI leverage tends to warrant weekly attention so you catch shifts in adoption and contribution early. Predictability fits against the sprint and its commitments. Flow efficiency rewards a monthly decomposition to find where velocity gains get absorbed. Developer experience is usually surveyed quarterly to confirm the gains hold. The cadence is what turns a metric into a decision, and most teams tune the intervals to their own tempo.
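One way to picture that rhythm is as a small, tunable configuration. The sketch below assumes a two-week sprint and approximates a month as 30 days and a quarter as 90; the pillar keys and the `reviews_due` helper are hypothetical names chosen for this example.

```python
from datetime import date, timedelta

# Review intervals in days. Assumptions: two-week sprints, 30-day months,
# 90-day quarters. Teams would tune these to their own tempo.
CADENCE_DAYS = {
    "ai_leverage": 7,
    "predictability": 14,
    "flow_efficiency": 30,
    "developer_experience": 90,
}


def reviews_due(last_reviewed, today):
    """Return the pillars whose review interval has elapsed, sorted by name."""
    return sorted(
        pillar
        for pillar, last in last_reviewed.items()
        if today - last >= timedelta(days=CADENCE_DAYS[pillar])
    )


# Illustrative dates only.
last_reviewed = {
    "ai_leverage": date(2025, 3, 24),
    "predictability": date(2025, 3, 24),
    "flow_efficiency": date(2025, 3, 1),
    "developer_experience": date(2025, 1, 15),
}
due = reviews_due(last_reviewed, today=date(2025, 3, 31))
print(due)
```

With these sample dates, only the weekly AI-leverage review and the monthly flow-efficiency review come due; the intervals are a starting point, not a prescription.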
The last piece is intervention. When a metric slips, you act on it inside the workflow, with automation such as AI code review and policy-based workflows that absorb the verification tax instead of passing it to already-stretched reviewers. That’s a practical answer to a problem DORA’s own roadmap names: the report calls for empowering the human in the loop to bring the verification tax down.
Consider a concrete cycle. The weekly AI-leverage review shows AI-generated pull requests growing in size and number on one team. Left alone, that surfaces a month later as climbing review time in the flow numbers, and a sprint after that as a missed commitment in the predictability review. Caught early, the response is a workflow policy that routes the oversized pull requests to the right reviewers and auto-handles the routine ones, shipped before the constraint compounds. That’s the same data DORA would have you collect, now attached to a decision and an action. A rising review queue stops being a quarterly observation and becomes something you fix while it’s still small.
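A workflow policy like the one in that cycle could be sketched as a simple routing rule. Everything here is hypothetical: the 400-line threshold, the `routine` flag, and the route names are placeholders for whatever a team's own tooling and standards define.

```python
from dataclasses import dataclass

MAX_LINES = 400  # hypothetical size threshold for an "oversized" pull request


@dataclass
class PullRequest:
    lines_changed: int
    ai_generated: bool
    routine: bool  # assumed flag, e.g. a dependency bump or docs-only change


def route(pr):
    """Pick a review path: oversized PRs go to designated reviewers,
    routine ones are handled automatically, the rest follow the normal path."""
    if pr.lines_changed > MAX_LINES:
        return "designated_reviewers"
    if pr.routine:
        return "automated_review"
    return "standard_review"


# Illustrative pull requests only.
queue = [
    PullRequest(lines_changed=1200, ai_generated=True, routine=False),
    PullRequest(lines_changed=15, ai_generated=True, routine=True),
    PullRequest(lines_changed=180, ai_generated=False, routine=False),
]
routes = [route(pr) for pr in queue]
print(routes)
```

The size check runs first so that a large change never slips through the automated path just because it was labeled routine.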
Beyond the leaderboard
DORA changed how the industry talks about software delivery, and its AI research has done it again, naming the amplifier effect, the capabilities that govern it, and the cost of ignoring them. The work left to engineering leaders is to put that research to work: to measure AI as a first-class contributor to delivery and watch how it lands on predictability, efficiency, and experience.
That’s where the easy path runs out. The loudest version of AI progress right now is the leaderboard: tokens consumed, seats activated, lines generated, adoption climbing on a slide. None of it answers whether the system underneath got better. Life beyond that leaderboard is a measurement practice, one where AI is a contributor you can see and a set of outcomes you can act on, every week rather than every quarter. AI will amplify whatever system it finds. The question is whether that system is one you can see and improve before the next sprint.
Our workshop “Life beyond tokenmaxxing” is a working session on exactly that: measuring AI as a contributor you can see and act on, not a number you watch climb. Join us: https://linearb.io/event/life-beyond-tokenmaxxing




