Your AI ROI question doesn't have an answer yet. Here's how to build one.

Your CFO wants a number. The board wants a number. You have a row of dashboards showing rising AI adoption, a growing line item for tokens and seats, and a strong sense that your engineers are moving faster. None of that adds up to the one figure leadership keeps asking for: the return on the money you're spending on AI.

The answer exists. You can’t read it off your current dashboards, because what you measure and what you’re asked about are two different things. Adoption counts and token consumption describe activity inside engineering. Return on investment describes the value the business recognizes on a financial statement. Without a bridge between them, every AI ROI conversation ends the same way: a confident story with no defensible number behind it.

The question is real, and most leaders can't answer it

According to Google Cloud research, 78% of executives report realizing a return on investment from at least one generative AI use case, and 88% of early adopters using agentic AI report seeing positive returns. Those are leaders who say they’re getting a return on at least one use case, not a measured, organization-wide ROI figure. That gap is where a VP of engineering gets stuck.

The rest of the market is far less settled. The 2025 Stanford AI Index reports that expectations for workforce productivity remain mixed across corporate functions, with many enterprises seeing marginal or flat outcomes after initial deployment. MIT’s NANDA project finds that internal corporate AI builds frequently fail, pushing employees toward a shadow economy of unauthorized tools. The primary barrier is organizational design, not budget or technology.

So when the board asks for your AI ROI, you're answering inside a market where some leaders report returns, many report nothing measurable, and the deciding factor is the strength of the surrounding system rather than the tool. Quoting that 88% of someone else's early adopters feel positive won't survive a second question. You need your own number, built from your own pipeline.

Why adoption metrics can't get you there

The default move is to point at the dashboards you already have. They won't carry the argument, for three reasons.

Adoption is activity, not outcome. Seat counts, prompt volume, and suggestions accepted tell you that AI is present. They say nothing about whether work moved through your system faster, broke less often, or freed engineers for higher-value work. A team can light up every adoption metric and ship at last quarter's pace.

Token and seat counts lead you into a trap. Counting tokens, counting licenses, and ranking teams by usage is tokenmaxxing: optimizing the inputs you can see while the outcomes the business pays for go unmeasured. A real ROI model replaces that leaderboard entirely. It asks what changed downstream in throughput, quality, and developer experience.

Output metrics are gameable, and the cost of generating output is collapsing. Lines of code and pull request volume rise on their own when AI writes more code, whether or not any of it reaches production or earns revenue. The 2025 Stanford AI Index reports that raw inference costs for advanced models dropped by a factor of 280 between November 2022 and October 2024. As the cost of producing output approaches zero, output volume tells you less and less, and the burden shifts to governance cost: reviewing what the machine wrote, adjusting workflows, and upskilling the people who own the result.

This is the same blind spot the Building on DORA blog walks through on the measurement side. The ROI model picks up where that leaves off and translates measured improvement into money.

The answer is a value chain

The DORA 2026 report on the ROI of AI-assisted software development builds the model this work requires. It connects engineering activity to financial return through a chain of cause and effect.

The chain runs left to right. AI adoption in the software development lifecycle is the input. Adoption alone produces little. Engineering capabilities have to turn that raw usage into delivery-metric improvement. The two delivery metrics are throughput (the volume and velocity of changes moving through the system) and instability (the reliability of those changes once they ship). Those delivery improvements show up first as non-financial value: higher productivity, better developer experience, and better user experience. Only then do they convert into financial value: new revenue and cost savings. From total value you subtract the cost of implementing AI, and what remains, divided by that cost, is your ROI.
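The last link of the chain can be written as a formula. The symbols are shorthand introduced here for illustration, not notation from the report: R_new is new revenue, S_cost is cost savings, V_total is total financial value, and C_AI is the cost of implementing AI.

```latex
\[
V_{\text{total}} = R_{\text{new}} + S_{\text{cost}},
\qquad
\mathrm{ROI} = \frac{V_{\text{total}} - C_{\text{AI}}}{C_{\text{AI}}}
\]
```

Everything upstream of this formula (adoption, capabilities, delivery metrics, non-financial value) exists to justify the two terms in V_total.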

The report frames the first three links as leading indicators and the financial ones as lagging indicators. That changes how you set expectations. Throughput and quality move before revenue does. Measure only the lagging end and you'll declare failure during the months when the leading indicators are already turning.

Carry two cautions from the report into your own model. First, avoid double counting: an engineer hour reclaimed by AI can fund avoided hiring or fund more shipped features, and you can't credit it to both at full value. Second, treat the whole thing as a high-uncertainty estimate meant to start a conversation with finance, not a precise forecast. As the DORA authors put it, all models are wrong, and the value of this one is the shared structure it gives engineering and finance to argue inside.
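One way to keep the double-counting caution honest is to force every reclaimed hour into exactly one bucket. The sketch below is a minimal illustration, not the report's method. The function name, the parameters, and every number in it are hypothetical.

```python
def allocate_reclaimed_hours(hours, hourly_cost, feature_value_per_hour,
                             share_to_cost_savings):
    """Split reclaimed engineer hours between two value buckets.

    share_to_cost_savings is the fraction of hours credited to avoided
    hiring; the remainder is credited to additional shipped features.
    No hour is counted in both buckets.
    """
    if not 0.0 <= share_to_cost_savings <= 1.0:
        raise ValueError("share_to_cost_savings must be between 0 and 1")
    savings_hours = hours * share_to_cost_savings
    feature_hours = hours - savings_hours
    return {
        "cost_savings": savings_hours * hourly_cost,
        "feature_value": feature_hours * feature_value_per_hour,
    }


# Hypothetical inputs: 1,000 reclaimed hours, $100 per hour of avoided cost,
# $150 per hour of feature value, split evenly between the two buckets.
split = allocate_reclaimed_hours(1000, 100.0, 150.0, 0.5)
total_value = split["cost_savings"] + split["feature_value"]

# The double-counted version credits every hour to both buckets at full value.
double_counted = 1000 * 100.0 + 1000 * 150.0
```

With these made-up inputs the double-counted total is twice the allocated one, which is the kind of gap finance will find on the first read.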

The J-Curve is why the payback takes longer than the pitch

The path to AI value dips before it climbs, and your CFO needs to hear that from you before the numbers say it. The DORA report describes a J-Curve: a temporary dip in productivity and a period of instability during early adoption, before value accelerates. Treat the dip as the tuition cost of the transition, not a sign the investment failed.

Two forces dig that trough, and both delay the point where return crosses investment.

Start with the verification tax. The effort your engineers save on boilerplate gets spent reviewing AI-generated code that looks correct but may not be. The friction doesn’t vanish; it moves from writing to verifying. The 2025 DORA research found this cognitive load often replaces the time saved.

The instability tax is the second, and it’s the one most likely to be misread. The 2025 DORA research measured the standardized effect of AI adoption across many outcomes. The effect on software delivery instability was the second largest, behind individual effectiveness and larger than the effects on organizational performance and code quality. That’s an effect-size ranking: it shows how strongly AI adoption moves instability relative to other outcomes. It's not a dollar amount, and it doesn't mean AI produces a negative return. The same research notes that the time developers lose to the added instability is smaller than the time they gain on individual effectiveness. The net stays positive. The instability is a real cost that arrives early and deepens the trough, which is why your payback lands later than a naive model predicts.

The trough has a structural shape too. The report cites research showing AI delivers a 35% to 40% productivity gain on simple, greenfield tasks while its impact on complex, legacy, brownfield code is often 10% or less. Those are task-level productivity gains, not ROI figures and not delivery-metric gains, but they tell you where the trough runs deepest: if most of your work is brownfield, expect a longer climb and budget for it.
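To see how the work mix changes the picture, here is a rough sketch that blends the two task-level gains by the share of work in each category. The simple weighted average is an assumption made here, not a formula from the report, and the 20% greenfield share is a hypothetical input. The result is still a task-level productivity estimate, not an ROI figure.

```python
def blended_task_gain(greenfield_share, greenfield_gain, brownfield_gain):
    """Weight task-level productivity gains by the share of work in each category."""
    if not 0.0 <= greenfield_share <= 1.0:
        raise ValueError("greenfield_share must be between 0 and 1")
    brownfield_share = 1.0 - greenfield_share
    return greenfield_share * greenfield_gain + brownfield_share * brownfield_gain


# Hypothetical mix: 20% greenfield work at the low end of the cited 35% to 40%
# range, 80% brownfield work at the cited 10% ceiling.
mostly_brownfield = blended_task_gain(0.2, 0.35, 0.10)  # roughly 0.15
```

A team doing mostly brownfield work lands much closer to the brownfield figure than to the headline greenfield one, so the planning number should too.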

Forecast both taxes explicitly. Then an early financial deficit reads to your leadership as the adoption lifecycle on schedule, not as a failed tool.
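A forecast like that can start as a very small model. The sketch below is an illustration under stated assumptions, not the DORA model: value ramps up linearly over a few months, cost is flat, and each tax is a fixed fraction of gross value. The function name, the parameters, and all the numbers are hypothetical.

```python
def payback_with_taxes(full_monthly_value, monthly_cost, ramp_months,
                       verification_tax=0.0, instability_tax=0.0,
                       horizon_months=36):
    """Return (payback_month, trough) for a simple cumulative net-value model.

    payback_month is the first month where cumulative net value is >= 0,
    or None if that never happens within the horizon. trough is the lowest
    cumulative net value reached before payback.
    """
    keep = 1.0 - verification_tax - instability_tax  # share of value left after taxes
    cumulative = 0.0
    trough = 0.0
    for month in range(1, horizon_months + 1):
        # Assumed linear ramp: value reaches its full monthly level at ramp_months.
        gross = full_monthly_value * min(month, ramp_months) / ramp_months
        cumulative += gross * keep - monthly_cost
        trough = min(trough, cumulative)
        if cumulative >= 0:
            return month, trough
    return None, trough


# Hypothetical inputs, in thousands of dollars per month.
naive_month, naive_trough = payback_with_taxes(360, 100, 6)
taxed_month, taxed_trough = payback_with_taxes(
    360, 100, 6, verification_tax=0.2, instability_tax=0.1
)
```

With these made-up inputs the naive model pays back in month 3 and the taxed model in month 4, with a trough nearly twice as deep. The specific numbers don't matter. Showing leadership both curves before the spend starts does.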

What the numbers look like when you run the model

Once you accept the J-Curve, the headline figures start to mean something.

The DORA report's worked example yields a 39% first-year ROI: roughly $11.6M in total first-year value against $8.4M in first-year investment, for a payback period of about eight months. Treat that as the report’s illustrative sample model, not an empirical average across companies. Google Cloud's data puts the average payback at about eight months too. Use the following as planning heuristics rather than measured benchmarks. Six to nine months is a reasonable target for agile teams, and 12 to 18 months fits larger enterprise rollouts that carry heavy governance overhead.
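The arithmetic behind those headline figures is short enough to check by hand. The sketch below applies the standard ROI formula to the rounded figures quoted above. The payback helper assumes value accrues evenly across the year, which is a simplification made here, not something the report states.

```python
def first_year_roi(total_value, investment):
    """ROI as net value divided by investment."""
    return (total_value - investment) / investment


def simple_payback_months(total_value, investment, period_months=12):
    """Months to recover the investment if value accrues evenly over the period."""
    monthly_value = total_value / period_months
    return investment / monthly_value


# Rounded figures from the worked example, in millions of dollars.
roi = first_year_roi(11.6, 8.4)             # about 0.38
payback = simple_payback_months(11.6, 8.4)  # about 8.7 months
```

The rounded inputs give roughly 38%, a point under the report's 39%, which likely reflects rounding in the quoted dollar figures. The even-accrual payback of about 8.7 months sits close to the report's "about eight months".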

The larger figure that gets quoted needs the tightest attribution. Google Cloud customers found an average of 727% return on their investment in Google Cloud AI over three years. That's a specific three-year cohort of Google Cloud customers, not a typical or expected return for AI adoption generally. Cite it as what it is and it holds up.

Where APEX fits

A value model tells you what to connect. You still need a way to measure the engineering side of the chain continuously, so the leading indicators are real signals rather than after-the-fact guesses. That’s the role of APEX, the framework LinearB built on DORA for the AI era.

APEX makes AI a first-class contributor to the software development lifecycle and measures how that contribution affects predictability, efficiency, and experience. Its four pillars line up with the value chain your ROI model depends on. AI leverage measures AI's contribution where it counts, in the critical path at the pull request level, instead of in an adoption dashboard that never touches delivery. Predictability confirms that faster output stays plannable rather than introducing the instability the J-Curve warns about. Efficiency exposes where AI speed gets absorbed, because accelerating coding often shifts the bottleneck to review or testing. Developer experience guards against throughput bought with burnout, which is throughput you can't sustain or count.

Measured on the recommended weekly, monthly, and quarterly rhythm, which is tuned to a typical organization, these pillars supply evidence for the leading indicators in your ROI model. The financial bridge needs both halves: the value model from finance and the continuous measurement of AI's real contribution from engineering.

Build the number before you're asked for it again

The AI ROI answer is knowable. You build it as a value chain: AI’s contribution in the critical path feeds throughput, quality, and developer experience, and those feed revenue and cost savings. Subtract the verification tax and the instability tax, and the payback timeline stays honest. Adoption metrics and token counts measure the wrong thing, and no amount of tokenmaxxing converts into a figure the board can act on.

Build that model now, while the question is still hypothetical, and you walk into the next board meeting with a defensible number and a credible timeline instead of a usage chart. Our guide, the APEX framework for engineering leaders, walks through the executive conversation, the metric-to-value mapping, and the scripts for setting expectations with finance. For the live version of the argument, our Life beyond tokenmaxxing workshop covers how to retire the activity leaderboard and start measuring return.

Andrew Zigler

Andrew Zigler is a developer advocate and host of the Dev Interrupted podcast, where engineering leadership meets real-world insight. With a background in Classics from The University of Texas at Austin and early years spent teaching in Japan, he brings a humanistic lens to the tech world. Andrew's work bridges the gap between technical excellence and team wellbeing.
