
The end of pre-compute: why the future of AI is on-demand


Early in his career, Elliot Marx encountered every engineer’s nightmare: a "money bug." A unit mismatch between training data (dollars) and inference logic (cents) led to a catastrophic error where a system began approving every single loan application.

That scar tissue, earned while building risk infrastructure at Affirm and later co-founding Haven Money, shaped his philosophy on AI infrastructure. Today, as co-founder of Chalk, Marx is tackling the structural flaw that causes these errors: the disconnect between how data scientists work and how production systems operate.

The traditional data infrastructure paradigm has centered on pre-computation. Tools like Databricks and Snowflake excel at processing massive datasets in advance, preparing aggregations and transformations before anyone needs them. But as AI models move to the edge and inference happens in real-time, that approach reveals its limitations.

This article explores Marx's insights on why the future of AI infrastructure is real-time, how this shift counterintuitively lowers costs, and the engineering philosophy required to build it.

The problem: why pre-compute fails in the real world

Marx points out that the traditional paradigm of crunching numbers overnight breaks down when expensive data must be fetched at the exact moment of inference. This is particularly critical in high-stakes industries like fintech. Marx notes that his team now powers roughly a third of the world's debit card transactions, working with top anti-fraud platforms where P99 latencies under 20 milliseconds are non-negotiable.

The technical challenge goes beyond speed; it is about correctness and security. When you pre-compute, you are often forced to move sensitive data, such as social security numbers and transaction histories, out of secure environments to process them. Real-time architecture solves this by deploying directly into customer cloud environments, ensuring data never leaves the compliance perimeter.

Solving the 'write it twice' problem

The "money bug" Marx experienced was a symptom of a deeper issue in ML engineering known as the "write it twice" problem. Traditionally, a data scientist builds a model in a Python notebook, validates it, and then hands it off to an engineering team to reimplement in production (often in Java or C++). This handoff is where bugs are born.
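To make the failure mode concrete, here is a hypothetical sketch (not the actual Affirm code) of how a unit mismatch slips through a "write it twice" handoff: a toy risk model built against amounts in cents, then called from a production path that passes dollars.

```python
def risk_score(amount_cents: int) -> float:
    # Toy model "trained" on amounts in cents:
    # amounts above $5,000 (500_000 cents) are considered risky.
    return 1.0 if amount_cents > 500_000 else 0.0

def approve(amount: int) -> bool:
    return risk_score(amount) < 0.5

# Correct call, in cents: a $20,000 loan is rejected.
assert approve(2_000_000) is False

# Buggy production call, in dollars: the same $20,000 loan sails
# through, because 20_000 "cents" looks like a $200 loan to the model.
assert approve(20_000) is True
```

Nothing in either function is wrong in isolation; the bug lives entirely in the reimplementation boundary, which is exactly why eliminating the handoff eliminates the bug class.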

Chalk’s architecture bridges this gap with a simple philosophy: meet data scientists where they live, in the languages they already know, SQL and Python.

To achieve this, they convert Python code to SQL-style expressions using abstract syntax tree (AST) analysis, then execute those expressions through a vectorized query engine built on top of Velox, Facebook's open-source query execution kernel. This allows Python code to run orders of magnitude faster than traditional interpretation while maintaining the same logical semantics. The result is a system where data scientists write code once, and it deploys to production without reimplementation. This effectively eliminates the gap where errors like unit mismatches hide.
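The AST-based translation step can be sketched in a few lines using Python's standard `ast` module. This is a minimal illustration of the idea, not Chalk's implementation: walk the parsed tree of a Python expression and emit an equivalent SQL-style expression that a vectorized engine could execute.

```python
import ast

# Map Python binary operators to their SQL spellings.
_OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}

def to_sql(expr: str) -> str:
    """Translate a simple arithmetic Python expression to SQL text."""
    tree = ast.parse(expr, mode="eval")
    return _walk(tree.body)

def _walk(node: ast.AST) -> str:
    if isinstance(node, ast.BinOp):
        op = _OPS[type(node.op)]
        return f"({_walk(node.left)} {op} {_walk(node.right)})"
    if isinstance(node, ast.Name):
        return node.id        # column reference
    if isinstance(node, ast.Constant):
        return repr(node.value)  # literal
    raise NotImplementedError(type(node).__name__)

# "amount / 100" written by a data scientist becomes an expression
# the query engine can vectorize across millions of rows.
print(to_sql("amount / 100"))  # -> (amount / 100)
```

Because the translation preserves the expression's semantics, the same source of truth runs in the notebook and in production, rather than being interpreted row by row in Python.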

The counterintuitive economics of real-time

One of the most surprising benefits of real-time computation is cost savings. While pre-computing everything sounds efficient, Marx explains that it often results in processing massive amounts of data that will never be used. Moving to real-time allows organizations to compute only on the data they actually need.

A recent case study with Whatnot, the largest live streaming marketplace in the US, illustrates this perfectly. Previously, Whatnot computed predictions for all users daily, then discarded about 90% of them because most users never logged in. By shifting to on-demand computation, they eliminated that waste entirely.
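The cost asymmetry is easy to see in a back-of-the-envelope model. The numbers below are illustrative assumptions, not Whatnot's actual figures, but they capture the shape of the trade-off: pre-compute pays for every user every day, on-demand pays only for users who show up.

```python
def precompute_cost(total_users: int, cost_per_prediction: float) -> float:
    # Pre-compute: pay for a prediction for every user, used or not.
    return total_users * cost_per_prediction

def on_demand_cost(active_users: int, cost_per_prediction: float) -> float:
    # On-demand: pay only when a user actually logs in.
    return active_users * cost_per_prediction

# Assume 1M users, 10% daily active, $0.001 per prediction.
total, active, per_pred = 1_000_000, 100_000, 0.001
print(precompute_cost(total, per_pred))  # 1000.0 per day
print(on_demand_cost(active, per_pred))  # 100.0 per day, 10x cheaper
```

The savings scale directly with the fraction of pre-computed results that go unused, which is why a marketplace where most users are inactive on any given day sees such a large win.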

The cost equation becomes even more compelling when using third-party data. For anti-fraud platforms, pre-computing all possible combinations of emails, phone numbers, and social security numbers is mathematically impossible because there are "more combinations than the lifetime of the universe." Real-time computation isn't just cheaper; it is the only viable mathematical approach.
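A rough calculation shows why enumerating the combinations is hopeless. The pool sizes here are loose assumptions purely for scale (a billion of each identifier), but even at that modest estimate the cross-product dwarfs any feasible batch job.

```python
# Back-of-the-envelope: pre-computing every (email, phone, SSN) triple.
emails, phones, ssns = 10**9, 10**9, 10**9
combinations = emails * phones * ssns   # 10^27 triples

seconds_in_universe = 4.3e17            # ~13.8 billion years, in seconds
rate_needed = combinations / seconds_in_universe

print(f"{combinations:.0e} combinations")
# You would need billions of evaluations per second, sustained for the
# entire age of the universe, just to enumerate them once.
print(f"{rate_needed:.1e} evaluations/sec for one universe lifetime")
```

Computing a score for the single combination in front of you at request time sidesteps the enumeration entirely.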

Building with incrementalism

Building infrastructure for AI requires a different development philosophy than building consumer apps. The complexity is high, and the use cases are novel. To navigate this, Marx advocates for "incrementalism" over the "stealth mode" perfectionism often seen in Silicon Valley. He believes the only way to build a great company is to ship early, get the product into a user's hands, and let their feedback dictate whether it's actually good.

This approach contrasts with the "three years in a room" model of companies like Snowflake. Chalk maintains a Slack channel with every customer, creating an overwhelming but invaluable stream of direct feedback. This creates a culture where engineers are empowered to work on what excites them, as long as it has a "vector of progress" that aligns with the company's direction.

The future is on-demand

The future of AI infrastructure is not about replacing existing data platforms, but about complementing them with real-time capabilities. As models move to the edge and inference happens at the moment of need, the ability to fetch, transform, and deliver data in milliseconds becomes as critical as the models themselves.

For engineering leaders navigating this shift, the lessons from Elliot Marx are clear: optimize for the data you actually need, eliminate the "write it twice" gap to reduce risk, and embrace incrementalism to find the right path forward.

To dive deeper into the future of real-time AI infrastructure, listen to Elliot Marx discuss these ideas in depth on the Dev Interrupted podcast. 


Andrew Zigler

Andrew Zigler is a developer advocate and host of the Dev Interrupted podcast, where engineering leadership meets real-world insight. With a background in Classics from The University of Texas at Austin and early years spent teaching in Japan, he brings a humanistic lens to the tech world. Andrew's work bridges the gap between technical excellence and team wellbeing.
