When Christine Yen, CEO and co-founder of Honeycomb, looks at the current wave of AI transformation sweeping through engineering organizations, she sees something familiar. "I just think of it as trying to make sense of what your systems are doing by using data," she says. Despite the hype around agents and autonomous systems, Yen's perspective is refreshingly grounded: "Guys, it's software. It's weird software. It's non-deterministic software. It's more autonomous software than we're used to."
This framing matters because it anchors the conversation where it belongs: not in abstract speculation about AI's novelty, but in the practical realities of building, shipping, and operating production systems. For engineering leaders navigating this transition, observability is the foundation that makes agentic development possible at all.
Why observability makes AI agents reliable
The challenge with AI systems isn't that they're fundamentally different from traditional software; it's that they force teams to be explicit about things they've always needed to define but could previously get away with leaving fuzzy. "These conversations around what matters and what is good are ones that I see teams having all the time now because AI forces those conversations up front," Yen explains.
This shift represents a maturation of engineering practice. When you're building an agent that will make autonomous decisions, you can't rely on implicit understanding or tribal knowledge. You need to articulate what normal behavior looks like, what success criteria matter, and how to distinguish expected variation from genuine anomalies. The agent needs rich, contextual, reliable data to make sound decisions.
"Everything falls down to a data problem. Everything falls down to where are those agents getting the information about what is normal, what is not normal, what's happening, what should be happening?"
This reframing positions observability not as a reactive debugging tool but as a proactive enabler of autonomous systems. If your agents don't have access to high-quality operational data, they can't function reliably. The instrumentation and evaluation practices that should have always been table stakes for production software become non-negotiable in an agentic world.
Rich telemetry makes AI-driven workflows reliable
The quality of your observability infrastructure directly determines what is possible with AI-driven workflows. Yen emphasizes that reliable automation depends on telemetry systems that are both fast at scale and rich enough to preserve business, product, and runtime context.
This dual requirement of speed and richness reflects a fundamental shift in how engineering teams should think about production data. Traditional observability architectures split logs, metrics, and traces into separate silos, each optimized for different query patterns but collectively creating friction and blind spots. Modern columnar stores eliminate these artificial boundaries, supporting high-cardinality metadata without prohibitive cost.
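To make the idea of rich, high-cardinality telemetry concrete, here is a minimal sketch in Python. It assumes one wide event per checkout request, with business, product, and runtime context stored side by side. The field names (`customer.id`, `cart.value_usd`, `build.sha`, and so on) and the `success_rate_by` helper are hypothetical illustrations, not Honeycomb's schema or API.

```python
from collections import defaultdict

# Hypothetical wide events: one record per request, carrying business,
# product, and runtime context together instead of in separate silos.
EVENTS = [
    {"customer.id": "cust_1042", "plan": "enterprise", "cart.value_usd": 310.0,
     "build.sha": "a1b2c3", "duration_ms": 182, "checkout.success": True},
    {"customer.id": "cust_1042", "plan": "enterprise", "cart.value_usd": 45.5,
     "build.sha": "d4e5f6", "duration_ms": 2210, "checkout.success": False},
    {"customer.id": "cust_7781", "plan": "free", "cart.value_usd": 12.0,
     "build.sha": "d4e5f6", "duration_ms": 1975, "checkout.success": False},
    {"customer.id": "cust_7781", "plan": "free", "cart.value_usd": 19.0,
     "build.sha": "a1b2c3", "duration_ms": 140, "checkout.success": True},
]

def success_rate_by(events, field):
    """Group events by any field and return the checkout success rate per value."""
    totals = defaultdict(lambda: [0, 0])  # value -> [event count, success count]
    for event in events:
        bucket = totals[event[field]]
        bucket[0] += 1
        bucket[1] += int(event["checkout.success"])
    return {value: ok / count for value, (count, ok) in totals.items()}

# The same events answer a runtime question and a business question.
by_build = success_rate_by(EVENTS, "build.sha")
by_customer = success_rate_by(EVENTS, "customer.id")
```

Because every event keeps all of its context, the same data can be sliced by build, by customer, or by plan without deciding in advance which dimension will matter.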
The business impact becomes clear when you move beyond infrastructure-focused metrics toward user-impacting signals. Rather than alerting on CPU utilization thresholds, teams can instrument checkout success rates, data freshness, or other indicators that directly reflect customer experience. Yen notes that if an e-commerce platform's checkout rate drops and sustains that drop for a specific amount of time, that is the signal worth acting on.
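The "drops and sustains that drop" condition can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Honeycomb's alerting logic: the `Sample` type, the per-minute granularity, and the threshold values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence

@dataclass
class Sample:
    minute: int      # minutes since the start of the observation window
    attempts: int    # checkout attempts seen in that minute
    successes: int   # checkouts that completed in that minute

def sustained_drop(samples: Sequence[Sample], threshold: float,
                   sustain_minutes: int) -> bool:
    """Return True if the success rate stays below `threshold` for at least
    `sustain_minutes` consecutive samples that saw traffic."""
    run = 0
    for sample in samples:
        if sample.attempts == 0:
            # No traffic is no evidence: neither extend nor reset the run.
            continue
        rate = sample.successes / sample.attempts
        run = run + 1 if rate < threshold else 0
        if run >= sustain_minutes:
            return True
    return False

# A one-minute blip should not page anyone; a five-minute drop should.
healthy = [Sample(m, 100, 97) for m in range(10)]
blip = healthy[:4] + [Sample(4, 100, 80)] + healthy[5:]
outage = healthy[:3] + [Sample(m, 100, 80) for m in range(3, 8)] + healthy[8:]

blip_alerts = sustained_drop(blip, threshold=0.95, sustain_minutes=5)
outage_alerts = sustained_drop(outage, threshold=0.95, sustain_minutes=5)
```

Requiring the drop to persist is what separates a user-impacting signal from momentary noise, which a raw CPU threshold cannot do.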
The most transformative aspect of rich telemetry is how it closes the feedback loop between production behavior and future development. Yen references a blog post by Chad Fowler titled "Production Is a Compiler Input," which captures this shift perfectly. Real-world operational data should not just inform incident response. It should feed directly into how agents reason about system behavior and how engineers write the next iteration of code.
Feeding these production signals into how an agent decides what code to write next is perhaps how teams should always have worked, and it is certainly how they should work now. This approach treats production evidence as a first-class input to the development process, not an afterthought. When your codebase is growing rapidly thanks to AI-assisted development, the code itself becomes less reliable as a source of truth. What matters is the observable impact of that code in production.
Collaborative investigation makes incident response faster
One of Yen's most compelling insights is that debugging is inherently collaborative, even when it feels solitary and the only collaborators are past you, present you, and future you. Past you wrote code and made decisions. Present you is trying to understand those decisions and mitigate a problem. Future you will need to learn from this investigation.
Yen says she continues to be bullish on humans. She believes humans will always bring some context, judgment, and lateral thinking that LLMs simply are not capable of. This human-in-the-loop philosophy shapes how Honeycomb approaches agent-assisted investigation. Rather than promising fully automated remediation, Yen favors systems where agents help explore hypotheses while humans contribute judgment, prioritization, and situational awareness. The goal is not to remove humans from the loop but to amplify their effectiveness.
The collaboration extends beyond urgent incident response. Observability workflows support onboarding, knowledge transfer, and routine exploratory analysis. When investigative paths and cues are captured within the tool itself, future responders can build on prior learning instead of starting from scratch. This creates a compounding advantage where each investigation makes the next one faster and more effective.
Emerging workflows will center on shared investigation surfaces where humans and agents can compare findings, branch into parallel analyses, and retain useful context for later review. The naive approach of documenting everything in runbooks and feeding them to agents misses the dynamic, exploratory nature of real debugging. It is better to create systems where agents help you explore multiple hypotheses simultaneously, connecting dots you might not have seen, while you provide the strategic direction and business context.
SLOs align reliability work with customer outcomes
Service Level Objectives represent a critical bridge between technical instrumentation and business outcomes. Yen encourages teams to define what good service looks like in business terms before deciding what to alert on, what to page for, or what to prioritize operationally.
This goes beyond reducing alert noise, though that is a welcome side effect. SLO-driven engineering fundamentally changes how teams think about reliability. Instead of reacting to infrastructure thresholds that may or may not matter to users, teams align around meaningful service contracts and customer experience indicators. The fundamental questions become: what does delivering great service to our customers look like, and what are the signals that really matter to the business where an engineer should be paged if something looks off?
Healthy SLO processes create faster feedback loops and enable teams to make changes more confidently. When you know which signals truly matter, you can evaluate the impact of changes sooner and with greater precision. This transforms observability from a risk-control function into a development accelerator.
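The arithmetic behind an SLO is simple enough to sketch. The Python below uses the common error-budget and burn-rate definitions; the function names and sample numbers are illustrative assumptions, not anything specified in the interview or taken from Honeycomb's product.

```python
def error_budget_remaining(slo_target: float, total_events: int,
                           bad_events: int) -> float:
    """Fraction of the error budget still unspent (negative once overspent).

    `slo_target` is the promised success ratio, for example 0.999.
    """
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1")
    allowed_bad = (1 - slo_target) * total_events
    if allowed_bad == 0:
        return 1.0  # no traffic yet, so nothing has been spent
    return 1 - bad_events / allowed_bad

def burn_rate(slo_target: float, total_events: int, bad_events: int) -> float:
    """How fast the budget is burning: 1.0 means exactly on budget."""
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_events == 0:
        return 0.0
    observed_error_rate = bad_events / total_events
    return observed_error_rate / (1 - slo_target)

# Hypothetical month: 99.9% target, 1,000,000 checkouts, 250 failures.
# About 1,000 failures are allowed, so roughly 75% of the budget remains.
remaining = error_budget_remaining(0.999, 1_000_000, 250)

# Hypothetical last hour: 50 failures in 10,000 checkouts burns the
# budget at roughly five times the sustainable pace.
recent_burn = burn_rate(0.999, 10_000, 50)
```

A team with most of its budget left can ship more aggressively; a high burn rate is the kind of signal worth paging on, because it is expressed in terms of customer-visible failures rather than infrastructure thresholds.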
The organizational implications run deep. Teams that have invested in strong observability practices routinely reference production evidence in engineering decisions and pull requests. One metric Yen highlights is tracking how many GitHub issues or pull requests include Honeycomb URLs. This indicates engineers are updating their mental models based on production reality and building to the conditions they can actually observe.
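A rough version of that metric can be computed from pull request or issue text. The sketch below is an assumption-laden illustration: it presumes the bodies have already been exported as strings, and that Honeycomb links look like `https://ui.honeycomb.io/...`, a pattern you should adjust to match the links your team actually shares. The example URLs in it are made up.

```python
import re
from typing import Iterable

# Assumed link shape; change this if your Honeycomb URLs look different.
HONEYCOMB_URL = re.compile(r"https?://ui\.honeycomb\.io/\S+")

def share_with_production_evidence(bodies: Iterable[str]) -> float:
    """Fraction of PR or issue bodies that link to at least one Honeycomb query."""
    texts = list(bodies)
    if not texts:
        return 0.0
    linked = sum(1 for text in texts if HONEYCOMB_URL.search(text))
    return linked / len(texts)

# Hypothetical PR descriptions; the URLs are placeholders.
pr_bodies = [
    "Fix checkout retry loop. Evidence: https://ui.honeycomb.io/example-team/result/abc123",
    "Bump dependency versions.",
    "Reduce p99 latency, see https://ui.honeycomb.io/example-team/result/def456 for before/after.",
    "Update README wording.",
]

share = share_with_production_evidence(pr_bodies)
```

Tracked over time, the trend in this share says more than any single value: a rising number suggests engineers are reaching for production evidence more often when they propose changes.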
Observability tools that enable faster feedback loops, more confident development, and more frequent changes belong in the profit center, not the cost center, and should be viewed as powerful accelerants.
Shared observability gives every team better answers faster
The traditional boundary between development and operations has always been artificial, but AI is making it untenable. Yen explains that all of Honeycomb is about reducing the gap between development and production, building a shared language, and establishing a shared understanding of how system changes impact users.
What is new is how quickly this shared understanding is expanding beyond engineering. Support, product, and sales teams increasingly need access to operational data to understand customer behavior and prepare for conversations. At Honeycomb, the head of sales is one of the heaviest users of the company's internal MCP (Model Context Protocol), using production data to understand customer usage patterns before entering conversations.
Natural-language interfaces and MCP-style access have eliminated the skill barrier that previously limited observability tools to engineers. Sales teams at a leading inference provider use Honeycomb to check customer usage patterns before calls. Product teams run gut checks on feature adoption. Marketing teams assess customer health. The source of truth about system behavior becomes a source of truth about business reality.
This democratization raises the bar for tool design. When questions come from non-technical users in informal language, the system must return trustworthy answers despite incomplete or imprecise queries. The burden shifts from the user learning specialized query syntax to the vendor ensuring data quality and interpretation remain sound.
The power of better questions
Yen reframes the key organizational skill as asking better questions rather than mastering specialized data analysis. She believes it is our job to ask good questions. When the cost of answering a question drops dramatically, teams can ask more questions, build intuition faster, and explore more hypotheses. This creates a flywheel: more questions lead to better questions, which lead to more confident decisions and faster learning.
The transformation Yen describes is not really about AI at all. It is about finally building the feedback loops and shared understanding that engineering organizations have always needed. AI simply makes the cost of not having those systems prohibitively high. Teams that invest in rich telemetry, collaborative investigation practices, SLO-driven processes, and cross-functional access to production data are not just preparing for an agentic future. They are building the foundation for faster, more confident, more customer-focused engineering today.
To hear more of Christine Yen's thoughts on AI workflows, the power of rich telemetry, and why human intuition still matters, listen to her full episode on the Dev Interrupted podcast.




