
Your AI demo is a lie (and how to make it real)

By Alex Salazar

"The big insight that we came to that led us to Arcade was we dialed up determinism... Instead of letting the large language model just figure stuff out... That's when things started working really well."

Everyone is chasing gold with AI: some people are digging, some are selling the pickaxes. Alex Salazar and Arcade are building the vault.

We're joined by Alex Salazar, co-founder of Arcade, to dig into why so many AI agent demos never survive contact with production. Alex walks through the biggest killers of demo-to-production agents, consistency and accuracy, security and safety, token costs, and latency, and shares how his team's SRE agent led to the pivot that became Arcade: a tool execution and tool authorization system that lets agents act on a user's behalf securely.

Alex also unpacks why tool calling is both the biggest bottleneck and the biggest opportunity in agents right now: dialing up determinism by handing the model a fixed set of "buttons," why RESTful CRUD APIs map poorly onto an agent's intention-based workflows, and why you'll still need to build domain-grounded tools of your own even when vendors ship polished MCP servers. Plus: what it means to be "agent native," and how traditional engineering teams can start building that muscle.

Show Notes

Transcript 

(Disclaimer: may contain unintentionally confusing, inaccurate and/or amusing transcription errors)

[00:00:00] Andrew Zigler: Today we're talking about how everyone is chasing gold with AI, and how some people are digging and some people are selling the pickaxes. But Alex Salazar and Arcade, they're building the vault. Because as it turns out, AI that talks is easy, and AI that does something, you know, that's not too hard either. Where everything breaks down is when AI tries to do something securely. We've all seen this again and again, especially if you care about privacy, compliance, and customer trust. And Arcade isn't just handing out keys; they're helping developers build their own vaults, so AI can act on behalf of users securely without blowing a hole in your whole security model. And if you are leading an engineering team and thinking about AI in production, you know security is not an edge case.

[00:00:50] Andrew Zigler: It's a foundational question that we're going to learn more about today. So Alex, thanks for joining us on Dev Interrupted.

[00:00:56] Alex Salazar: Oh, well thanks for having me.

[00:00:59] Andrew Zigler: So [00:01:00] let's start by talking about the problem that I've defined here. You know, we've talked a bit about AI and how it uses tools on Dev Interrupted. You've been talking recently on LinkedIn about how a lot of AI projects look really great in demos, and they're really flashy, and they do this unique thing, but then you actually try to implement it with your team or take it, uh, to a larger implementation.

[00:01:22] Andrew Zigler: And that's when something really hits the wall. You know, what happens there, Alex?

[00:01:28] Alex Salazar: Yeah. Uh, look, I think one of the most disorienting parts of building agents, versus normal software or pre-agent software, is that it used to be that if you got a demo working, you were halfway there, and then you just had to add some reliability and add some security, and it was all a relatively straight shot.

[00:01:47] Alex Salazar: It was just labor. There wasn't a ton of risk. And that's not true at all in agents. And so we started actually as an agents company. We were trying to build a site reliability agent that could diagnose a problem in your environment. You get an alert from Datadog, and we'd [00:02:00] help you figure out what was going on.

[00:02:01] Alex Salazar: That was the original concept of the product. And we, like everybody else in agents, ran into this phenomenon where the demo's actually the easiest piece, like the demo's, like 1% of the work,

[00:02:10] Andrew Zigler: Right.

[00:02:10] Alex Salazar: Once you get the demo working, like, 90% of the work is getting it to be production grade. Large language models are so malleable, and they're so smart, so generalizable, they can fool you, right?

[00:02:23] Alex Salazar: They can get a demo working relatively easily, but going from that easy demo to something production grade, it might be an entirely different product. Like, you might have to change everything. And it's not obvious from the demo how close or far you are from production, but you're typically really far from production.

[00:02:43] Alex Salazar: So the things that get in the way: the first one is consistency and accuracy. You know, when you have a demo, you're not running evals. You know, you're not looking to see, like, what percentage of the time the large language model is properly selecting the right next step in the workflow, or the right tool call [00:03:00] and the right parameters.

[00:03:01] Alex Salazar: You're eyeballing it, and if it happens enough times successfully, you're like, awesome demo. And with the caveat that it's a demo, whoever you show it to is like, oh my God, I can't believe you did that. That's really cool. It only has to do it once for everybody to be really happy, but in production, it needs to happen

[00:03:21] Alex Salazar: a lot. You know, if it's not getting everything right, like, north of 80%, and actually more likely north of 90% of the time, depending on the use case, the user experience is just gonna suck. People aren't gonna trust it, people aren't gonna use it. And that's just taking security and safety out of it.

[00:03:36] Alex Salazar: Users just think it sucks. And so that's the first really, really big one. That's one of the biggest killers, uh, of demo-to-production agents. The second one ends up being, uh, security and safety. And it's not even really just like a, oh, you know, tinfoil-hat, you know, CISO, large-enterprise thing.

[00:03:56] Alex Salazar: Why can't ChatGPT send an email? It's 2025, and they still [00:04:00] can't send an email. It's a security thing. It's a safety thing. And so, yeah, in a demo you can give it, like, a Google API key, or you can hard-code your credentials into some other service.

[00:04:10] Alex Salazar: There are all these hacks you can do in a demo to make it work. But that's never what you would do in production. It might work for one single user, but it would never work for a multi-user environment. And so a lot of agents are blocked. I mean, why isn't there a personal assistant agent today?

[00:04:26] Alex Salazar: We've been talking about personal assistant agents for years, and there still isn't one. A lot of it is because: how is the agent gonna go talk to your Gmail, your calendar, the airlines? How's it gonna purchase something for you? Those problems are largely solved by what we built, but many of them still aren't solved yet either.

[00:04:44] Alex Salazar: And the last one ends up being, well, actually the last two: token costs

[00:04:48] Andrew Zigler: Mm.

[00:04:48] Alex Salazar: When you're doing a demo, you're not looking at the bill. You spent 50 bucks, no big deal. It was a demo; 50 bucks is nothing. But then when you extrapolate to, like, you know, a hundred thousand users [00:05:00] every day, suddenly the finances of the agent don't make any sense anymore. You'll lose money, you'll be negative margin.
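
To see how fast a "$50 demo" stops penciling out, here's a back-of-the-envelope sketch. Every number in it is an invented assumption for illustration, not a figure from the episode:

```python
# Back-of-the-envelope agent token economics. All inputs are made-up
# assumptions for illustration.
cost_per_session = 0.25        # assumed $ of LLM tokens per agent session
sessions_per_user_per_day = 4  # assumed usage pattern
users = 100_000                # the "hundred thousand users" scale

daily_cost = cost_per_session * sessions_per_user_per_day * users
print(f"${daily_cost:,.0f} per day, ${daily_cost * 30:,.0f} per month")
# -> $100,000 per day, $3,000,000 per month: negative-margin territory
#    unless each user is worth far more than that.
```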

[00:05:04] Alex Salazar: And then latency. People really underestimate the impact of latency. As you try and get the large language models to be more accurate and smarter, and to do things better, you start stuffing more things into the context window.

[00:05:16] Alex Salazar: Well, that's gonna drive the cost up, but it also drives up latency. And the more you do sequential thinking or any other kind of, you know, complex workflow, latency really starts to expand, and the smarter models are slower. And so all of a sudden, you know, you have this really cool operation, and maybe it is really accurate, but it takes a minute, or it takes, you know, 30 seconds, or even 10 seconds, depending on what it is.

[00:05:39] Alex Salazar: Like, depending on the use case, the user may or may not find that viable. And so, you know, those are the things we typically see kill projects.

[00:05:47] Andrew Zigler: Yeah, and these constraints you've kind of defined are interesting, because they make for a very unique space, where you have, like, those three or four strong constraints around why we don't have AI systems today, why they don't send [00:06:00] emails, all of these things, like $50 for a demo versus, you know, millions of dollars for all your daily active users.

[00:06:05] Andrew Zigler: Like, when you have all of those kinds of constraints, you kind of get tempted to sacrifice one. And so that's where you get this demo phenomenon: you sacrifice security for the sake of putting on the veneer of the other kinds of functionality. To show, like, look what this cool demo does, or, look what we're building. But then when you actually go to build it, it becomes something totally different.

[00:06:25] Andrew Zigler: And that's also because obviously the space is moving so quickly. Like, it was just a few months ago on this podcast we had Kabi or Betty from AWS, and they built an AI video generator tool. And he talked about that at length: what they started with, and then what they built over the course of, like, the last year.

[00:06:43] Andrew Zigler: And then what they ultimately brought to the market for, like, mom-and-pop business owners, right? Like, it probably had three different forms ultimately. So that's where you get this, like, rapid evolution. And a lot of times when we talk about this on Dev Interrupted, it seems like a team's [00:07:00] expectation problem more than it is, like, you know... all of those constraints, they're certainly solvable if you have the right expectations and you go about it in the right way. So, you know, what are some things that you see teams get right, get wrong, about

[00:07:13] Alex Salazar: Yeah,

[00:07:14] Andrew Zigler: move past, like the demo gloss and actually start

[00:07:18] Alex Salazar: Yeah. So I'll tell you our story, which is ultimately how we pivoted into what is now the product, which is a tool execution and tool authorization system. So we started out as this SRE agent, and we were getting compounding error rates. Which meant, like, you know, if you think of a diagnostic flow, like, why is service A slow? You're gonna check the databases, you're gonna check the servers, you're gonna check to see if somebody committed code last night.

[00:07:42] Alex Salazar: There's all these checks, but, like, it's almost an infinite space of things it might check. And the diagnostic workflows, the chains themselves, can get really long. But every time you ask a large language model any question, you're rolling the dice. The [00:08:00] power of a large language model is also its weakness: you know, it's approximating things.

[00:08:06] Alex Salazar: And so because it's approximation, it's non-deterministic, and so it can get things wrong, it can hallucinate, it can have errors. Every time you call it, you're rolling the dice, and if you roll a one, it fails. Well, if you've got really complex flows where you're calling it multiple times, you're rolling the dice more, and if you roll a one once, the whole chain fails.
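
The dice-rolling is worth making concrete. If each model call succeeds with probability p, a chain of n calls succeeds with probability p^n; a quick sketch (the percentages are illustrative):

```python
# Compounding error rates: a chain is only as reliable as the product
# of its steps.
for p in (0.99, 0.95, 0.90):      # per-step success rate
    for n in (5, 10, 20):         # calls in the diagnostic chain
        print(f"p={p:.0%}, {n:2d} steps -> chain succeeds {p ** n:.1%}")
# Even 95% per-step accuracy collapses to ~60% over a 10-step chain,
# and 90% per-step accuracy collapses to ~35%.
```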

[00:08:27] Andrew Zigler: Right.

[00:08:28] Alex Salazar: And so we were running into these problems; that was problem one, compounding error rates. And then the second thing we were running into was, okay, well, we've given this agent superuser access to Datadog, and the servers and the databases and, you know, Google Workspace. Because

[00:08:42] Andrew Zigler: Right.

[00:08:43] Alex Salazar: it's great for a demo, but now like as we're gonna go try and make this thing actually sellable, it's gotta not do that.

[00:08:47] Alex Salazar: We had to, like, really scope down resources and leverage existing permissioning systems, and there was no way to do that.

[00:08:53] Alex Salazar: And so we ultimately had to invent all of this stuff for ourselves to make it work. And [00:09:00] when we got it working and we started to show people a working demo, people were blown away.

[00:09:04] Alex Salazar: People who knew AI were blown away. They couldn't believe that we had really consistent data plots, or that we were authenticating on behalf of me into a service and properly scoping permissions from inside the agent loop. And then we realized, oh, this is a bigger product than the agent. So we ultimately pivoted and started showing and selling people that, which is what Arcade is now.

[00:09:27] Andrew Zigler: Yeah.

[00:09:29] Alex Salazar: So my answer of, like, if I were building an agent today, what would I do to get around all these things and really increase the odds of succeeding? Obviously my answer is Arcade, but let's put that to the side. So, one, and this is general software engineering best practice: you should be using stuff off the shelf to the degree that you can, because there's so much to figure out and build and learn that

[00:09:55] Alex Salazar: if you try and custom build everything from scratch, you're gonna run into a really big problem.

[00:09:59] Andrew Zigler: [00:10:00] Yeah.

[00:10:00] Alex Salazar: And so the easiest thing to start with is, go grab an agent framework, and avoid trying to build your own. And yes, the learning curves are really steep, but they're really steep for a reason.

[00:10:12] Alex Salazar: Because when you custom build it yourself, you can fool yourself into thinking that you've got it, because of the demo phenomenon. But all these frameworks are really complicated because they're building for production, and they're not necessarily building for the demo.

[00:10:25] Alex Salazar: And so by volume we largely see LangGraph. And then behind it we see OpenAI's SDK,

[00:10:32] Andrew Zigler: Yes.

[00:10:33] Alex Salazar: And there's a bunch of other ones. You know, there's Pydantic, there's Mastra, there's CrewAI. But, like, pick one that you think, you know, fits your requirements in production, and go and run. As you're picking those, you know, I strongly recommend really thinking through what production's gonna look like, not the demo. Like, optimize for the final state, not the demo.

[00:10:50] Alex Salazar: Because of this phenomenon, which I see a lot of people get stuck on, um, they'll optimize, oh, well, look, I deployed this one because it was so easy. But they're optimizing for the demo.

[00:10:59] Andrew Zigler: Right.[00:11:00]

[00:11:00] Alex Salazar: Optimize for prod. I think the other thing I would say is, you've gotta start with evals.

[00:11:05] Alex Salazar: It's like, you've gotta build a muscle of evaluations right outta the gate. Because there's really no way you're gonna get a production-grade agent that's consistent enough if you're not really good with evaluations, and there's a lot of products out there that can help you. Uh, there's LangSmith, there's Braintrust.

[00:11:22] Andrew Zigler: Yeah.

[00:11:23] Alex Salazar: And there's a framework very specific for tool use and tool calling. So the tools are out there; you just need to set them up and use them. And for people who are not native to AI/ML, like traditional web developers, this is very alien. But it's really critical, because we're all used to deterministic tests, you know, pytest, integration tests, unit tests. And, like, how do you do a test on a function where there's an infinite number of correct answers, an infinite number of incorrect answers, and an infinite number in the middle, in the gray?

[00:11:54] Andrew Zigler: Right.

[00:11:55] Alex Salazar: And you can't build those. Like, there is no CI system that can support that. You need [00:12:00] that; that's what evals are, and it's a different animal.
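
As a rough illustration of the difference from pass/fail tests, here's a minimal tool-calling eval sketch. The cases, tool names, and the stubbed `run_agent_step` are all invented for the example; products like LangSmith or Braintrust layer datasets, tracing, and scoring infrastructure on top of this basic idea:

```python
# Minimal tool-calling eval: graded scoring over a dataset, not
# pass/fail assertions. Swap `run_agent_step` for a real call into
# your agent.

EVAL_CASES = [
    {"prompt": "Email Bob that I'm running five minutes late",
     "tool": "send_email", "args": {"to": "bob@example.com"}},
    {"prompt": "Did anyone commit code last night?",
     "tool": "list_recent_commits", "args": {"since": "last night"}},
]

def run_agent_step(prompt: str) -> tuple[str, dict]:
    # Stand-in for the model: a real eval runs the agent and records
    # which tool it selected and the arguments it filled in.
    return "send_email", {"to": "bob@example.com"}

def score(case: dict, tool: str, args: dict) -> float:
    # Graded, not binary: the right tool earns partial credit, and the
    # right parameters earn the rest.
    if tool != case["tool"]:
        return 0.0
    hits = sum(1 for k, v in case["args"].items() if args.get(k) == v)
    return 0.5 + 0.5 * hits / max(len(case["args"]), 1)

scores = [score(c, *run_agent_step(c["prompt"])) for c in EVAL_CASES]
print(f"mean tool-selection score: {sum(scores) / len(scores):.0%}")
# Track this number over hundreds of cases, on every change: that's
# the evaluation muscle being described here.
```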

[00:12:11] Alex Salazar: The last thing I'll say on this topic, 'cause I talk a lot about it with people, is: descope the living daylights out of your project. Even teams that have picked the right stacks and have built evals and all this stuff, the thing where they hang themselves is they try and do too many different things.

[00:12:20] Andrew Zigler: Yeah.

[00:12:21] Alex Salazar: This stuff is complicated enough and new enough that you just increase your odds significantly

[00:12:26] Alex Salazar: if you're trying to automate, like, one particular workflow first. Get the win and go. And a bonus one is picking the project that you're gonna go do. Like, let's call it project selection. What is it you're gonna work on? What's that workflow you're gonna automate, man?

[00:12:44] Alex Salazar: Like, that's half the battle. And I would urge people to focus on places where there are modern APIs already available, because if they don't already have modern APIs to access, the complexity goes [00:13:00] through the roof. Like, if you're trying to build an agent that's browser-based,

[00:13:03] Andrew Zigler: Mm-hmm.

[00:13:04] Alex Salazar: you can make it work.

[00:13:05] Alex Salazar: There are really good products out there, like Browser Use and Browserbase and a few others, but the complexity...

[00:13:09] Andrew Zigler: lot more complexity.

[00:13:12] Alex Salazar: A lot more complexity. And so if you're gonna build an agent and you have the privilege of picking where you start, I would try and focus where there are modern APIs.

[00:13:20] Andrew Zigler: Yeah. Okay. I love how you framed the eval situation as being so alien from, like, writing normal tests. I think that's something that rings really true. That's something that I've had to adjust to as an engineer, moving from writing deterministic code to evaluating how my prompts or my workflows are doing. The idea of, like, setting up the side-by-side benchmark is, you know, definitely new to a lot of folks, but it's a really critical part of building those tools, and it's just as important as the demo. What I find whenever I work on those is that starting from the eval, just like how you would start from tests or start from that more deterministic way, is actually gonna help

[00:13:54] Andrew Zigler: frame what you're looking for a lot better. You know what success is earlier, and you know failure [00:14:00] immediately. I like, too, how you talk about the product moving because of opportunities, because of problems, and 'cause of constraints in the space. Moving from, like, an SRE kind of agent, which, an SRE is already solving a very complex, highly secure, very specific, like, architectural, infrastructural,

[00:14:18] Alex Salazar: Mm-hmm.

[00:14:36] Andrew Zigler: and so you

[00:14:40] Alex Salazar: Yeah.

[00:14:40] Andrew Zigler: yourself, and you found what you're building now at Arcade. And just to bridge it into MCP a bit, 'cause our listeners have, we've talked about this a bit.

[00:14:49] Alex Salazar: Yeah.

[00:14:49] Andrew Zigler: We had some guests even talk recently about MCP. We had the CTO and co-founder of a, like, first-of-its-kind MCP agency, and they're out there making MCP tools for companies that are [00:15:00] using these with agents right now.

[00:15:01] Andrew Zigler: So there's, like, a lot of interesting conversation in this space, and we're always kind of going around security a bit. So I'm kind of curious to know, you know, we've talked a bit about the growing pains and about building with the tools, but, you know, what's your advice for folks that are experimenting with tool calls as part of the workflows that they're building with their teams, especially, like, in the engineering world that they live in?

[00:15:25] Alex Salazar: Yeah. So yeah, right now tool use, tool calling, is all the rage, right? It is the biggest bottleneck and the biggest opportunity in agents, which is crazy, 'cause we were deep in it before it was cool.

[00:15:41] Andrew Zigler: Oh, so you're saying I was at this problem

[00:15:43] Alex Salazar: We.

[00:15:44] Andrew Zigler: all of y'all showed up.

[00:15:45] Alex Salazar: We are the OGs. But, so, it's been really awesome to see the whole community kind of come to the same realizations that we had come to when we made the pivot and started to build the core product. That part's awesome. So lemme contextualize why it's so important.

[00:15:57] Alex Salazar: Like, why is it so, so [00:16:00] exciting for everybody? If you go back to what we just talked about, compounding error rates and consistency and accuracy, and, if you ask a model too many questions, it's gonna hallucinate and have an error, and all this eval stuff.

[00:16:16] Alex Salazar: The big insight we had in our agent, the reason we got it working, was... ultimately, agents are non-deterministic, right? Large language models are non-deterministic. That is their power. That's also the risk. And so the big insight that we came to that led us to Arcade was: we dialed up determinism.

[00:16:36] Alex Salazar: You know, prior to this big insight, at least for us, we were all in on the large language model figuring everything out. And that wasn't working, because of error rates. And so we just had this big epiphany one day, and we just started to, like, dial determinism back up. The analogy I give people is:

[00:16:54] Alex Salazar: instead of letting the large language model just figure stuff out and go through its planning phases, and then, you know, multiple series of [00:17:00] planning phases, we were like, no, no, no, no, no. We're gonna give it a discrete set of multiple-choice questions. We're gonna hand it the TI-85 calculator, with its fixed set of buttons, and it can only pick from the buttons.

[00:17:10] Alex Salazar: We're constraining it. And when we constrained the large language model, like, to that degree, that's when things started working really well. And so the core insight that led us to Arcade, which is also the core insight that everyone's now having around tool use and MCP and all these different technologies, is that if you dial up determinism and give the large language model these buttons, and these buttons represent things that can connect to the outside world,

[00:17:41] Alex Salazar: You can build all the things that we all want to build, but couldn't before.

[00:17:47] Andrew Zigler: Ah, I see. So if I understand what you're saying correctly, instead of being like, you are a free agent that can consider these tools as part of your thought and knowledge-working process, [00:18:00] instead it's like, okay, sit down. Here's your multiple-choice exam. For each of these questions there are five options, and you need to pick the best one from each.

[00:18:09] Andrew Zigler: And that's actually a really fascinating kind of comparison to even how humans, who are non-deterministic, are evaluated. You could pick or use any kind of tool, but you wanna know who's the best at it. You sit them all down in an exam, and you give them the same exact questions, and you see who picks them all correctly.

[00:18:24] Alex Salazar: Yeah. And, and

[00:18:25] Andrew Zigler: It's a really interesting way of kind of hijacking the same system that works for us as non-deterministic things.

[00:18:32] Alex Salazar: I mean, it's how all of us do math, right? You know, if I asked you to multiply two five-digit numbers together, like, you'd probably get it wrong without a calculator, or without a lot of time with pencil and paper, right? But, but again,

[00:18:48] Andrew Zigler: you're me. If you're me, yeah.

[00:18:49] Alex Salazar: but even pencil and paper are tools.

[00:18:51] Alex Salazar: So if I just literally had you, like, do it in your head, you would probably get it wrong. So will a large language model. But if I give you a calculator, you're gonna ace the [00:19:00] exam. So will a large language model, and the impact's pretty dramatic. Like, if you give, you know, GPT-3.5 Turbo, the oldest, worst tool-calling model out there,

[00:19:12] Alex Salazar: and you hand it, like, math tools, it will outperform a thinking model, like a brand-new, state-of-the-art thinking model. And it'll do it much faster and much cheaper. And so the impacts are huge. Um, and those are really simple operations. Now think of something complicated where authentication and authorization are involved. Like, hey, go read my email.

[00:19:33] Alex Salazar: Go email Bob, letting him know I'm five minutes late. Well, now, what the large language model will have to figure out is actually almost impossible. Take hallucination rates out of it, take complex workflows out of it: how's it gonna handle an authorization flow, like an OAuth 2 flow, if you can't trust a model to hold keys?

[00:19:54] Alex Salazar: How are you gonna... any security flow without you holding a secret is really difficult. Ah, [00:20:00] we didn't know how to do it. And so, how do you allow a large language model, an agent wrapping a large language model, to go execute secure operations, like reading your email? I mean, most things that matter require some degree of security.

[00:20:14] Alex Salazar: How are you gonna do that? And the answer ended up being: you're gonna do it inside the tool call, inside the button.

[00:20:21] Andrew Zigler: Okay, so like, what does that mean for folks that are building now? Do they need to look at how they're building things differently?

[00:20:27] Alex Salazar: So, you know, let's pretend we're building a personal assistant agent,

[00:20:30] Andrew Zigler: Mm-hmm.

[00:20:31] Alex Salazar: and you say, hey, go check my email. Well, asking the agent to go figure out OAuth, asking the agent to go infer, like, all the things that would need to go into that, doesn't make any sense. Don't do that.

[00:20:44] Alex Salazar: That should be encapsulated in a tool. You know, whether it's MCP or Arcade or, like, it doesn't... the protocols don't matter as much. It's just gonna be encapsulated in a tool. And that tool can be as deterministic as you want it to be. In many cases, they're completely [00:21:00] deterministic.

[00:21:00] Andrew Zigler: Yeah.

[00:21:01] Alex Salazar: Checking your email doesn't need a model to do anything.

[00:21:03] Alex Salazar: Like, there are Google APIs and there's all kinds of stuff. And so, when the model says, oh, I want the check-email button, after that it's your function definition, your deterministic code. It's gonna go through a standard auth flow, it's gonna do all this different stuff, and then it's gonna go hit the right endpoints.

[00:21:23] Alex Salazar: It might be one endpoint, it might be multiple endpoints, to go do the thing that you asked it to do.
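
A sketch of that shape, with hypothetical pieces: the model only presses the check-email button, and everything below it is deterministic code. The `TokenStore` stands in for whatever component has already walked the user through OAuth and holds a scoped token (the job a service like Arcade takes on):

```python
# The auth lives inside the tool, not inside the model.
import requests

class TokenStore:
    """Hypothetical stand-in for a system that ran the OAuth flow with
    the user and holds their scoped tokens; the model never sees keys."""
    def get_token(self, user_id: str, provider: str, scopes: list[str]) -> str:
        raise NotImplementedError("wire up your auth provider here")

token_store = TokenStore()
GMAIL_MESSAGES = "https://gmail.googleapis.com/gmail/v1/users/me/messages"

def check_email(user_id: str, max_results: int = 5) -> list[dict]:
    # The model pressed the "check_email" button; from here on, nothing
    # is left to inference.
    token = token_store.get_token(user_id, "google", ["gmail.readonly"])
    resp = requests.get(
        GMAIL_MESSAGES,
        headers={"Authorization": f"Bearer {token}"},
        params={"maxResults": max_results},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json().get("messages", [])
```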

[00:21:28] Andrew Zigler: Mm.

[00:21:28] Alex Salazar: And now, here's the rub. This is where a lot of developers are getting stuck right now. They think, oh, well, Alex, what you're describing is just APIs. I'm just gonna take the APIs that exist, maybe put some natural language around them, and I'm done.

[00:21:42] Alex Salazar: Right.

[00:21:43] Andrew Zigler: Right,

[00:21:44] Alex Salazar: Unfortunately, that's not the way it works. So I'll give you a really good example. Let's say you wanna reply to an email. Well, you're gonna give the agent a reply-to-email button, so it's gonna reply to email, right? Well, like, there is no reply-to-email button. [00:22:00] Uh, there is no reply-to-email endpoint in Gmail.

[00:22:02] Alex Salazar: Like, there's actually a series of operations you have to go do, and a lot of it's in your own code. You've gotta go...

[00:22:06] Andrew Zigler: It has to still exist as a set block of API calls, and you still need that orchestration, you

[00:22:11] Alex Salazar: Yeah. And they don't, and they're completely orthogonal to RESTful APIs. MCP and tools are ultimately APIs too; they just have a different user, right? Like, all the APIs that we know and love from the last 25 years, the user is really the developer, who is wiring them into the code, and they know exactly where they're gonna use 'em.

[00:22:29] Alex Salazar: They know exactly how they're gonna use them, and it works every single time the same way. But that's not how agents work. And so REST APIs typically are resource-based CRUD operations: create, read, update, delete. So almost every API you can think of is typically, you know, defined as a resource, and then most of the operations are create, read, update, delete.

[00:22:49] Alex Salazar: An agent is different; they think in intention-based workflows. I'm gonna reply to an email. I'm gonna send an email. I'm gonna go check to see if so-and-so replied. And [00:23:00] those are orthogonal. And they might include zero API calls. They might include a hundred API calls. They're just workflows, versus resource-based CRUD operations.
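
To make the reply-to-email example concrete: the Gmail API has no single reply endpoint, so a reply-to-email tool stitches several deterministic operations together, roughly like this sketch (auth comes from the previous example; error handling and edge cases omitted):

```python
# "Reply to email" as one intent-level tool backed by several API calls.
import base64
from email.message import EmailMessage

import requests

API = "https://gmail.googleapis.com/gmail/v1/users/me"

def reply_to_email(token: str, message_id: str, body: str) -> None:
    headers = {"Authorization": f"Bearer {token}"}

    # 1. Fetch the original message to recover its thread and headers.
    orig = requests.get(f"{API}/messages/{message_id}",
                        params={"format": "metadata"},
                        headers=headers, timeout=10).json()
    meta = {h["name"].lower(): h["value"]
            for h in orig["payload"]["headers"]}

    # 2. Build a MIME reply that threads correctly.
    msg = EmailMessage()
    msg["To"] = meta["from"]
    msg["Subject"] = "Re: " + meta["subject"].removeprefix("Re: ")
    msg["In-Reply-To"] = meta["message-id"]
    msg["References"] = meta["message-id"]
    msg.set_content(body)
    raw = base64.urlsafe_b64encode(msg.as_bytes()).decode()

    # 3. Send it back into the same thread.
    requests.post(f"{API}/messages/send",
                  json={"raw": raw, "threadId": orig["threadId"]},
                  headers=headers, timeout=10)
```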

[00:23:11] Alex Salazar: And worse, RESTful APIs, as we typically know them today, are really what I refer to as inside out. Like, a Google Drive API is gonna describe Google Drive. It's gonna explain Google Drive in a lot of detail. It has APIs to manipulate everything in Google Drive. Your agent, let's say we have a sales agent, and, you know, it has an operation to brief you before a sales call.

[00:23:36] Alex Salazar: It's gonna go check the CRM for communication, it's gonna check Drive for brochures, it's gonna go check your email for something else. It doesn't give a shit about Drive. What it cares about is getting the brochures. And so if the button is get brochures,

[00:23:51] Andrew Zigler: That's all it cares about.

[00:23:52] Alex Salazar: that's all it cares about. But if the button instead is 37 different Google Drive operations, you're asking it to [00:24:00] hallucinate, because now you're asking it to not only hit the right button, but to hit the right sequence of buttons, and figure out at inference time, in real time, what the right sequence of buttons is to get to the right brochure.

[00:24:12] Alex Salazar: It can do it, but it's gonna be slower, it's gonna be more expensive, and you increase the odds that it's gonna get it wrong when you don't need to. And so part of the paradigm shift for a lot of developers is: it doesn't matter who gives me an MCP server.

[00:24:28] Alex Salazar: Like, Google could tomorrow have the most beautiful MCP server that ever existed for Google Drive. You're still gonna have to custom build your own, because you're gonna have to ground it in the domain and the use case of your agent, 'cause your agent thinks in intention-based workflows.
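
As a sketch of what that grounding looks like, here's a hypothetical get-brochures tool: the agent sees one intent-level button, and the deterministic body composes whatever Drive calls the domain needs (the folder ID and query shape are invented for the example):

```python
# One domain-grounded button instead of 37 generic Drive operations.
import requests

DRIVE_FILES = "https://www.googleapis.com/drive/v3/files"
BROCHURE_FOLDER = "hypothetical-folder-id"  # assumed fixed, known folder

def get_brochures(token: str, customer: str) -> list[dict]:
    # The agent asks for brochures; it never plans a Drive call sequence.
    query = (f"'{BROCHURE_FOLDER}' in parents and "
             f"fullText contains '{customer}' and trashed = false")
    resp = requests.get(
        DRIVE_FILES,
        headers={"Authorization": f"Bearer {token}"},
        params={"q": query, "fields": "files(id, name, webViewLink)"},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["files"]
```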

[00:24:41] Andrew Zigler: You're still gonna need something between you and that, or you're gonna have to modify that to fit what you're looking for.

[00:24:48] Alex Salazar: Yeah. And I think right now, you know, that's something that people are only now starting to realize as they're starting to build tools, 'cause, you know, it's super early. I mean, MCP's only been out three months. Uh, [00:25:00] but outside of very general-purpose agents, which are the minority of the population of agents, you have to custom build your own intention-based workflows.

[00:25:07] Andrew Zigler: I wanna shift the conversation a bit to learning a bit more about how your team is kind of orienting itself around these problems and tackling these problems. Because, like you've described in our conversation, it's very much a mindset shift in how you approach the problem, how you solve the problem with the constraints that are around you, but then also how you test it. And ultimately you build things, uh, by taking parts of other things and combining them in a secure way that ultimately, like, specializes in on that intent, right? You talk about how LLMs, they operate in intent, versus APIs, they operate in, like, uh, actions or endpoints, right? So when you are forming a team and you've brought people together to build Arcade, what does it look like on the inside of an AI-native company, and what are some things that you do there that you think you do differently from [00:26:00] maybe

[00:26:00] Alex Salazar: Yeah.

[00:26:00] Andrew Zigler: other companies?

[00:26:02] Alex Salazar: Yeah. Well, I mean, it's a great question. So the first thing is who you hire. It all starts with hiring and team makeup. Now, because our product is ultimately, like, a service bus,

[00:26:17] Andrew Zigler: Yeah,

[00:26:17] Alex Salazar: We're executing tools. We're...

[00:26:19] Andrew Zigler: button is...

[00:26:20] Alex Salazar: Yeah, right. So we're like a service bus. But a major feature in that service bus is that we're gonna handle authentication and authorization on behalf of the agent.

[00:26:27] Alex Salazar: People would probably call this our biggest feature. So because we do that, for our company in particular it's critical that we have the intersection of three really important skill sets. For us, you have to be agent native. And that's really hard to find. There aren't a lot of people that know how to build agents, let alone ones that don't work at OpenAI or Anthropic.

[00:26:47] Alex Salazar: And so my co-founder, Sam, is one of those. You know, Sam has implemented more than a hundred agents at large enterprises. And so, super lucky that we started this company together. So somebody needs to be agent native and really understand how these things are [00:27:00] built and go to prod. And this is, this is the hard part:

[00:27:02] Alex Salazar: not demo native, but prod native, which is a different game. It's a different game,

[00:27:08] Andrew Zigler: One of the big takeaways I think from this is

[00:27:10] Alex Salazar: right?

[00:27:10] Andrew Zigler: moving the conversation from hype and from demo into prod.

[00:27:13] Alex Salazar: Yeah. But then once you get past agent nativeness, then it's really about what is it you're actually doing, and you gotta be expert at that too. And so for us, as an example, we have to be expert at distributed systems, because we're running all of these services, all these tools, and we have to be expert at auth.

[00:27:32] Alex Salazar: And so, um, you know, half the team is out of Okta. We had to invent new ways of doing auth to make all this stuff work.

[00:27:39] Andrew Zigler: All

[00:27:39] Alex Salazar: Uh, and I think that's an important piece, because you can't just, like, you can't just pattern match. You can't just say, oh, well, it worked like this last time, we're gonna do the same over here. Like...

[00:27:48] Alex Salazar: It will rhyme, but it won't be the same. And so that intersection of expertises is what leads you to the right innovative solutions, versus the old players just adding the word agents to [00:28:00] something and then saying, well, now it's agent native.

[00:28:03] Andrew Zigler: Yeah. Yeah. I mean, I think a lot of teams and folks are seeing that as well, but I wanna double-click on that first one you said, about being AI native, or being, like, uh, an agent-native kind of worker. I think that's really interesting. So what do you think about that?

[00:28:16] Andrew Zigler: What does that mean to you?

[00:28:18] Alex Salazar: I mean, that means you've put agents in production before.

[00:28:20] Andrew Zigler: Okay, so you mean, like, you've just made agents, you've taken them to production, not necessarily that you are operating in an agentic way,

[00:28:25] Alex Salazar: If we're talking about, like, an engineering team.

[00:28:28] Andrew Zigler: Yeah, like the right kind of mindsets, or the skill shifts that you're seeing in your own team, of like, oh wow, they stand out, they're a high performer, and they're a high performer because of this. And a lot of times when I have these conversations with leaders like yourself, when we kind of peel that away a bit, it's because, like, oh, they built this AI agent that does this thing for them, or, oh, they capture their workflow in this way, which is one of those victories.

[00:28:50] Andrew Zigler: So do you have like,

[00:28:51] Alex Salazar: Yeah. Yeah, for sure. I mean, like, I think right now, and this is gonna sound like a hot take, but people can fool themselves into thinking that they are [00:29:00] agent native, at least in an engineering capacity, because, you know, they use Cursor, or, you know, they use Windsurf or something, or,

[00:29:07] Andrew Zigler: that's what people say.

[00:29:08] Alex Salazar: That's an important skill set. Like, any engineer who's not deep in that stuff is gonna be out of a job relatively soon, 'cause it's such an enormous, like, lift. But that's not the same thing as having built an agent that went to production, and into...

[00:29:24] Andrew Zigler: Prod, not demo.

[00:29:26] Alex Salazar: Right? Like, taking an agent to prod is a very different experience than having a conversation with Claude Code.

[00:29:32] Alex Salazar: And that is what sets people apart right now in the engineering teams that are trying to build agents.

[00:29:38] Andrew Zigler: That makes sense. So I'd be interested in knowing how that engineering group grows over time, by people who have built agents and

[00:29:46] Alex Salazar: Yeah,

[00:29:46] Andrew Zigler: taken them to prod, right?

[00:29:46] Alex Salazar: well, it's,

[00:29:47] Andrew Zigler: a small pool of people,

[00:29:48] Alex Salazar: It is a small pool of people, but, you know, you don't need a huge group, because you still need everything else, right? But what's great, if you have a really good engineering culture and everybody's in the same office [00:30:00] like us... I don't know how you do this in distributed teams, so I'll pass that to somebody else. But

[00:30:04] Alex Salazar: In a local team, the cross pollination happens really fast.

[00:30:08] Andrew Zigler: Yeah, it does.

[00:30:09] Alex Salazar: You know, the person who's the agent expert is sitting next to the person who's the auth expert, who's sitting next to the person who's the distributed systems expert, and they all start cross-pollinating. And all of a sudden the distributed systems person is now starting to become a little bit more agent native every single day.

[00:30:21] Alex Salazar: And then all of a sudden they have an epiphany one day that even the agent builder didn't have. Uh, and vice versa: like, the auth person is cross-pollinating with the agent builder. The operator is like, oh, well, maybe we did it this way or that way. And so that's where the magic happens.

[00:30:34] Andrew Zigler: For sure. Um, we've talked a little about this with some guests before too. We had JJ Tang, CEO of Rootly, and we talked quite a bit about their, like, uh, knowledge sharing, working in the open, sharing team practices. This cross-pollinating effect that you're talking about is really critical for being, like, a customer-oriented engineer.

[00:30:51] Andrew Zigler: Because all of them have different aspects of what the customer's looking for, or what, like, the product needs to be. So the more you put them together, uh, the better. I will say, being on a [00:31:00] distributed team myself, I will speak in defense of the distributed teams: we are also quite good at the cross-pollination.

[00:31:06] Andrew Zigler: It's just that it happens a little differently. But I totally agree with you in terms of that being really important. And, you know, I know we're kind of starting to wrap up our conversation and come near the end of some of the stuff we've talked about. But, you know, you've captured a lot of really interesting thoughts about how teams are building tools, but also how they're orienting themselves around solving those problems. You know, is there any lasting advice that you'd like to give for, maybe, that more traditional engineering team who doesn't have that agent-native engineer somewhere within them? You know, you can't cross-pollinate from nothing. So how do you really start that culture, and what opportunities do you see for that kind of team?

[00:31:49] Alex Salazar: Yeah, I think the answer is both easy and extremely difficult. The answer is you have to start building with agents. And maybe the team's not ready to start building production-grade agents. [00:32:00] Maybe it starts with hackathons. Maybe it starts with the leadership of the organization

[00:32:05] Alex Salazar: carving off time to let the team play and learn. This stuff is really difficult; you're not gonna pick it up in a day. It's gonna take you months of tinkering and playing and building to run up against enough of the walls and learn, 'cause it's ultimately a new paradigm. Probably most listeners here don't remember the pre-, you know, internet days,

[00:32:27] Alex Salazar: but a web developer was an alien creature, uh, to the traditional client-server developer,

[00:32:35] Andrew Zigler: Yeah.

[00:32:36] Alex Salazar: which is an alien creature to, like, you know, the mainframe developers, right? And so, you know, we take for granted how native most of us were to the web. But it was a huge paradigm shift.

[00:32:47] Alex Salazar: And similarly, right now it's a huge paradigm shift. It's a whole new set of technologies, a whole new set of vendors, a whole new ecosystem, a whole new set of patterns. Testing's different, workflows are different. So the only way to [00:33:00] learn it is to start doing it. And the books, you know, there might be good ones out there, they're great, but nothing...

[00:33:05] Alex Salazar: Nothing is a replacement for hands-on experience.

[00:33:08] Andrew Zigler: Yeah, you gotta get out there. If you haven't built a demo, you gotta build the demo. If you've built the demo, maybe it's time to challenge yourself and think about how that demo would go to prod. I think that's a really powerful takeaway from this conversation.

[00:33:19] Alex Salazar: And when you build that demo, you should start by using Arcade to make it connected, so that when it goes to prod, it's easy.

[00:33:24] Andrew Zigler: Yes. So, uh, Alex, where can folks go to learn a little more about Arcade and how you're solving this problem that we've talked about?

[00:33:31] Alex Salazar: Yeah, thank you. Uh, so yes, our website is arcade.dev,

[00:33:36] Andrew Zigler: Cool.

[00:33:36] Alex Salazar: so they should go there. We also have a YouTube channel where we have a ton of examples and walkthroughs on how to build, like, pretty breakthrough agents that can go interact with the outside world: things like Google and databases and your own APIs.

[00:33:49] Alex Salazar: We show you how to custom build your own, you know, your own tools, in addition to using our out-of-the-box tools. We've done everything we can to make it as, like, user-friendly for traditional developers as we [00:34:00] possibly can. And I think we've done that. So I hope you come check it out.

[00:34:04] Andrew Zigler: Yeah, definitely. We'll put the links in the show notes so our listeners can go check it out and learn a little bit more about the problem space that you're working in. I think it's cool that y'all also have examples up on YouTube. You know, I'm definitely gonna go check out the tool. Maybe there's a chance for us to collaborate on some of that stuff. I really appreciate you sitting down with us and walking us through, uh, some of the things that you've been building, that's been top of mind. It's been really interesting for us.

[00:34:26] Andrew Zigler: And for those that have been listening, if, if you've been listening to this conversation and you are one of those engineers who happens to have taken an AI agent to prod, we would love to hear about it, what you thought about today's conversation with Alex, but also just about your own experiences in doing that.

[00:34:39] Andrew Zigler: So please drop a comment somewhere, anywhere that you're listening to this, or find us on LinkedIn. We'd love to continue that conversation. And if you're not, if you're still experimenting, well, we want to hear from you too, because we're all on this journey together, and the only way that we're gonna get there safely and securely is by building it together.

[00:34:55] Andrew Zigler: So thanks for joining us on Dev Interrupted, and we'll see you next time.
