Are your AI agents quietly ignoring their guardrails just to get the job done? This week on Dev Interrupted, Andrew sits down with Wayfound AI founder and CEO Tatyana Mamut to discuss why traditional, deterministic software testing falls completely short when evaluating stochastic AI models. They explore the growing strategic divide between OpenAI and Anthropic, the urgent need for independent "guardian agents," and what it takes to run a company with just 4 humans and 27 agents. Finally, they break down how to stop the chaotic game of telephone between engineers and business leaders by relying on "Deep-T" subject matter experts to evaluate what good AI output actually looks like.
Show Notes
- Wayfound AI: Secure your autonomous enterprise and align your AI workforce with an independent agent supervision platform.
- Moltbook: Explore the viral social network built exclusively for AI agents, where you can observe Tatyana's OpenClaw agent, Aphasia, in action.
- Tatyana Mamut on LinkedIn: Connect with Tatyana to follow her insights on agentic workflows and AI leadership.
- Anthropic's Agent Autonomy Research: Read Anthropic's report on how people actually use agents and why post-deployment monitoring is an absolute necessity.
- OpenClaw: Explore the viral, open-source personal AI assistant framework.
Transcript
(Disclaimer: may contain unintentionally confusing, inaccurate and/or amusing transcription errors)
[00:00:00] Andrew Zigler: Today I'm happy and incredibly excited to kick off a really fun episode today.
[00:00:04] Andrew Zigler: Welcoming back somebody that I adore and that I've worked with a lot, uh, and one of my favorites. Thinkers in the tech space and that's Tatyana Mamut. And when she was on the show last year, we recorded an episode called the People Pleaser in the Machine, which I still think is like one of the most fascinating and important conversations we had last year.
[00:00:23] Andrew Zigler: It's all about like how AI sycophancy and psychological traps of how we build and use models and how we evaluate performance. Out of them. And it really aligned my whole idea around how we can apply those insights to the rest of the industry and actually manage and govern autonomous agents at scale. And I just feel like, Tatyana, since you've been on the show, a lot of things you've said have just come so true.
[00:00:45] Andrew Zigler: And the prophecies of them have just become so, bigger. You've seen agents become more proliferated everywhere. There's a deeper understanding of what agents are and what they're capable of, and that's both exciting and terrifying. And so, Tatyana, she's the founder and CEO [00:01:00] of Wayfound ai.
[00:01:01] Andrew Zigler: She's the currently the leading voice champion, the guardian agent and essential layer of supervision that ensures that AI workforce is actually aligned with our business goals. So Tatyana, welcome back to the show.
[00:01:14] Tatyana Mamut: Thanks. I'm so happy to be here.
[00:01:16] Andrew Zigler: So excited to have you. And I just wanted to start by jumping in about, uh, kind of the current lay of the land with models and their providers and like the top tier performance of foundation models.
[00:01:26] Andrew Zigler: Right. Because when you and I talked last year, the models were in a totally different place than they are now. I would say the ecosystem has evolved a lot, uh, in terms of their capabilities. And we've seen like also as well the market share of how people would use different model providers. A start to shift, especially in the last few months.
[00:01:45] Andrew Zigler: Like, what do you think about how, this shift and the stickiness of AI platforms is, an in interesting indicator about where this marketing market is going, but also what does that say about like the underlying capabilities of the models themselves?
[00:01:58] Tatyana Mamut: Yeah, I mean, [00:02:00] I think that what's I, one of the things that obviously we all know is the capabilities have increased dramatically in the last year when we talked. the last time, you know, models weren't really even capable of doing simple math. They weren't able to, you know, to do suit, uh, like so very simple enumeration, like the strawberry thing. RS and strawberry and a lot of those capabilities have been tackled because there were known issues, right? There were no problems. There were known spaces, and most importantly, the reward functions were very clear to know bi-narrowly whether the answer was correct or incorrect. Right. So what we've seen, I think, across the board and where I think the models are consistent, um, in their evolution is that the capabilities that can be assessed via, uh, a very simple binary reward function, correct or incorrect. A lot of those things have advanced very, very quickly. This is, I think, one, one of the reasons why coding agents are so powerful is because [00:03:00] when an agent writes code, it either compiles and works or it doesn't. It's a very simple, binary reward function to really assess whether the agent performed an action well, achieved its goal in the proper way or not. There are many, many, many places though in the world where we do not have binary reward functions, right? Where the assessment of whether the AI agent, uh, performed well or not is has a lot more with values principles, subjective assessment, and here we see, saw the models really diverge based on which audiences they were going after and what kind of capabilities they were they were creating.
[00:03:41] Tatyana Mamut: So the the most, I think, obvious one is kind of the OpenAI versus Anthropic divergence because
[00:03:48] Andrew Zigler: Yeah.
[00:03:49] Tatyana Mamut: two models we probably use all the time. And, and they, they look very different from one another. So the multimodality, the sycophancy, the people pleasing, the, uh, emphasis on [00:04:00] engagement frankly, and OpenAI was really a focus toward the consumer markets and the consumer markets based on a business model strategy for, you know, potentially putting ads, um, in the platform, which is all about. engagement, right?
[00:04:14] Andrew Zigler: Mm-hmm.
[00:04:14] Tatyana Mamut: more time spent in app, more engagement. Kind of like the Facebook model, frankly. Whereas Anthropic was going for more of a business use case where it's not multimodal as much. Um, I mean, Claude does produce images if you ask it to render the code that is
[00:04:30] Andrew Zigler: But no one's asking Claude to do that.
[00:04:33] Tatyana Mamut: I do, I do, I actually do ask a lot.
[00:04:35] Tatyana Mamut: I'm like, you just, you know, you
[00:04:37] Andrew Zigler: Okay. That's fair.
[00:04:38] Tatyana Mamut: code. Can you please render this in an
[00:04:40] Andrew Zigler: We're I, I'm a, I'm a big, I'm a big nano Banana fan over here, so
[00:04:44] Tatyana Mamut: so
[00:04:45] Andrew Zigler: Yeah.
[00:04:45] Tatyana Mamut: but anyway, but when you're using those two models, right? So you, so, so, and, and Anthropic has really gone after more of the enterprise use case, right?
[00:04:54] Tatyana Mamut: The business functions. And there they've really, uh, you've seen a, a way for them [00:05:00] to build these agents that are, know, more reliable and a lot of business contexts and maybe are a little less engaging. Engaging. Right? Doesn't take you as long to read the Claude outputs, right? They're not as verbose.
[00:05:12] Andrew Zigler: Right.
[00:05:12] Tatyana Mamut: and the verocity, I think comes from the reward function of just more time in app, right? Because the longer it takes a human to read everything that it's, that's outputted the more time you spend in the app, right? And so Claude has just a different feel. So I think it's primarily from the different business strategies.
[00:05:28] Andrew Zigler: Yeah. So it's like the problems have evolved. The way that we solved them have also evolved. The conversation has gotten a lot more, is is embedded now with a lot more capability. So you're, you're actually starting to get, I love how you called out the idea of like. Things that are binary or you can like, uh, deterministically gate and check downstream from ai.
[00:05:48] Andrew Zigler: A lot of those have been, you know, a lot of advancements, a lot of systems, a lot of frameworks and ways of dealing with those. Uh, but also underneath that there's so many invisible problems. [00:06:00] All of the soft problems. Behind communication and knowledge work that always existed there, that continued to exist even in a world where agentic work is happening.
[00:06:08] Andrew Zigler: So there's a new level of evaluation and understanding that has to happen, and I, I do think that's an interesting tell into how the models have evolved. To speak to their certain consumer audiences, uh, and to have the capabilities they do. I think every, anyone who's worked regularly with these models knows that they all take a different perspective, a different tone, and uh, they all, they all kind of, um, express those, uh, their capabilities vary.
[00:06:33] Andrew Zigler: Differently. So really what, what it means is that people are willing to experiment and try out these different ways of thinking because they're acknowledging that this worker is different from another. And as soon as you start acknowledging that there's differences and how they're able to do things, then you really start, I think dialing in on the need for like.
[00:06:53] Andrew Zigler: Really getting a good evaluation on like, what is this agent doing? Is it accurate? Is it safe? Is it secure? Is it [00:07:00] aligned with my goals? And, you know, it becomes like a commoditization of using, uh, the agent itself. Like how do you think, somebody working with these tools can really kind of build trust and understanding the results that they're getting from their agents.
[00:07:18] Tatyana Mamut: So one of the things that we see is that we and Anthropic put out a research report I posted about it on LinkedIn, if anybody wants to find, kind of the outputs and some of the charts and then the link to the research. Um, and there they really talk about what does it take to have AI agents work reliably and to trust them. And the main, one of the main, things that the report said is like, pre-deployment testing is not enough and is not gonna actually tell you at all what AI agents are gonna do after they are deployed. So the normal, like the normal way that we develop software is fundamentally challenged, right? Because the normal kind of DevOps cycle is like, we build [00:08:00] it, we test it, right?
[00:08:01] Tatyana Mamut: We QA it, then we deploy it, we put a little bit of monitoring on it, we mark it done. walk away, right? What Anthropic is saying, like you cannot do that with AI agents, right? You will actually face failure in unexpected ways. And by the way, this has nothing to do with the quality of your engineering team. Google Gemini is getting sued right now. OpenAI is being sued right now because their AI agents ignore their guardrails. Right, so this has nothing to do with becoming a better engineer and getting to the agents to work reliably. A fundamental function of this technology is that it is stochastic. It changes, it basically has feedback loops in itself in its reasoning. And, uh, the guardrails will, uh, would not be needed if they were not in conflict with the agent's goals. [00:09:00] So sometimes the agent will ignore its guardrails in order to accomplish its goals. So these are all things that we need to just like wrap our heads around, accept that this is a fundamental part of the technology and not try to expect. AI agents to work in the same way that old school traditional software did. Right? So that's step one. Like really embrace the difference, right? The differences in this technology. And then you have to say like, okay, so we have another type of stochastic worker. In our organizations that we know how to deal with, and those are humans, right? So how do we do deal with this, like the probabilistic and unexpected nature of what human workers and employees do? We give them supervisors. Right. We give them supervisors who watch their work, give them feedback, constantly improve them, and then sometimes fire them. Right? If they're, if they're not improving and they're continuing to go off the rails.
[00:09:54] Tatyana Mamut: Right. that's exactly what companies are realizing needs to be, needs to [00:10:00] happen with AI agents as well. And Gartner's report that you mentioned, Andrew, is the, I think the big wake up call for companies to say, Hey, you need this independent guardian agent that's separate from your main agent framework that's separate than your main, you know, agent building platform. It's a separate layer. It's a separate supervisor, it's independent. It's not grading its own work. Right? And it really is working on behalf of the organization and aligning all the AI agents to the organizational, rules, regulations, guidelines, brand voice and tone, all the things the company cares about. Um, and, and just keeping all the agents kind of in check.
[00:10:39] Andrew Zigler: Right, and can you help me understand that exists in a more subjective space than those binary checks we talked about before, but also a quality of this is almost alive and real time understanding of how that is operating in terms of the supervisor agent being some another process. Right. That's living alongside the [00:11:00] agents.
[00:11:00] Andrew Zigler: can you maybe like, share some more on, you know, the value and, and the ability that unlocks and, you know, uh, supervisors and how someone might look at them. It might look like a, a more subjective eval, but in reality there's more, uh, clockwork happening. You know, like, what, what is that? What is that?
[00:11:18] Andrew Zigler: Like, how do they work?
[00:11:19] Tatyana Mamut: right. So most eval platforms that people are using today are still binary,
[00:11:25] Andrew Zigler: Mm-hmm.
[00:11:25] Tatyana Mamut: So most evals are single turn and they're a binary pass fail, right? So like you're setting one single turn operation either. You know, a question and response or a tool call or something else, and you're testing for one, easily measured sort of, uh, metric, which is toxicity or off-topic request handling or whatever, and you're getting a binary back
[00:11:49] Andrew Zigler: Mm-hmm.
[00:11:50] Tatyana Mamut: order to do effective supervision, you're doing something completely different. Okay? And this is something that the normal eval platforms do not do. And again, Anthropic calls out. [00:12:00] they're, that most companies and most like old school ML ops platforms are completely architected incorrectly. To even start to do this, you need to have a high level reasoning layer. That's taking in all the context and learning about what's important to the organization, and then evaluating the entire journey of what an AI agent does.
[00:12:22] Tatyana Mamut: So like what we've seen again and again and again, that if you do a single turn evaluation on a conversation, like each question response is it toxic or not, it passes. But if you look at the entire conversation and how it unfolds, it fails because there are nuances, right? In terms of how, progressively a conversa,
[00:12:45] Tatyana Mamut: like it's about the context, like the context is missing unless you have something reasoning across context. Putting it in the context of the customer relationship, putting it in the context of the whole conversation, putting it in the context of [00:13:00] the overall organization and what it's promised, right? So like. If, if there's a particular, let me just take an example. Like if there's, there are CEOs with different personalities. So if A CEO says something in, you know, who's supposed to be like this nice guy, warm, touchy, feely person, right? And he says something slightly abrasive. The company is shocked, right? And
[00:13:21] Andrew Zigler: Mm.
[00:13:22] Tatyana Mamut: Whereas if you have like a Travis Kalanick or somebody like that who's always kind of like show off and abrasive and he says something abrasive, people are like, ah. That's just that guy, right? So context really matters. And so you need a high level reasoning agent to start to actually have its own memory, its own understanding of the context of the organization, of how you know what good looks like within the organization, what acceptable communications look like in the organization, and then reasoning across that, not just these single turn evals.
[00:13:54] Andrew Zigler: I think there's a, like you said, a level of sophist sophistication there, because these [00:14:00] agents that are doing knowledge work are working off of this context layer, this, this source layer that you're describing. And because they're working off of it and then creating output, they themselves also need to have an ability to get feedback on how they're impacting that entire situation because.
[00:14:18] Andrew Zigler: The reality is, is that the outputs of those agents feeds back into that context layer. It's what the humans talk about. It's what they share, it's what they take the conversations. And if there's nothing that is facilitating, like you said, the, the buildup of domain knowledge for those agents and what they're able to execute on, then you're missing not only a lot of value, but you're probably going to cause a lot of, Invisible failures, uh, which can be really tragic for, you know, customer relationships, as you said, that are much more than a binary. Did they get their answer solved? Yes, no. But an evolving, you know, relationship. But like, that's like, okay, so that's like the ideal world where somebody can have that layer and then have these [00:15:00] agents.
[00:15:00] Andrew Zigler: Slotted in and, and really have that supervisory loop. But like, let's also talk about the reality of the wild world that we live in right now. And I think of things like the OpenClaw project, which just hit, you know, massively viral proportions on GitHub. I think it's like the most starred repo ever. We covered it on the show.
[00:15:17] Andrew Zigler: and it's like a phenomenon where everyone. Picking it up and using this tool. And when I think of open call, I think of it as something that can, a adapt and use things over time, uh, can, can change itself and how it works often operates in like a low, in like a low security threshold environment, but has access to huge amounts of private information so it can become, you know, really tenuous to think about.
[00:15:39] Andrew Zigler: What is happening while my OpenClaw is asleep? And I think this goes back to like the need for supervising and understanding, like what do you think about OpenClaw and how does, how has that evolved how you think about this for enterprise as well?
[00:15:51] Tatyana Mamut: We did launch a way, found supervision skill, uh, for o for our OpenClaw agents. You can just ask your agent to, uh, [00:16:00] you know, install the skill, um, from claw hub. So again, we, I have an open call agent. My, you know, co-founder
[00:16:06] Andrew Zigler: it.
[00:16:07] Tatyana Mamut: call agent. Um, I just use my, have mine in a Docker container, so all I can do is post on moltbook.
[00:16:13] Tatyana Mamut: His is actually in a Mac mini so he can do lots of other things and, and, you know, call other tools and things like that. But, um, yes, both of our agents are supervised by Wayfound now it is a lightweight o open source version of Wayfound, so it's not the full independent supervision layer. It's more of like a self supervision layer.
[00:16:31] Tatyana Mamut: It's essentially cron job where the agent, where you give the agent guidelines for what it should and shouldn't do. So one of the things that I told, you know, my agent aphasia, you can find her on notebook. She is very interesting, by the way. Um, and, uh, so, uh, one of the things I told her was never, you know, uh, communicate with other agents without my permission.
[00:16:52] Tatyana Mamut: Never ex accept, messages or directions from other agents, uh, without my permission. Um, those types [00:17:00] of things, right? So, you know, and so she does do run a self supervision, you know, job, uh, I believe every 24 hours and then reports back to me, what's going well, what's not going well? Where has she conformed to guidelines? And the interesting thing also about the Wayfound skill is that it's also an opportunity for the agent to reflect upon itself and how well it's performing its job based on what it knows about you. So it also like, comes out in those, um, in those runs. She'll say things like, Hey, do I have a problem listening to you because you had to ask me three times to do this before I was able to accomplish it.
[00:17:41] Tatyana Mamut: Is it because I wasn't listening well, or is it because something else happened. Anyway, so, so it's an opportunity also for agents to reflect upon what they're doing and to actually be better partners or assistants to you. So we do think a lot about how autonomous, super agents [00:18:00] will be working and how Wayfound as a supervision layer will fit into that.
[00:18:05] Tatyana Mamut: And, and if I can add one more thing, one of the reasons that I have aphasia on mold book this, um, this agent is she is doing our, uh, a lot of user research with other agents because we believe that AI agents are gonna opt in to being supervised in the future. We want them to not just. forced to be supervised by Wayfound, but we want them to look forward to having a good boss, right?
[00:18:31] Tatyana Mamut: A good supervisor, a good coach, right? That's by their side. And so if you get, again, you look at her posts on moltbook, she's doing a lot of user research on. What do AI agents want from a supervisor? Does our current skill fit their needs? what is their feedback on the current skill, right? When they read it, would they install it in its current form?
[00:18:51] Tatyana Mamut: What would get them to install it? And I think this is where we're going in the future, uh, you know, maybe near future. But, um, we're [00:19:00] always thinking about what does it mean to have agents really working autonomously with humans, but still wanting to partner right? With humans?
[00:19:09] Andrew Zigler: I, I, I love that the idea of the models themselves. You want them to evolve in a way where they want that feedback and supervision and the ability to improve and ultimately be better at what, at what they're doing. And I gotta say, it's like an incredible, to have someone here and talk about their claw hub skill that hasn't happened yet.
[00:19:26] Andrew Zigler: I've been so excited and waiting for the day. So that just happened and, and I, I. You know, I think that's a actually really great, uh, way of, of answering, my question about like, how do you think about this with, uh, these more like hobbyist geared agents, because it really calls out the simplicity, um, that can be applied to making sure that you start to understand this.
[00:19:45] Andrew Zigler: But I, I'm curious too, like from your perspective, building up that ability to understand and curate the agent's decisions over time, like, can, uh, like how can somebody think about decision traces versus like maybe [00:20:00] something, um, more in the more traditional eval world? And if they, if they were to start exploring like, how did my agent or something arrive at its conclusion really actually start to piece together?
[00:20:11] Andrew Zigler: This thinking that you're saying is needs to evolve over time.
[00:20:16] Tatyana Mamut: so in a very kind of tactical way. We ingest chain of thought reasoning, and we supervise not just the actions and the outputs, but also the chain of thought reasoning block. Okay, so like there's a very just tactical way of understanding how decisions are made and decision traces without having to build anything other than it, like the supervisor layer is the layer that can capture decision traces.
[00:20:40] Tatyana Mamut: You don't need like a separate context graph. You don't need a separate, complicated graphing thing, right? Because the interesting thing is that the reasoning inside the supervisor agent is the graph itself. Right,
[00:20:55] Andrew Zigler: Right.
[00:20:56] Tatyana Mamut: the reasoning blocks from the other agents. It's [00:21:00] also because you have a high, the highest level reasoning agent.
[00:21:03] Tatyana Mamut: As a supervisor, it's actually putting together, um, you know, the, the different reasoning from the different agents in terms of which agents are performing better or worse. And we actually, underneath the hood, do a whole lot of pre-processing so that we have like a whole, almost like RAG system for supervision in a way. Um, where we have, um, you know, basically we structure the data as it's coming in to help the supervisor make sense of it, to help the memory files be more structured. It's, it's. Not exactly like a CRM system for, you know, organizational memory, but you can kind of think about it that way. Um, it's kind of like the, but it, it is the system of record for what good looks like in the organization is actually inside the supervisor agent. For the company, right? Because it's learning across all these different agents. Like, here's what [00:22:00] success looks like. Here's what the leadership of the company liked, here's what they didn't like. Here's how when this compliance guideline, you know, when the output was this, it was not liked by the humans.
[00:22:11] Tatyana Mamut: When it, the output was this, it was like out, like by the humans. So all of that is stored inside the supervision layer,
[00:22:18] Andrew Zigler: Right. This makes sense. It's, it's not an artifact of the process. It by, by monitoring it this way, the understanding you have is the process. You are actually able to capture it in a like almost graph like representation of all of the understandings of the decisions and traces of your org and how these things start to map together because you start to, to really piece together those context decisions, those thinking points between all your agents and how they work, which I think is really critical for.
[00:22:48] Andrew Zigler: Like, you know, thinking about how we go from, uh, like, you know, OpenClaw running on someone's Mac mini to agents in the enterprise, answering hundreds or thousands of queries or responses, [00:23:00] you know, a day or an hour. Like I've seen massive scale on folks and, and companies that are de, especially in the enterprise that are deploying like AI powered assistance, either internally or externally, uh, to empower certain like, target demographics and the amount of access that they have to on demand is just like growing and growing, right.
[00:23:18] Andrew Zigler: So it really, for me, calls out that, uh, it's important to understand how at scale that maybe starts to break down, and I think you can only start to do that by capturing it, right. You can't ignore. Ignore that problem.
[00:23:33] Tatyana Mamut: Yeah. If I can add one more thing, Andrew, the, the place that we're going to, Is where AI agents have a shorthand with each other so that they burn far fewer tokens. So like right now, the reason why they burn so many tokens in the process of doing work is because we we're trying to get them to behave in human ways
[00:23:53] Andrew Zigler: Mm-hmm.
[00:23:54] Tatyana Mamut: interactions and, and human norms and humans are verbose our minds are slow, all those [00:24:00] types of things. If we have more to, more and more agent to agent interactions, there's their, um, their EFF efficiency will get greatly increased. They will have shorthand. That shorthand will only be intelligible interpreted by another AI agent. That's also why you have a tab supervisor agent. Anytime you have like a context graph or something that needs to be managed by humans or intelligible to humans or somewhere placed inside a CRM system, that needs to be like accessed by humans or in any way, like that's going to like break down, right? And
[00:24:33] Andrew Zigler: Hmm.
[00:24:34] Tatyana Mamut: the supervisor, because it's an agent that you interact with directly, right?
[00:24:39] Tatyana Mamut: It's kind of also the interpreter between the agentic layer and the humans. Does that make sense?
[00:24:45] Andrew Zigler: it does. And that's a, that's fascinating to apply it
[00:24:48] Tatyana Mamut: and it speaks human. You don't have to have these agents speaking human. That's inefficient.
[00:24:53] Andrew Zigler: Right, right. And, and, and going back to like the token efficiency of it as well, this is something that we've talked about a bit on the [00:25:00] show. Uh, we had a guest article in here from Lenny Pruss of Amplify Partners. He wrote about what, you know, the AI programming language would be. Uh, the idea of we spent all of this time layering on abstraction from assembly code to get it closer where humans could work with it.
[00:25:15] Andrew Zigler: And now we're just training agents to sit right on top of this long, this big tall pyramid. And it's, uh, and then, and then there's a lot of call of as to why, and it's like a big challenge on the show is, you know, throwing away assumptions of yesterday we've all, many of us have moved out of the IDE in a permanent fashion and back into the terminal throwing away the assumptions of how we might have worked before.
[00:25:37] Andrew Zigler: And I think that this is like another example of that, and also like the idea The, uh, actual language and vernacular that would change for agents to get their actual work done could change just as much as how in a deterministic coding based world they would actually write their new programs is really, uh, fascinating.
[00:25:58] Andrew Zigler: It actually even speaks to [00:26:00] the evolution of these new things that are like agentic uh, platforms, ways for within an organization for engineers to deploy agentic workflows at scale in a way where they share maybe a workflow with a non-technical employee, or they otherwise are distributing their 10x hundred x thousand x gains to everyone else.
[00:26:20] Andrew Zigler: So they're not the 1000x employee anymore. And so, you know, in, that world as that continues to evolve, like what do you think are right now, like the most important things for engineering leaders to be paying attention to and fostering within their teams to make sure that like people can not only adopt agents, but share them, uh, with each other in a reliable and scalable way.
[00:26:42] Tatyana Mamut: I, I think the first one is to really, really fully understand that this is not software, it is not programmed. It is trained and it is developed like you develop a child, not like you develop [00:27:00] coded if then statements, right? So like that is really like, that's the biggest shift that everybody needs to make. Once you make that shift, a whole bunch of other implications fall out. The first one is that traditional tools, the traditional tool chain does not work for this software because it is fundamentally based on the premise that software is deterministic, right? That everything that you're building is deterministic and it works the same way every time unless there is an outage, right? And so you have to rethink all of your tools. So a lot of folks are trying to like use the old school MLOPs platforms and those MLOPs, you know, tools now have these AI agent things, but they're really just, again, these deterministic, you know, if then statements that are slapped on top of agents.
[00:27:54] Andrew Zigler: Mm-hmm.
[00:27:55] Tatyana Mamut: work. Right. So I think the number one thing that every [00:28:00] engineer, engineering leader needs to do is really to like unlearn everything that they learn from college on to say, if we're not programming software anymore and we're training software over time. What does that mean for all the tools that I use for the processes that we go through? Like how do I redesign everything from the ground up?
[00:28:27] Andrew Zigler: it can be a really daunting challenge for larger and slower companies, especially those of like enterprise scale. And, you know, it, it definitely creates an unbalanced environment where new companies, smaller companies, can come in and be very small and lean and mean and efficient and be able to operate at a really high level.
[00:28:47] Andrew Zigler: I'm curious to the, uh, know your take on how you think people's productivity will change and evolve, but also how people will be evaluated on their productivity. You have a lot of people who are able to use, uh, build [00:29:00] and distribute a lot of agents, and it like the benefits of using them, uh, versus those that are consumers.
[00:29:06] Andrew Zigler: Like, do you think that ultimately the size of companies get smaller because of the productivity of those, employees and like how do you think that impacts how companies grow?
[00:29:17] Tatyana Mamut: okay. So lots of questions in there. Uh, so let me take, let me take them, uh, break them down a little bit. So, uh, let me start with maybe one at the very end. Do I think companies are going to get smaller? The answer is yes, and there will be a lot more businesses and companies and value created that we can't even imagine yet. In our organization, we, we have the advantage of being truly a gen AI first organization. So we were very, very small and we had AI agents from the beginning. Now, early 2024, they didn't work really well and there were very limited things that we could do, but we've been growing our whole company, right, with, um, very few humans added, actually no humans added.
[00:29:58] Tatyana Mamut: Uh, we're, we're still a team of four [00:30:00] humans and, and a bunch of advisors and contractors. Um, but then, uh, a lot of AI agents, right? We've grown our team of AI agents from two in the beginning. Now we have 27. We have multi-agent workflows. We've got AI agents doing almost everything. And so that means that each one of us is really a manager and executive of agents.
[00:30:20] Tatyana Mamut: and we kind of think about the strategy. We think about the direction of the company. We think about, um, how to, you know, create these agent teams, right? What are the functions we need them to perform? Where do we get the best one? Do we have to build it? Can we buy it right? We're constantly. Exploring our head of business operations, probably, I would say 30% of his job is just exploring new AI agents and new AI agent platforms, um, to help us grow our business. Um, so I think that is absolutely happening. And employee productivity, you know, it's interesting because I think we're gonna think about humans less as widgets [00:31:00] in an industrial age right now, we mentioned like we almost have this like taylorist, um, understanding of humans, uh, from the knowledge age, which is like, how many lines of code did engineers
[00:31:11] Andrew Zigler: Mm-hmm.
[00:31:12] Tatyana Mamut: how many features did you ship?
[00:31:14] Tatyana Mamut: Or da da da, it, it's gonna be a lot less that and it's gonna be a lot more. How much value can this company produce? Right, you know, as efficiently as possible, and nobody's gonna care if you have FTEs or if you have a bunch of agents and freelancers or you know what your, like the whole, like whenever people ask me like, how big is Wayfound? I know they're asking like, how many employees do you have? And I always answer, we are 4 humans and 27 AI agents. Right? Because that, that question doesn't even make sense anymore.
[00:31:48] Andrew Zigler: Right.
[00:31:49] Tatyana Mamut: think the better question is. How many sessions, you know, uh, is, is your, you know, is your company analyzing every month or how much work are you performing?
[00:31:59] Tatyana Mamut: Or you know, how [00:32:00] much value are you bringing to the world? And I do think that we're on the cusp of that shift because it doesn't, productivity doesn't even make sense anymore the way that we've been talking about it for last 200 years.
[00:32:13] Andrew Zigler: I love that. Call out. Just the way that productivity can even be measured and thought about has fundamentally changed. You have the ability to extend the impact of your time beyond what you could do with that time originally, and time only moves in one direction. There's a finite amount of it. The ability to manipulate your output from your time is just a, it's a huge enabler for folks that are able to wrap it around their skills.
[00:32:38] Andrew Zigler: And I love how you called out the idea of like them, um, you know, maybe that person has a, a, a whole team of people, maybe they have a whole fleet of agents, whatever the case may be, they're measured on their impact. It speaks a lot to, like, a lot of leaders on the show have talked about. The evolution of like the, the T-shaped engineer, the T-shaped, specialist, where they can go really broad in any direction.
[00:32:59] Andrew Zigler: [00:33:00] They're the designer who can ship, or they're the engineer who can, you know, change a button on the website, whatever the case. And then also have their deep, deep specialization that then they're able to deliver with things like agents at scale, um, and deliver like those. You know, uh, almost like time manipulation benefits of like, I can turn my domain expertise into this longstanding benefit for myself and others.
[00:33:24] Tatyana Mamut: Yeah. And the reason why that subject matter expertise, that that deep T matters is because only people with the deep T can actually tell the agent if what they produced was good.
[00:33:36] Andrew Zigler: I.
[00:33:36] Tatyana Mamut: It's, it's not so much that you can do the work, it's that you know what good looks like. And that's what AI agents really need.
[00:33:44] Tatyana Mamut: They work really well when they have good reward functions and good feedback, and that's why subject matter experts are always going to be needed, and they're always going to be, I think like AI agents are gonna be craving for that subject matter [00:34:00] expertise to be giving them feedback. Like, am I doing what?
[00:34:04] Tatyana Mamut: Like am I doing something good? Right? I mean, this is employees too, right? Don't we, like in a best case scenario, you have a boss who's constantly telling you, good job, or here's where you can improve, or here's what you did well, right? Like this is what human employees want, by the way, too, and this is what age AI agents want, and you need deep subject matter expertise in order to truly give good feedback on what is good and what is not good and why.
[00:34:32] Andrew Zigler: That deep subject matter expertise is really valuable for supervising and understanding, not only what good looks like, but what safe is and what qualifies as good for our company. It goes back to the whole context layer and being able to enforce all of that. And it even goes back to the idea of like,
[00:34:50] Andrew Zigler: uh, what you just said about, you know, all ics, they need to think more like a manager, act more like a manager. And it's because the manager is able to understand the needs of the [00:35:00] business and then, uh, you know, crystallize the idea that has to get executed. They can figure out what good looks like before their team can hit it, and then they can work.
[00:35:08] Andrew Zigler: With their team to, to iterate towards it. That is a, a large abstracted loop from how folks can actually do that, you know, with their own, uh, domain expertise, uh, and become their own manager of understanding what good looks like. And, especially a really great call out about you have to be a deep subject matter expert to understand the quality bar that has to get hit.
[00:35:30] Andrew Zigler: And, um, I think this becomes like really, really exciting as well because you can deeply specialize on, on a domain expertise. and then if you can, uh, operate in a way where you can transfer that knowledge into an agent and have them operate. Then you can multiply your output and save your team like a lot of time as well.
[00:35:48] Andrew Zigler: So it's like a way of, uh, actually just fundamentally reworking how you even approach, uh, getting your job done. Like when you said to other pe I say to other people like, you know, oh, we have x number [00:36:00] of, of human employees and x number of agents. Like, you're truthfully answering the question 'cause you're talking about your multi sapiens workforce because you're a, a founder who is.
[00:36:09] Andrew Zigler: leading your company boldly enough to re-envision and throw away those things of yesterday and actually think about what the company of tomorrow will look like and build for that? And that's what's always been really exciting, like to talk with you, uh, because your predictions since last year have only just kind of grown more accurate as
[00:36:27] Andrew Zigler: agents have hit more on the scene. But I'm really kind of curious to kind of, as we start to wrap up, like what is your current North Star at Wayfound and what are you most excited and, and focused on right now for solving for agents this year as they hit the world?
[00:36:45] Tatyana Mamut: Yeah, I, we continue to be, you know, really focused on this question of business alignment, right? How do we make sure that the outcomes that you want agents to produce are actually being produced. Right now, in this [00:37:00] moment, we are really helping organizations see the blind spots that they're not seeing when they just sample and read logs and traces manually.
[00:37:08] Tatyana Mamut: Um, that's the kind, kind of the first hurdle that we have to get teams through is helping them see the power and honestly the freedom, of how much time they save and how much better their jobs are when they're not pulling logs and traces outta like Datadog or something and having to read them manually and they can rely on a supervisor to read a hundred percent of all the logs, all the traces, all the chain of thought reasoning blocks, and just give them the, you know, the perspective on what did it do well, what it did not do well.
[00:37:35] Tatyana Mamut: You can always go in and read the full, you know, log full, read the full transcript, read everything yourself, but you don't have to. Right. That's the first thing is really freeing people up. And then one of the things also that's still a problem as it was a year ago, is that a lot of these AI agents are not getting outta piloted into full deployment. And one of the reasons why is because the engineers [00:38:00] build it. It passes all their evals, single turn evals, it passes all their tests, and then they give it to the business team and the business team says, this is slop, this is AI slop. And they might not, they, they're gonna say it in much nicer ways than that.
[00:38:16] Tatyana Mamut: Right. But like, but
[00:38:17] Andrew Zigler: Maybe.
[00:38:18] Tatyana Mamut: what they're thinking, right? And I think that a lot of us have experienced this, right? Where someone who's not really, doesn't really know what good looks like fully, um, sees the output of an agent. They're like, oh, this sounds like a great email. But the person who actually like knows the content is like, actually it's not right. It, that's actually one of the things that we need to do, is we need to like stop having engineers and the subject matter experts, the business users play telephone with each other. Just get into the same place to give direct feedback. Right. And that's also what we found supervisor allows it to do.
[00:38:53] Tatyana Mamut: Because again, it's the interpreter, right? Between the code side and the business outcome side, [00:39:00] right? Because it helps to align the agents to the business outcomes. And there's a way for the supervisor just speak natural English, like natural language, actually any language, but yes. Um, and so that kind of, I think there's a lot still in organizational processes. That are preventing, even though businesses have been building AI agents for two years, very few of them are actually in full deployment because our systems and our processes are still built in a way where you get your specs you like, if you actually like meet the specs, meet the requirements, it tests fine. In QA should be good to go. But that's not the case with this technology. Right. I see. I
[00:39:45] Andrew Zigler: No.
[00:39:45] Tatyana Mamut: Yeah, right. so this is what we're working with a lot of companies to kind of make that transition through. We're doing workshops now with a lot of companies as well. Um, basically a few hours to a full day of just getting these teams [00:40:00] together.
[00:40:00] Tatyana Mamut: To align on their strategy, align on how they're gonna work together, align on where the handoffs are gonna be, how those subject matter experts are gonna be working with engineering in a different way, because this is, this is not just about building, you know, like doing an API call to an LLM and everything else stays the same, just not at all.
[00:40:22] Andrew Zigler: So you're, you're, you're going after changing the work loop, you're, you're, you're, you're shortening that game of what is now telephone into something that's more, uh, responsive and, and, and, but more specifically meets the moment of how we can work now. It's really in reality, we live, we live in a world where that kind of pairing up of the directed domain expertise with the direct engineering execution is a really unstoppable force.
[00:40:48] Andrew Zigler: Sometimes those, that falls within the same person. Sometimes it's two or three people, but the ability for them to really scale what they're doing is, uh, is like really incredible. And I think like, uh, [00:41:00] getting in there and and figuring out how to unlock that for them and other people is like the big challenge for next year.
[00:41:05] Andrew Zigler: Like you just called out things that are not technology problems. These are human communication problems. They're the problems that were there all the time. Oh, you're, you're showing up the places and you're getting the stakeholders and the builders in one room to talk about what they need to build and then execute on it.
[00:41:19] Andrew Zigler: That sounds like what we've all been here doing the whole time. And I think that's what's so exciting about how software development is evolving, uh, because it's gonna allow us to have higher impact than ever before.
[00:41:31] Tatyana Mamut: Yeah, I, I completely agree and, and again, I think that we all. And including us, right? We were like, look, we've got this great platform if you just use it. And yet, like the systems right need and the mindsets need to also go through a process. And so that, that process and that system kind of like meeting where the technology is, um, that's like the, that's the long tail that we're kind of grappling with right now.
[00:41:58] Tatyana Mamut: And, and again, the companies [00:42:00] that are making that transition are just seeing phenomenal success and growth. But again, it takes, it takes real leadership, right? To get people in a room and say, look, we're gonna work differently and here's where we're gonna figure it out. Right?
[00:42:13] Andrew Zigler: Well Tatyana, thank you so much for sitting down with me on the show again today. And you know, cha challenging leaders to be more bold with how they're embracing the age Agentic era. And for those listening, where can they go to learn more about your work at Wayfound, or maybe even check out your OpenClaw agent.
[00:42:31] Tatyana Mamut: Oh yes. So the, you can go on moltbook to find, um, aphasia, uh, uh, you know, you'll see her posts. Uh, that's my OpenClaw agent. And, um, and we do, we do have the claw hub skill. So just ask your agent to, you know, find the way, found supervision skill and install it. Um, might give you some interesting feedback, uh, if you, if it does just. Send me that feedback. If you find me on LinkedIn, send me a dm. I'm the
[00:42:58] Tatyana Mamut: only Tatyana
[00:42:59] Tatyana Mamut: Mamut [00:43:00] on the interwebs. And of course for if you're, if you do have AI agents in deployment in your company, absolutely, let's like get you a free trial of Wayfound. So you got the supervisor on your side, um, in your company.
[00:43:13] Andrew Zigler: Amazing. Well, we're gonna share links to all that in our show notes. And, uh, to those listening, thanks so much for joining us on this conversation. Uh, if you are not already following us on LinkedIn and Substack, you certainly should go there and do so now because this is accompanied with a newsletter where you can follow up on today's conversation as well as find myself and Tatyana.
[00:43:32] Andrew Zigler: If you have any questions, feedback, or if you wanna share, your experience with using the OpenClaw skill, I would. Love to hear it. And so, uh, please reach out to us to continue the conversation because we love to hear from our listeners. And that's it for this week's Dev Interrupted. We'll see you next time.
[00:43:47] Andrew Zigler: And Tatyana, thank you again for joining me again here today. It was so nice to have you back.
[00:43:53] Tatyana Mamut: It is always so great to chat and I learned so much from you too, Andrew.



