Teach the primitives or watch your competitor define them | Baseten’s Philip Kiely

By Philip Kiely

If you aren't the one educating your users on the fundamentals of AI, your competitors will happily do it for you. This week on Dev Interrupted, Andrew sits down with Philip Kiely, Head of AI Education at Baseten and author of Inference Engineering, to discuss why the secret to winning the AI market is owning the educational narrative through active market development. They explore the rise of the "Double-T" shaped engineer, the hidden complexities of scaling the inference stack, and why the most successful AI companies treat developer education as a mission-critical go-to-market motion.

Show Notes

Transcript 

(Disclaimer: may contain unintentionally confusing, inaccurate and/or amusing transcription errors)

[00:00:00] Andrew Zigler: Today, I'm joined by Philip Kiely. And Philip is the head of AI education at Baseten and the author of the new book, Inference Engineering, which we're gonna talk about some today. And Philip and I, you know, we share a background in, in developer advocacy and around education, and, um, developer advocacy itself has always been about empowering developers. And we've talked a lot on our show about how, right now, this is something that we all need to rise to the occasion for, uh, in terms of teaching ourselves, but also teaching others about the new norms of engineering and the way that we all get our work done. And especially now today, we're certainly starting to see a lot of folks in titles that are more AI native and focused on the AI front. That's unfolding in front of all of us. And so we're gonna talk about what it means to be in an AI education role, what it means to uplevel your game as a developer advocate, and enter this new world of agentic engineering and the engineers that are here building. And we're also gonna explore a little bit about the connections between all of this education, engineering, and product, and how it creates that sweet, sweet motion of [00:01:00] go-to-market that we're also obsessed with. Um, and frankly, it's roles as well that, you know, head of AI or AI education or AI platform, like, these are brand new roles, but now they're in many cases becoming so essential.

[00:01:14] Andrew Zigler: So Philip, welcome to the show. We're really excited to dig in with you today.

[00:01:18] Philip Kiely: Hey, thank you so much for having me. I'm really excited to be here.

[00:01:21] Andrew Zigler: Amazing. Well, I'm gonna dive right into talking about your role in developer advocacy. Um, I'm curious, like when you talk about your role as head of AI education, how do you describe it? What is, um, what is it that, uh, is most important to you? Um.

[00:01:36] Philip Kiely: That's a great question. You know, I've always been very dismissive of titles. I've not been super into labels and definitions when it comes to my work, because my ultimate goal is just to be Philip at Baseten. Well, I have a good understanding of the things I like to do and the things that are valuable to the company, and I just do those.

[00:01:57] Philip Kiely: Uh, but of course you have to [00:02:00] communicate to the rest of the organization as well as to the rest of the world, like what you're working on in a sort of succinct way. Just like maybe the title of a book is the most efficient version of the thesis of the book. The title of a person should be the most efficient thesis of the value of their work.

[00:02:15] Philip Kiely: And what we've found is that there are a couple different types of go-to market motions right now in the AI industry. One is a very traditional sort of market capture where you're trying to go after existing workloads. You're going after people who know exactly what they want and trying to move them over to your platform.

[00:02:37] Philip Kiely: The other opportunity though, the, the potentially more exciting motion, I think is more of a terraforming market development type motion where you're going out and introducing people to new concepts that they're going to need to learn in the next months and years to effectively do their job. And you're making sure that from the first moment they experience these concepts, they're thinking about them in a way that's conducive to the way [00:03:00] you do business.

[00:03:01] Philip Kiely: So that's really my, um. Mission right now is to introduce a bunch of developers to AI concepts in a way that gets them thinking about the problems of inference and the problems of building reliable AI native tools in the same way that we do. 'cause if they think about the problems the same way we do, they're gonna naturally come to us as the solution.

[00:03:24] Andrew Zigler: Yeah, you're actually hitting on something that's really exciting. I think right now working, uh, in a, in a role where you're kind of educating and forming and empowering people about new stuff, is that you're just putting new ideas in their head that are gonna sit there for a while. They're gonna kind of like, you're, you're planting the seed of thought, right?

[00:03:41] Andrew Zigler: With the idea that over time, this. This demand, this knowledge and what people can leverage from it is only going to compound and grow. So the earlier you're there in their mind share and you're able to kind of set them on the right path, then the more they're gonna like attribute so much of that back to [00:04:00] you.

[00:04:00] Andrew Zigler: You're kind of part of, like, the nucleus of the thought, which is, like, like what you hit on. Like, that's something that's really exciting right now, because it's like, you know, you're putting these new ideas of engineering in their head and then you're seeing how people are able to run with it.

[00:04:12] Philip Kiely: Yeah, and you can do that from a sort of content and, and marketing perspective. You can also just do that on technologies. Like, hey, everybody is still using the OpenAI SDK, even when they're not using OpenAI models, and they're still using the same OpenAI output format just because that was the original industry standard from a few years ago.

[00:04:33] Philip Kiely: Like, that's developing mind share, as an example. There's a lot of, of advantages to being a sort of first mover when it comes to capturing that mind share. Of course, you've gotta do a lot of other things to turn that mind share into business value. Like, first movers don't always win, but you get the mind share, and that's a, that's a very valuable thing.
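The point about the OpenAI SDK's output format becoming a de facto wire standard can be made concrete. As a rough, hypothetical sketch (the model name below is a made-up placeholder, and this is not any particular provider's API), the same request and response shapes work regardless of which vendor actually serves the model:

```python
import json

# The OpenAI chat-completions shape has become a de facto industry
# standard: many providers accept this request format and return this
# response format, so the same client code can target any of them by
# just swapping the endpoint URL and model name.

def build_request(model: str, user_message: str) -> str:
    """Serialize a chat request in the OpenAI-compatible wire format."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return json.dumps(payload)

def extract_reply(response_body: str) -> str:
    """Pull the assistant's text out of an OpenAI-style response body."""
    data = json.loads(response_body)
    return data["choices"][0]["message"]["content"]

# A response in the standard shape, regardless of who sent it:
sample_response = json.dumps(
    {"choices": [{"message": {"role": "assistant", "content": "Hello!"}}]}
)

print(extract_reply(sample_response))  # Hello!
```

This is the first-mover effect he describes: client code written against this shape keeps working as users swap providers, which is why non-OpenAI platforms tend to expose OpenAI-compatible endpoints.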

[00:04:53] Andrew Zigler: Exactly. So where, how do you think about the intersection of, like, what a developer advocate [00:05:00] might do with the more traditional, like, engineering or product world? Especially because now there's so much new everywhere on the education front, but also in the product engineering world. How do you partner internally as well, um, as part of your role?

[00:05:14] Philip Kiely: What's cool is that I now have a lot of leverage to do engineering work because of coding agents. So I'm able to ship like more complicated demos than otherwise, I would have been able to build. I'm able to ship, you know, programmatic SEO on top of just ordinary sort of blog content. I'm able to, you know, write custom software to make the process of creating and publishing a book easier.

[00:05:41] Philip Kiely: There's, there's a lot of different things I can do on the engineering side. But at the same time, like, I don't really think of my role as a developer advocate in terms of, like, creating alpha, and a lot more around distributing it, discovering it within the company and bringing it to the market. And with that, like, that takes the, the close [00:06:00] partnership with engineering to

[00:06:01] Philip Kiely: know who everyone is on the engineering team. That's, that's a, a big challenge actually now because we're going through so much hyper growth. Um, gotta know who everyone is, know what they're working on, know their communication style, what they're gonna be excited to talk about. Some people are very loud when they come up with a breakthrough.

[00:06:18] Philip Kiely: Some people just kind of ship it as a PR and, and don't talk about it. And so you have to be able to surface all of these different styles from across the organization. And say like, alright, of every awesome thing that got shipped this week, like what is going to make the biggest impact? The, the other piece is helping enable the engineers themselves to become content people.

[00:06:41] Philip Kiely: Uh, because a, a piece of, of content or an idea is always going to be best received coming as directly from the source as possible. So if that's ghostwriting, if that's co-speaking, if that's just, like, getting people a Twitter account and saying, go have fun. Um, there's, there's a lot that you can do [00:07:00] to turn your army of engineers into an army of content people, um, in a way that helps your company's, uh, brand be, be very authentic and, and very technically driven.

[00:07:10] Andrew Zigler: Yeah, absolutely. I mean, engineers are classically very allergic to marketing, and the more that you can, you can meet them on their level, and also empower engineers to tell their own story instead of telling it on their behalf, then the more, uh, flow of information that we have around all of these kinds of new ideas, and it makes everything that we do higher leverage as well.

[00:07:30] Philip Kiely: Yeah. I found that like my engineering team is not allergic to marketing. In fact, like, I mean, I think of myself as an engineer. I'm an engineer by training and I love attention. It's all I need.

[00:07:42] Andrew Zigler: Relatable. And so I guess that's why I say

[00:07:44] Philip Kiely: yeah,

[00:07:45] Andrew Zigler: engineers are allergic to marketing. I do feel like that, that perception is changing.

[00:07:50] Philip Kiely: but the, the thing is, like, a lot of it comes down to just helping them understand the value of what they're doing on, on the go-to-market side, [00:08:00] showing, you know, engineers love numbers, like, I love seeing number go up. So you can sort of show impact that way, but also showing in terms of, hey, we were able to bring in this customer because of your work, or we were able to bring in these candidates to join your team because they saw the interesting work you're doing and they wanted to come work with you.

[00:08:20] Philip Kiely: Just being able to demonstrate an actual outcome, uh, a really concrete and tangible one, I think, just goes a really long way to getting just about any engineer hyped about the idea of, of creating content. I even, I work with people who don't even want to, you know, get up on stage, don't like attention at all, and once they understand, like, okay, you know, if I do this, I'm going to get recruiting, I'm gonna get customers.

[00:08:45] Philip Kiely: Then, then they're willing to, to get up there and, and get a little spotlight on themselves.

[00:08:51] Andrew Zigler: Yeah, exactly. And I, I think as well, like you made a good point about how. they don't typically maybe get as [00:09:00] much exposure to the customer and their problems, and so once they get a taste of that, it becomes really exciting to be engineering and, and, and delivering things that, where it's like then you go meet with the customer or you're able to get closer to like how somebody used something that you shipped.

[00:09:15] Andrew Zigler: Because at the end of the day, we all became engineers because we wanted to build cool stuff that people used and it gave them value. And so I think the closer that we all, you know, get to the reality of that conversation, the more productive and the happier and more fulfilling our, our jobs even feel. Um, which is what you kind of called out there, you know,

[00:09:34] Philip Kiely: Yeah.

[00:09:34] Andrew Zigler: once in 10 years, you kind of start to figure it out. And I, I like how you said it too, about distributing, like, distributing the alpha, distributing the, the gains. You're, you're more like a bridge. I think that's classically, like, really where, um, the developer advocate role has always traditionally kind of been, like, this marketing-to-product bridge or marketing-to-engineering bridge. And now it's a bridge between a lot of different places. And frankly, I, I actually think that's really, [00:10:00] uh, a fundamental trait of the new, um, like, AI leadership kinds of roles that are popping up, um, at companies, is that they're ultimately brought in to be a bridge.

[00:10:09] Andrew Zigler: Because rolling out AI across the company is a very cross-team, cross-department initiative. You need somebody who can meet a lot of different problems at a lot of different levels and inspire people to come together to, to work on that. And so, like, you know, at Dev Interrupted and at our parent company, LinearB, we talk a lot with folks that are in a developer AI enablement role.

[00:10:31] Andrew Zigler: That's somebody who's tasked with making AI a reality, specifically, like, for the engineering team, so they can use it and they can ship with it and, and deliver it safely, and the company can understand the impact of that. So, um, understanding the impact of how we're, you know, adopting this new technology is really important for a lot of companies.

[00:10:51] Andrew Zigler: How do, how do you think about, um, uh, like measuring or understanding the enablement of the work that you do?

[00:10:59] Philip Kiely: [00:11:00] One sort of difference is that my work is much more externally facing. That said, I do a lot of internal enablement as well, especially within the go to market team. What I like to measure is almost like a language learning process, where the first step is trying to get people conversational, understanding the nouns and the verbs, and how all these concepts work together, and then eventually we want to get them fluent where we feel really good about putting a seller on the phone with a technical point of contact at a customer.

[00:11:35] Philip Kiely: And having them be able to go through a discovery process and go through a genuinely consultative sell without, every time they hear a word like vLLM, having to run back to the engineering team and, and ask what's going on. And then the, the ultimate goal is to sort of become AI native in whatever role they're doing.

[00:11:55] Philip Kiely: So, like, an AI native seller who really [00:12:00] understands the space and has a strong intuition for what individual customers need. You know, an, an AI native engineer who is able to seamlessly use these tools to accelerate the work. One thing that we've done, which is really exciting, is we have a, from, from our infrastructure team, a mechanism for deploying vibe-coded sites in a sort of secure, centralized, and standardized way.

[00:12:27] Philip Kiely: You know, it's behind our single sign-on gate. It's limited to, to only Baseten people. And with this, you know, I've been able to ship cool sites, but also people from outside of engineering teams, from, from our people team, from our sales team, from our, uh, operations team, have been able to create custom tools and then host them internally through this sort of centralized mechanism.

[00:12:54] Philip Kiely: this is the sort of, of tooling that I think is really effective. For these kinds of [00:13:00] enablement pieces, because if you have everyone just kind of doing their own thing, it can really spiral out of control very quickly. But you also don't want everyone to have to go through, uh, a big, you know, procurement cycle every time they want to just like ship a little web app.

[00:13:15] Philip Kiely: So having a, a strong set of tooling, um, technical tooling in place to be able to just, like, let people ship, um, is, I think, one of the biggest unlocks you can have for delivering AI across the organization.

[00:13:31] Andrew Zigler: Yeah, I couldn't agree with that more. We've, we've talked about that a lot on, on the show from leaders who are working on or building in spaces, or otherwise have adopted these kinds of, uh, closed ecosystems within not just their engineering org, but their whole company, where anybody can go to this one platform and make an app, or make an agent, or deploy or share a workflow.

[00:13:52] Andrew Zigler: And it's one about centralizing and standardizing all of them in one place where they can be, you know, controlled together. But then it's also too about [00:14:00] sharing, uh, your domain expertise, getting other people involved in, in what you're working on, in building, finding unexpected like users and use cases of things that you're building. And by having like a centralized place where all of that comes together, uh, I completely agree, is really critical. Um, like we had James Everingham, the CEO of, uh, Guild ai, and he talked extensively about how he had built exactly that. Uh, when I think he was, he was at Meta and so when he was at Meta, he had, he had built a, a platform where folks could build and ship like internal apps and agents and tools.

[00:14:33] Andrew Zigler: And the adoption of that was, you know, it was, it skyrocketed, and it got the whole company to become AI enabled in a really structured way. I'm curious too, speaking about the whole company, how do you think about, like, making it so that, um, this is, I, I, we talk a lot on the show about, like, the broadening of skills and the rise of the generalist, and you get these deep specialists and you get these broad generalists, like a designer who can ship, or an engineer who can go to those [00:15:00] customer meetings. Uh, I'm curious how you think about, like, the shape of skills, um, that, that engineers are developing, and what's most important now.

[00:15:06] Philip Kiely: I think that what I'm seeing a lot is people with two deep skills. So if we've previously talked about, like, T-shaped engineers, maybe this is an upside-down U or something.

[00:15:20] Andrew Zigler: Yeah,

[00:15:21] Philip Kiely: I need to, need to brush up on my alphabet shapes there.

[00:15:24] Andrew Zigler: a double T, double

[00:15:25] Philip Kiely: T. Yeah. Generally it's, you know, for me it's engineering and writing, engineering and, and sort of content creation in general speaking, all that kind of stuff.

[00:15:37] Philip Kiely: For a lot of our engineers, it is that sort of customer facing communication and, and accountability piece. One thing that we do is we have a forward deployed engineering team. This team reports up to our head of engineering. So at a lot of companies you have a sort of sales architect, sales engineer motion where these [00:16:00] roles are sort of within the go to market team.

[00:16:02] Philip Kiely: Um, and at the moment we don't have something like that. We have, you know, highly technical sellers who are able to go out and sort of speak to the customer and understand their problem. And then we have a team of forward deployed engineers who, again, sit within the engineering org, uh, in the Slack channels with the customers, are deploying into customers' accounts, so doing sort of hands-on development, co-engineering with the customers, helping them, like, figure out what model's gonna be best and, you know, figure out which parameters are going to result in the fastest inference, because of that forward deployed engineering function.

[00:16:39] Philip Kiely: And especially, I think, because that function exists within the engineering org, there is a culture throughout the entire engineering org of being customer facing, where it's very common for engineers from other teams, like infrastructure and model performance and core product, to go directly to the customer, [00:17:00] ask them questions, get feedback on the product, help them get unblocked on tricky issues, and then take that feedback and, and loop it into the product.

[00:17:07] Philip Kiely: So I think that's the, the first sort of big second skill for, for a lot of engineers: being great at both the building piece and the sort of one-on-one customer interaction, which, especially when you're working with highly technical customers, can definitely be a lot more comfortable than, say, like, speaking on stage or, or doing a podcast or writing a, a tweet thread.

[00:17:33] Philip Kiely: The other way to look at potentially two parts of depth, two legs of depth, would be to say, okay, so you need to be an expert in AI technologies, in, in the models and the harnesses and, and their capabilities, and in one piece of traditional software engineering. So it's, it's AI and CUDA kernel optimization, AI and keeping the Kubernetes cluster alive, you know, AI and

[00:17:58] Philip Kiely: billing infrastructure. [00:18:00] Um, there's, there's gotta be some, uh, additional depth beyond just the, the AI stack in order to sort of match that domain expertise with the acceleration of the tools, rather than just kind of everyone being a slop cannon.

[00:18:16] Andrew Zigler: Everyone being a slop cannon, it's like, obviously, about doubling down on your specializations. Like, if, if content or something else is not your particular strength, that doesn't have to be part of your double-T strategy. Um, and I also like, too, how you, you called out the importance of, like, the, before you get into the Ts, when you're talking about the generalist stretching across, um, the importance of, like, really, um, creating that, that solid highway across your whole organization, so that those generalists, that designer who can ship, can really easily go through the pathways

[00:18:47] Andrew Zigler: to do it. Um, that's, like, actually key, too, because it's like, just because you have the skill doesn't mean you have the organizational, uh, throughput to execute on it. Uh, they're, like, those are two different problems. Also, [00:19:00] too, like, when you're picking your specialization, I like how you called out, like, it's usually, like, AI and something else.

[00:19:05] Andrew Zigler: AI and this, AI and that. It's oftentimes these folks who are deep domain experts in something in a preexisting way, or have accumulated that knowledge really quickly through working with other tools or methods. And then they pair that with, obviously, the ability to be agentic and to distribute, um, those gains to other people.

[00:19:25] Andrew Zigler: I think, like, that's really the key unlock, uh, is being able to then distribute, like you said, the gains. Um, which is what AI is then enabling them to do. I'm curious from my own personal perspective, because, you know, you wrote a book, Inference Engineering, and it's during a time when the industry is changing so, so much. Like, here on Dev Interrupted, we have a hard time even keeping our news segments fresh from week to week, from when we record them and then when we publish them.

[00:19:53] Andrew Zigler: 'cause the space moves so fast. So I, I, I wanna learn more about. Like how you went into writing Inference [00:20:00] Engineering, what folks can learn from it, and how you, how you went to set out to write a book during an era where like, you know, everything is changing so rapidly.

[00:20:08] Philip Kiely: You know, in, in marketing we have an idea of like evergreen content and, you know, publishing something on, on trees is definitely putting your, your flag in the ground and saying, Hey, this is gonna be some evergreen content. I was in a big rush to get this thing written and published because I felt like, okay, maybe this is gonna have a, a 12 to 18 month shelf life from, from when I put it out.

[00:20:33] Philip Kiely: I, I've, I've seen a few changes. Obviously, you know, my book talks about DeepSeek three when we're now on DeepSeek four. It talks about Kimi 2.5 instead of 2.7. Um, there's, there's, you know, things here and there that are, are updating. You know, my, my section on quantization doesn't have anything on quant.

[00:20:53] Philip Kiely: Um, there's of course progress in this industry, but I've been working at Baseten on [00:21:00] inference for more than four years now. And I've seen, you know, especially over the last year, the stack become a little bit more solid. You know, the, the foundation is there, uh, in terms of, you know, CUDA, through PyTorch, through your various options for inference engines like vLLM, SGLang, TensorRT-LLM, you know, Dynamo, and some of the orchestration stuff on top of that.

[00:21:26] Philip Kiely: We're seeing, you know, some standardization in the cloud providers and in the neoclouds, and sort of the way that they present GPU compute, the way that compute can be purchased, allocated, scheduled, uh, you know, controlled, spun up and down. So, while there is a lot of change in this industry, we're getting to the point where many of the fundamentals are becoming settled, at, at least when it comes to inference.

[00:21:52] Philip Kiely: I think that training, the training stack is a little bit less settled right now. The sort of agent orchestration stack, or the [00:22:00] harness stack, is, is very, very up in the air. There's still a lot of, of questions about what that's gonna look like long term. Maybe that's kind of more like the JavaScript framework of the AI industry, where there's just kind of, like, a new framework every, every couple months that people are switching to, and, and eventually your sort of, like, Angulars and Vues and Reacts of, of the industry sort of come out of it.

[00:22:22] Philip Kiely: But for inference particularly, there is a good deal of solidity in, in the fundamentals of this space. And I think that there's a lot of opportunity though still to sort of push the envelope on inference. Like when we publish blog posts, we're talking about being 20 or 40% faster, like 2 or 4x faster.

[00:22:43] Philip Kiely: Like if all the low hanging fruit was gone, we'd be talking about our 2% and our 4% optimization. That, that suggests to me that we need a lot more people working in this space. And so if you can make the fundamentals sort of broadly accessible and easy to learn all in one place, [00:23:00] then like engineers can come into this space.

[00:23:02] Philip Kiely: And I, I've seen people get up to speed in like weeks to months and start making sort of frontier contributions in, in inference because. Again, while like the fundamentals are, are, are pretty settled, there is a ton of work to do in the sort of optimization and the last mile delivery of inference to the vertical AI companies, to the new labs who are making their own custom models and to sort of the world of developers at large in the form of like faster, cheaper, better tokens.

[00:23:31] Andrew Zigler: You made two strategic bets. And you, you were going back to the original idea at the beginning of planting the seed, of, of getting it in the hands of, of folks. That way they can learn and they can have access to the information. And the second thing is, you know, you bet on primitives. Like what you said, there's some core underlying parts of, of it that are solid that you can talk about, that aren't going to be rapidly shifting without otherwise changing the entire definition of the space anyways, in which case, having to go through and update [00:24:00] the Kimi 2.5

to the 2.7 is just a custodial task compared to that. So you captured the primitives and made them accessible more broadly. You broke them down into a language that wasn't going to immediately get outdated, going back to your thesis around it being evergreen and, and, and, you know, fighting to make it, uh, last as long as possible once it left your hands. I, I want to learn a little bit more, too, from you about how, within the book, how you think about equipping folks to work with inference, as traditional software engineers, as people coming from other fields. Um, what are some of the things, now we're kinda getting into maybe the content of, of the book a little bit.

[00:24:38] Andrew Zigler: Like what, what's like your biggest takeaway that you really want engineers who pick it up and open it to, to really be able to extract and run with?

[00:24:45] Philip Kiely: My thinking on the book is that, like, any given engineer with a bit of experience who picks it up is probably gonna know one or two chapters better than I do. If someone's in SRE and they read chapter seven about production, they're gonna be like, well, what is this, inference for [00:25:00] babies? Uh, you know, if someone is an ML researcher and they read chapter two about the architecture of these models, they'll be like, yeah, bro, I get it.

[00:25:09] Philip Kiely: It's, no, it's, it's matrices, and you multiply them together, like, we've been doing this for a decade. What's up? I think that the interesting part of inference is the absolute breadth of the stack, where you have to go from reasoning about the architecture of, of a, a set of neural networks, to thinking about the specs of your GPU, to thinking about an entire stack of technologies from, from CUDA to, to PyTorch to an inference engine, to a bunch of new research optimizations that you're applying in each of these technologies, to, like, modifying that depending on your modality, and then shaping that on scalable infrastructure.

[00:25:45] Philip Kiely: There's, you know, distributed systems problems here, there's sort of competitive-coding-style algorithm optimization problems here. There's business and economics problems here. There's just a lot to think [00:26:00] about. And so my goal with the book was to, you know, take one of those engineers who has the deep expertise in, in one part of the stack and expose them to the ideas of everything else that's going on and how all these pieces fit together.

[00:26:13] Philip Kiely: Because ultimately, there is no one silver bullet for inference. You can't just, like, get some GPUs, take an inference engine, slap the two together, and be like, boom, tokens, and, and expect that to scale, and to hit frontier performance, to hit three or four nines of uptime. Um, so, so yeah. So that's, that's kind of the goal here, is to deliver the message that inference is a stack.

[00:26:37] Philip Kiely: It's a complex problem. It's not just one thing.

[00:26:40] Andrew Zigler: So it's like the surface area, like you're saying, it's very broad. There's a lot, generally, broadly, that you can teach and you can share, and that you need to. And you're going in writing a book where, you know, the folks that are reading it are gonna be more of an expert than you in some of these really broad domains, because it's impossible to be an expert in all of them.

[00:26:59] Andrew Zigler: [00:27:00] And, and to capture that and, you know, that's really like a really kind of like, um. As a writer, that's like a, a sensitive and a hard place to put, put yourself into. But it's also really bold and it's a kind of like a, uh, the kind of the direction that I think a lot of folks need to do in sharing their expertise and, and, and their content is about betting on the primitives, presenting it in a way that makes it as broadly accessible for people within like the same problem space, but then also too, just really equipping them with the mindset reset that they need.

[00:27:28] Andrew Zigler: I, I think we, we talked about that week after week within our show about how a lot of times within engineering right now, uh, your biggest obstacle is your assumptions from yesterday, not the technology or holding in your hands, and being able to understand what is holding me back and what, what do I need to invent net new or to totally different questions to be asking.

[00:27:50] Andrew Zigler: And folks are typically asking only the what I need net new. And so it's about revisiting all of those fundamental prin uh, disciplines and principles and going [00:28:00] up to them and shaking that SRE on on the arm and being like, you know, like, I know you know this better than me, but there are some parts of this that are a little fresh that are still gonna probably be helpful for you to, for you to be reading.

[00:28:10] Philip Kiely: Yeah.

[00:28:10] Andrew Zigler: You know, just kind of tying this back a little bit more to Baseten as well, we talked about how your role is educating folks within the space and making AI more broadly accessible, and that's all part of Baseten's mission of providing an inference platform. I'm curious:

[00:28:26] Andrew Zigler: What is it like working for, a company that is working within like the, the inference platform space that's such like an on the edge kind of domain. Like what are some unique things that you see or that y'all work on, um, as a business that are, that are just really, um, might be kind of like really insightful to our audience who's more coming from a traditional engineering org that's making a transformation instead of being a more ai, uh, centered one.

[00:28:49] Philip Kiely: I think that one thing I've learned is everyone thinks about inference and the problem of inference differently. You know, one thing that's been pretty cool is actually hearing from [00:29:00] developers at companies like Nvidia and OpenAI and Google Gemini, who are on inference teams, reading and debating and engaging with this book.

[00:29:10] Philip Kiely: And I'm like, guys, like, why, why are you reading this? You are already experts on this topic, but what they tell me is, well, yeah, we know how we do it, but we wanna see how everyone else is thinking about it too. We want to get those outside perspectives. I've been very fortunate to see the perspective of hundreds of different companies in terms of the, the way that they think about inference.

[00:29:33] Philip Kiely: And one thing that we're increasingly seeing is this idea of an AI-native company where inference is mission critical. What mission critical means is that it's in the revenue path. If your model goes down, your product goes down, you're not making money, your users are churning, they're going over to something else.

[00:29:51] Philip Kiely: And that's a very different thing than what a lot of companies maybe earlier in that transformation are seeing, where [00:30:00] AI products are just kind of a bolt-on to their existing platform rather than the actual foundation of the thing that they're building. I think that when companies make that transition from the add-on to the foundation, they start to experience a ton of pain.

[00:30:16] Philip Kiely: Like, wait, why is it so expensive now that I'm doing tokens for thousands or millions of people instead of tokens for a small group of beta users? And why is my app crashing and slow all the time? Because I'm sitting on top of some public model API that's got like one or two nines of uptime. Or, you know, why is everything routing through the very smartest and most expensive model in the world when a lot of my users are doing very, very basic tasks?

[00:30:46] Philip Kiely: So, you know, the, the things that work in this space at the pilot scale and at the early Scaling p place do not necessarily work when you're rolling out an AI platform to an entire [00:31:00] company of tens of thousands of developers or to a user base. That is, you know, America or the world. And it's, it's exciting to like, go in and, and solve a lot of these challenges, especially around like latency, cost and reliability.

[00:31:15] Philip Kiely: because those are challenges that engineers have been working on for decades, and every new technology brings a new round of questions there. And, you know, in this case, we actually do have a lot of tools to address these challenges. And it's really exciting to see these technologies hitting a scale and a sort of penetration throughout the market much faster and at a much greater scale than many previous generations of technology, you know, two or three years into their rollout,

[00:31:47] Andrew Zigler: Really

[00:31:48] Philip Kiely: compared to, say, mobile or PC or internet. Like, any of these previous things were not this deeply penetrated into the market three years in.

[00:31:58] Andrew Zigler: Yes, which is what's [00:32:00] gonna be quite crazy about the transformation that I think is still ahead of us. And I really loved how you called out the difference between adoption and scale. There's a real sea change in terms of your practices, how you actually implement and roll it out.

[00:32:16] Andrew Zigler: And it makes a really great case for, once it becomes mission critical, having the support of the entire infrastructure, the entire platform behind it. And that's what Baseten is

[00:32:25] Philip Kiely: Yeah.

[00:32:26] Andrew Zigler: in this case. And actually, it's funny: very recently, on one of my new segments here with my co-host Ben, we were just talking about the model router problem, about how that hasn't really clicked with a lot of folks yet. But also, you see a lot of shipped products, or people using a tool, where you're using an Opus call to do something that could just be a Haiku

[00:32:48] Andrew Zigler: Inference call, right? There's like levels of optimization and understanding which queries go to what level of intelligence model that I think a lot of folks are still really early on in understanding, uh, which has [00:33:00] massive implications for scale. So that was a really timely, um, thing that you mentioned there.

[00:33:03] Andrew Zigler: Something that we just recently talked about on the show. I love that.

[00:33:06] Philip Kiely: Yeah, I mean, everyone has an AI pilot, but certain organizations are really sort of accelerating into the curve here and others are not. And it's interesting to see who's kind of there and who isn't.

[00:33:23] Andrew Zigler: Exactly. Well, you know, we're coming to the end of our chat here, Philip, and I want to thank you again so much for joining me here today. Where can folks go to learn more about you and check out your new book, Inference Engineering, which we talked about today, as well as Baseten? Where can they go?

[00:33:40] Philip Kiely: Yeah, so I try and make my stuff pretty easy to find. If you just Google Philip Kiely or Google Inference Engineering, you're gonna get there. The actual link is baseten.com/inference-engineering. That's going to get you to a place where you can download a free PDF of the book or, uh, pick up a paper copy, [00:34:00] which I try and do more or less at cost.

[00:34:02] Philip Kiely: Um, I'm not making, not making any money on this. you can also find me on, um, Twitter and LinkedIn at Philip Kiely. Very consistent in my branding there. Um, and. Yeah, I'll be at AI Engineer Worlds Fail coming up here in a couple months. I try and really, you know, hit the conference. So hit the meetup. So at here in sf.

[00:34:23] Philip Kiely: So if you're out here, come say hi and let's talk about inference.

[00:34:27] Andrew Zigler: Amazing. Well, I'll say hi next time I'm up there as well. And to those listening, definitely be sure to go check out the links; we're gonna share them in the show notes as well. And if you're not already following Dev Interrupted wherever you're listening to us, definitely be sure to do so, and make sure you're checking out our newsletter.

[00:34:42] Andrew Zigler: Or that drops on LinkedIn and Substack, uh, where you can find, uh, follow up links and discussions as well as some of the news topics from late. as well, I would love to connect with any of y'all on LinkedIn. So please reach out to Philip or I and continue the conversation we're gonna be posting about this, uh, once it drops.

[00:34:56] Andrew Zigler: So Philip, thanks again for coming on the show. Uh, [00:35:00] and, uh, we're definitely gonna be, you know, chatting with you more in the future, uh, and, and continuing to follow all of the stuff that you do at Baseten. So thanks again.

[00:35:07] Philip Kiely: Awesome. Thank you so much for having me.
