What can you learn from the scaling issues OpenAI experienced when ChatGPT went viral?

On this week’s episode, guest host Ben Lloyd Pearson is joined by Evan Morikawa, Engineering Manager at OpenAI. Join us for a first-hand look at the engineering challenges that came with ChatGPT’s viral success, and the difficulties of scaling in response to the platform’s sudden popularity.

They also discuss misconceptions around generative AI, OpenAI’s reliance on GPUs to carry out its complex computations, the key role of APIs in its success, and some fascinating use cases they’ve seen built on GPT-4.

“It took a lot of tweaking to find the right utilization metrics and then figure out how to optimize them.

But for us, everything was framed in terms of users: every improvement we made represented more users we could let onto the platform. As opposed to saying, ‘Oh, it's driving down our margins or making it faster,’ we kept cost and latency relatively fixed, and the thing that got to move was how many users we could get onto the system.”

Episode Highlights: 

  • 2:57 When ChatGPT Was a Single Developer-Facing API
  • 4:50 ChatGPT’s Viral Launch and the Scaling Issues That Followed
  • 9:37 Misconceptions About AI
  • 12:36 Ideal Use Cases for Generative AI
  • 17:17 GPUs: What Are They and Why Does OpenAI Need So Many?
  • 25:58 Scaling ChatGPT
  • 34:04 The Role of APIs in the Success of OpenAI
  • 38:40 Innovative Applications Using OpenAI’s Products

Episode Transcript:

(Disclaimer: may contain unintentionally confusing, inaccurate, and/or amusing transcription errors.)

Evan Morikawa: 0:00

It took a lot of tweaking to find the right utilization metrics and then figure out how to optimize them, and all of those were really critical to getting more out of it. But for us, everything was framed in terms of users: every improvement we made represented more users we could let onto the platform. As opposed to saying, "Oh, it's driving down our margins or making it faster," we kept cost and latency relatively fixed, and the thing that got to move was how many users we could get onto the system.

Ad Read 

What's the impact of AI-generated code, and how can you measure it? Join LinearB and ThoughtWorks' Global Head of AI Software Delivery as we explore the metrics that measure the impact gen AI has on the software delivery process in this first-of-its-kind workshop. We'll share new data insights from our Gen AI Impact Report, case studies on how teams are successfully leveraging gen AI impact measurements, including adoption, benefit, and risk metrics, and a live demo of how you can measure the impact of your gen AI initiative. Register at linearb.io/events and join us January 25th, January 30th, or on demand to start measuring the ROI of your gen AI initiative today.

Ben Lloyd Pearson: 1:16

Hey everyone, welcome back to Dev Interrupted. I'm Ben Lloyd Pearson, Director of Developer Relations here at LinearB, and I'm pleased to have Evan Morikawa joining us today. Welcome to the show, Evan.

Evan Morikawa: 1:27

Awesome. Thank you.

Ben Lloyd Pearson: 1:28

Yeah, so full disclosure, Evan and I worked together in a past life at an API company. Uh, you may not remember this, but you were actually a big part of why I decided to join that company. I saw your video on YouTube. So we've got a little bit of a history, and it's really great to have an opportunity to catch up with you and talk about some of the new stuff you're working on. Personally, I was a little bit envious when I saw that you were going to OpenAI, 'cause I was like, that sounds so cool. And here we are talking about how cool your work is. So let's kick it off. I mean, OpenAI, I feel like, needs no introduction. Everyone is talking about it. It's gone viral, at the center of the conversation around AI and LLMs. The release of ChatGPT has kicked off a global phenomenon. And I want to walk through that story, in particular what it took for you to scale ChatGPT when you had that viral moment. Especially since I've heard that there was a shortage of GPUs that also affected this, I do want to dig into that. And I just wanna learn a little bit about how you've been flexible and nimble while this industry has rapidly shifted around you. So let's just kick it off: where does this story all start for you? How did you get to where you are right now?

Evan Morikawa: 2:44

Yeah, so as you mentioned, I was working at an API company before, and at the time, an API was all OpenAI had. In fact, that was the light bulb moment: learning, three years ago or so now, that OpenAI was starting this Applied team. Applied was all about bringing this crazy technology safely to market, and at the time, that was just a single developer-facing API around the very first GPT-3 models. Which was good, because, and it's still the case, I do not have a machine learning background. OpenAI was much more research-lab focused, but here was this brand new small team doing APIs and products, and I was like, ah, I've been doing APIs and products, maybe I can help contribute here. I think there was also that feeling that I had a reasonable sense of how computers worked by that point in my career, except for this. This was, certainly at the beginning, a kind of magic black box, and I thought, okay, I'm gonna see what that's all about as well. I also knew some of the founders through the broader network, so yeah, I reached out, and that was the start of it. When I joined, Applied was very small; there were basically only half a dozen engineers working on all of the APIs and systems for it, and that steadily grew as we iterated on these language models. I think the first big release or push of any kind was when we tweaked these models to write code and worked with GitHub to launch GitHub Copilot. GitHub Copilot actually initially ran through our servers at launch, because it was very difficult to run or deploy these models. That was definitely the first time we had any experience running this at any sort of scale. But still, all the way through, basically up until ChatGPT.
And still to this day, we have this very core API business that powers all these other AI-powered applications a lot of people are trying to build on now. And then along came ChatGPT. It's actually kind of an interesting story, because when ChatGPT launched, it wasn't necessarily clear whether it'd be a scary thing or a totally normal thing. On one hand, the model that was powering it, GPT-3.5, had been out for several years at that point in various iterations. People could already sign up for free through the developer playground, and we had in fact noticed a lot of people playing with the models in that kind of way. So in some senses, there wasn't that much different about it, you know, maybe a new UI. The model had also been improved dramatically to make things up less and be more conversational. But on the flip side, this was also the first time we would ever be offering anything without a waitlist. This would be a free-to-use application out there, and yeah, that definitely changes things as well. Actually, on launch day, I think it launched on a Wednesday, it was like November 30th, and we, kind of by design, thought this would be a low-key research preview. Just a blog post, a tweet, and nothing else. And on launch day, nothing crazy happened. Some people came and used it; it never passed number five on Hacker News. We had all the capacity we needed, traffic tapered off, and we were like, great, quick little launch, move on. Actually, at the time, we were preparing internally for the launch of GPT-4, which was coming up next, so this was a way for us to experiment with a lot of the recent fine-tuning that had gone into the older models to make them safer and more conversational. It was the next day, or rather 4 a.m. the next morning, when our on-call started to get paged because traffic was starting to really rise.
There was this graph, and we were trying to figure out what was going on, because all the traffic was coming from Japan. We were very confused; we actually thought we were getting DDoSed or attacked or something like that. But no, they had just woken up first. It was starting to virally spread through Twitter, and by the time the mornings of the East and West Coasts picked up, it was very clear that we were getting hammered. Unfortunately, you know, normally you can just throw more servers at the problem, but there is a very finite supply of GPUs, so there was really not much we could do about that. We did have some contingency in place for this, the idea being that we could throw up a "we are at capacity" page. This kind of bouncer model, if you will, right? Like, oh, the club is full; when some people leave, we can let more people in. That was explicitly done because we did not want another waitlist. No one likes a waitlist, so we figured we could try it this way. But unfortunately, that at-capacity page was up a lot for the first while as we scrambled to get more capacity online. And we also had to fix the long tail of other stuff too. GPU capacity was definitely the dominant concern, but we also had all the other scaling problems everybody else in engineering has.
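Evan's "bouncer" model is essentially bounded admission control: no queue, just a hard cap with retry. A minimal sketch of the idea in Python (names and structure hypothetical, not OpenAI's actual implementation):

```python
class CapacityGate:
    """Admit at most `capacity` concurrent users; turn the rest away.

    Unlike a waitlist, rejected users are not queued: they see the
    "we are at capacity" page and must simply try again later.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.active = set()  # user ids currently "in the club"

    def try_enter(self, user_id):
        # Either the user gets in right now, or they are bounced.
        if len(self.active) >= self.capacity:
            return False
        self.active.add(user_id)
        return True

    def leave(self, user_id):
        # A departure frees a slot for the next arrival.
        self.active.discard(user_id)


gate = CapacityGate(capacity=2)
print(gate.try_enter("alice"))   # True
print(gate.try_enter("bob"))     # True
print(gate.try_enter("carol"))   # False: the club is full
gate.leave("alice")
print(gate.try_enter("carol"))   # True: a slot freed up
```

The appeal of this scheme over a waitlist is that it degrades instantly and recovers instantly: admitted users get full service, and capacity added later is consumed with no backlog bookkeeping.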

Ben Lloyd Pearson: 8:19

Yeah, wow, that's a fascinating story. And trust me, I remember those capacity pages quite well. So before we get into the GPU stuff and some of the scaling issues, I wanna talk a little bit about this black box that is AI. Because that is really how it feels to a lot of people, and many of the productized versions of it really do perform that way, right? And there's no shortage of content on the web describing how generative AI and LLMs are going to do both wonderful and horrible things to all of us. As this initial hype wave wears off, I think we're really starting to see concrete use cases that truly bring a lot of value to people. Just from my personal experience, my grammar checker now gives me more intelligent advice about my writing. It can't do everything, but there are still some things where it saves me so much time. So I think it would be really interesting to hear from an engineer who's actually building this stuff about the biggest misconceptions, the biggest misunderstandings, you've seen related to all of this. Specifically, if there are one or two things you could clarify for the world about AI, what would they be?

Evan Morikawa: 9:37

That's a good question. Certainly one misconception is that it is definitely not, certainly not as it exists today, this completely omnipotent system, right? There are quirks to how this thing works. It's actually quite helpful to remember somewhat how these things are trained: they're trained by predicting the next word, for all words and phrases on the internet. That, though, is in itself deceptive, because this is much more than a Mad Libs system or an autocomplete engine. It has turned out that in order to predict the next word, you kind of need to know a huge amount about society and structure and context and culture as well. But at the same time, they're also very steerable based on the context you give them ahead of time. When GPT-3 first came out, this was in fact the point of the paper's title: it's about few-shot prompting. "Few-shot" here basically refers to the idea that you give the model a handful, like three or four examples, of what you want it to do, and that kind of eggs it in the right direction. In some ways this is not too unfamiliar. If you go on Google, old Google at least, and you type a question one way, you get Yahoo Answers, but if you type it a different way, you get a scientific paper. So in the same way, you can steer it in the direction of things. The one thing that's a little weird, though, is that this has led to some very seemingly black-box types of prompt engineering. There were some papers recently that have been getting a lot of press that say, if you literally ask in the prompt to take a deep breath and think step by step, it does much better on a large category of tasks. Something there feels, wow, kind of wrong on one hand.
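The few-shot idea Evan describes can be shown with a plain prompt string: a handful of worked examples nudges the model toward the task before the real input is appended. The task, examples, and helper function below are purely illustrative, not taken from the GPT-3 paper or any OpenAI API:

```python
# Three worked examples "egg" the model toward the task (sentiment
# labeling) before the real query is appended at the end.
examples = [
    ("The service was fantastic.", "positive"),
    ("I waited an hour and left.", "negative"),
    ("It was fine, nothing special.", "neutral"),
]

def build_few_shot_prompt(examples, query):
    """Assemble a few-shot prompt: instruction, examples, then the query."""
    lines = ["Label the sentiment of each review."]
    for text, label in examples:
        lines.append(f"Review: {text}\nSentiment: {label}")
    # The prompt ends mid-pattern, so "predict the next word" naturally
    # continues with a label in the same format as the examples.
    lines.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(lines)

prompt = build_few_shot_prompt(examples, "Absolutely loved it.")
print(prompt)
```

The trick is entirely in the text: the model is never "configured" for sentiment analysis; the examples establish a pattern that next-word prediction then completes.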

Ben Lloyd Pearson: 11:28

It feels very humanistic though, doesn't it?

Evan Morikawa: 11:30

Well, that's actually a great point. On the flip side, though, if you assume that the models are going to approach a kind of human level of intelligence, it's worth asking yourself: if you threw a relatively competent human into what you're asking it to do, with as much context as you gave it, how would they perform? Mm-hmm. And it's not unreasonable to think that these models kind of mimic that, because they are mimicking human speech we've seen on the internet. So yeah, in fact, I actually think some of the people who could be the best at prompt engineering are teachers, engineering managers, tutors, people who are used to asking the right questions and setting up the right context to help people arrive at the right conclusions. And if you think about it that way, you get weirdly better results.

Ben Lloyd Pearson: 12:24

Yeah, and I think you're actually partially answering my next question. So, in your opinion, what are the situations that are ideal for generative AI? And alternatively, when would you steer someone away from it as a solution?

Evan Morikawa: 12:36

Yeah, yeah. Some places it has absolutely been helpful, and we've really just started to tap it. Right now, certainly, software engineering, coding, boilerplate-type things. The first place we internally really dogfooded any of this was when we ourselves started to use Copilot as an educational tool. I think that is still, despite how much it's been talked about, an underrated, undertapped ability: the idea of a personalized tutor everywhere you go. Like TAs, think about university: the professor can talk at you all day, but it was the TAs where I really learned things, in those follow-up sessions, because you could ask follow-up questions and frame things in a way that makes sense to you. That kind of iterative learning, I mean, this is where I personally use it the most. The thought of reading any paper without this thing on the side, without being able to ask it for concrete examples of things, being able to rephrase and reword and ask follow-up questions. That, I think, is going to be a huge deal. There are some places on the flip side of this, though. We have not in any way, shape, or form solved the problem of making outputs perfectly verifiable. This should not be your end state for medical advice on a huge slew of topics right now. It should not be the thing you are trying to use to cite case law for your own trial, for example.

Ben Lloyd Pearson: 14:03

I think a lot of people call this hallucination.

Evan Morikawa: 14:05

That's right. Now, on the flip side, there's actually law, which I think is a really interesting area as well, especially some of the side effects of this, like the power of the embedding models that we have. It is very good at saying what patents are similar to this one, what cases are similar to this one, in ways that go well beyond whether they have similar keywords. The fact that these models deeply semantically understand what's going on can help you find and search like that. Those types of search abilities will get dramatically more powerful.
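Embedding-based search of the kind Evan describes typically boils down to comparing vectors, for instance by cosine similarity. A toy sketch with made-up three-dimensional "embeddings" (real embedding models produce vectors with hundreds or thousands of dimensions, and the documents and numbers here are invented for illustration):

```python
import math

# Hypothetical embedding vectors: semantically similar texts sit close
# together in vector space even when they share few keywords.
docs = {
    "patent A: battery cooling system": [0.9, 0.1, 0.2],
    "patent B: thermal management for cells": [0.85, 0.15, 0.25],
    "patent C: social media feed ranking": [0.1, 0.9, 0.3],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Pretend embedding of the query "how do you keep batteries cool?"
query = [0.88, 0.12, 0.22]

ranked = sorted(docs, key=lambda d: cosine_similarity(query, docs[d]),
                reverse=True)
print(ranked[0])   # the semantically closest patent, not a keyword match
```

Note that patent B shares no keywords with the query ("thermal management" vs. "cooling"), yet still ranks far above patent C: that is the "more than similar keywords" behavior Evan is pointing at.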

Ben Lloyd Pearson: 14:40

Yeah, and I know personally, one area where I've really found a lot of benefit: I have to very quickly understand a lot of new technologies as part of my day. For instance, I hadn't done a lot of regex in the past, but I find myself doing a lot of it today. And just getting that intermediate understanding immediately, without having to parse through a bunch of resources across the web. I mean, I can't even add up the number of hours that that's saved me. So yeah, it's really great for that.

Evan Morikawa: 15:09

Regexes are probably the best example of that. Yeah, yeah.

Ben Lloyd Pearson: 15:12

Yeah, that's been wild. It's actually kind of blown my mind how quickly, 'cause I mean, regexes are not complicated, but they can be time-consuming if you don't work with them frequently. So then, as an engineering manager, what expectations are you setting with your team in regards to the use of generative AI? It sounds like you were an early dogfooder of Copilot. Are you, as an organization, systematically adopting tools like that, and what are the changes you've seen?

Evan Morikawa: 15:42

Yeah. So, we definitely want to have this help us be productive more and more. We actually have a research team called the AI Scientist team, which is very much a long-term effort to make this work. At the same time, though, there's a pretty wide gap between what works straight off the bat from a prompt in ChatGPT and an actual tool you'll use day to day. Yeah, some people did a quick plugin to hack it into VS Code, but it still takes a lot of product work to put the whole experience together and make it work really nicely. This immediate generation of developer productivity tools will take a fairly large investment to really fit naturally into a workflow. That's actually why I'm very optimistic about the coexistence of both a tool like ChatGPT and the API ecosystem. ChatGPT, we think, can be useful in lots of different places in a much more generic sense, but there are also a huge number of industries where being in flow matters a huge amount. Developer tools, law, medical systems, all these other places where you'd also need and want a lot of these integrated applications. But yeah, we absolutely believe this will make us and everybody else substantially more productive over time. For sure.

Ben Lloyd Pearson: 17:12

Cool. So let's pivot a little to talk about what I think is gonna be the most unique aspect of this discussion. In recent years, there's been something that has brought gamers, crypto enthusiasts, and LLM experts together, and that is the frustration over the shortage of GPUs, right? I tried to upgrade my PC a couple years ago. So before we get into that shortage, I think it would be beneficial to step back for a moment. Can you describe the technical role that GPUs play in the OpenAI tech stack, why they're so important, and how you all use them?

Evan Morikawa: 17:46

Yeah, absolutely. So at the end of the day, when you ask ChatGPT a question, it's taking your text and doing a huge amount of matrix multiplication to predict the next word, right? That's what all these hundreds of billions of model weights are for. We're basically doing one math operation, multiplying and adding, a lot; we're talking quadrillions and quadrillions of operations a second here. And GPUs are just many orders of magnitude faster at this. For a sense of scale, the latest GPU that will be running this, the NVIDIA H100 that everyone's been trying to get, can do about 2 quadrillion floating-point operations per second. Your laptop CPU can probably do on the order of a couple hundred billion. So we're talking thousands of times difference in speed here. Yes, you can run these things on CPUs, but with a performance difference like that, especially for models this size, it's really important to run them on GPUs. The other thing that's significant is that the models are so large they don't fit on just one GPU; we need to put them across multiple GPUs, and that's where things jump dramatically in complexity. Much like the rest of the engineering world knows: your simple single-threaded application makes sense, but once you run it massively concurrently on a globally distributed system, that's where the hard problems come from. Similarly here, once you run these models on multiple GPUs, things get a lot more difficult. Now you really care about how fast your memory bandwidth is. You really care about how fast your interconnect, your network bandwidth, is between GPUs and between boxes. And it's gotten to the point where every single one of these metrics can become a bottleneck at various points in the development cycle. So we really care about all of them, and we really try to maximize them.
And anytime there's a new or faster interconnect, that usually almost directly translates to improved speed for ChatGPT. And that translates directly: if you can make it run twice as fast, that's twice as many users who can access it on the same hardware. Or that's twice as large a model as you can run on the same hardware today. So these really make a huge difference in what it's capable of going forward.
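The arithmetic behind those figures is worth making explicit. The numbers below are just the rough ones quoted in the conversation, treated as order-of-magnitude estimates:

```python
# Rough throughput figures as quoted above (order of magnitude only).
h100_flops = 2e15        # ~2 quadrillion floating-point ops/sec
laptop_cpu_flops = 2e11  # ~a couple hundred billion ops/sec

speedup = h100_flops / laptop_cpu_flops
print(f"GPU vs. laptop CPU: ~{speedup:,.0f}x")  # thousands of times faster

# Evan's framing: with cost and latency held fixed, throughput gains
# translate directly into users served. If an optimization or a faster
# interconnect doubles tokens/sec on the same fleet, the same hardware
# serves twice the users (illustrative numbers).
users_before = 1_000_000
throughput_multiplier = 2.0
users_after = int(users_before * throughput_multiplier)
print(users_after)
```

The same doubling could instead be spent on a model twice the size at the original user count; which lever to pull is a product decision, not an engineering one.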

Ben Lloyd Pearson: 20:19

Nice. So I think that describes pretty well the impact a shortage of GPUs would have on your company. Beyond a page that says, hey, we're really busy right now, come back later, what other strategies did you all take to adapt to the sudden influx and the lack of this hardware resource that you need?

Evan Morikawa: 20:39

We had to do a little bit of everything. We were working very closely with Microsoft, who in turn was working closely with NVIDIA, to build out that capacity, but it was also about making the most of the resources we had. So optimizations were hugely important here. And this is the classic long tail: put it through a profiler, find the parts that are slow, optimize those. But for us, those optimizations run all across the stack, from low-level CUDA kernel and compiler optimizations to more business-level logic: how are we batching requests together? How are we maximally utilizing these things? It took a lot of exploration. We started with a very basic GPU utilization metric, you know, whatever the NVIDIA box spits out, and we found it was actually misleading: the GPU could have been doing more math while showing as busy, or we were actually running out of memory instead. So it took a lot of tweaking to find the right utilization metrics and then figure out how to optimize them. And all of those were really critical to getting more out of it. But for us, everything was framed in terms of users: every improvement we made represented more users we could let onto the platform. As opposed to saying, oh, it's driving down our margins or making it faster, we kept cost and latency relatively fixed, and the thing that got to move was how many users we could get onto the system.

Ben Lloyd Pearson: 22:17

Gotcha, gotcha. And you know, I've always kind of felt that GPUs were chosen for LLMs mostly out of convenience, because they were the hardware available at the time that was most closely adapted to the needs of that community. So are you looking at other hardware options out there? Are there things coming up in the market that you think have the potential to replace GPUs, specifically for generative AI?

Evan Morikawa: 22:41

Yeah, well, even though we all call them GPUs, these are not the graphics processing units of your desktop PC anymore. Especially with Google calling theirs TPUs, "AI accelerators" is more the generic term now, but I still call them GPUs. At this point they are hyper-specialized to do this exact one type of matrix operation. Another concrete example of how they've been optimized only for AI workloads is lower-precision math. Normally, when you have a floating-point number, you get 32 bits to represent it. We can do math with 16 bits, or with 8 bits, which means you can just do more at the same time. And the upcoming GPUs now have dedicated circuits to do that. So in a lot of ways they are super specialized. The other thing, though, is the software stack. NVIDIA's CUDA stack, the compiler layer on top, has been hugely specialized for this, and it's currently very difficult for people to use AMD or Intel or the other manufacturers out there. Actually, there is a product, OpenAI Triton, which is explicitly designed to better abstract that, and that's potentially a huge deal, because the ability to use other hardware much more easily will be a big thing for this market. But right now, yeah, NVIDIA has an incredible hold on this, and it's reflected in their share price. A lot of it is because they own the hardware, the software stack, and a lot of the interconnects. For example, we use InfiniBand, which is an ultra-high-bandwidth interconnect. The company that developed it, Mellanox, is also owned by NVIDIA. So they really have the stack top to bottom.
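The lower-precision point can be illustrated directly with NumPy: halving the bits per value doubles how many values fit in the same memory and the same data path, at the cost of precision. A generic illustration of the trade-off, not anything specific to OpenAI's kernels:

```python
import numpy as np

# The same number stored at two precisions.
x32 = np.float32(1.0001)
x16 = np.float16(1.0001)

# Half the bytes per value...
print(x32.nbytes, x16.nbytes)   # 4 2

# ...but float16's spacing near 1.0 is about 0.001, so 1.0001
# rounds away entirely: the precision has been traded for throughput.
print(float(x16))               # 1.0

# A fixed-size buffer therefore holds twice as many fp16 values,
# which is roughly why low-precision circuits let the chip
# "do more at the same time".
buf_bytes = 1024
print(buf_bytes // x32.nbytes, buf_bytes // x16.nbytes)   # 256 512
```

For neural-network inference this trade is usually acceptable because the computation is statistical to begin with; the dedicated low-precision circuits Evan mentions exist to exploit exactly this.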

Ben Lloyd Pearson: 24:49

Yeah, they definitely got in early, 'cause I remember playing around with LLM tools years ago, and NVIDIA was the only option in the market. There really wasn't anything else to look at. And I remember CUDA in particular; that was around the time it was really taking off.

Evan Morikawa: 25:04

Yeah. I should note, though, that it's been very, very difficult for the chip manufacturers to even get the right chips out. Another example: despite NVIDIA's dominance here, it's kind of widely known, well, within this specific subset of the industry, that their latest chip, the H100, doesn't have enough memory bandwidth relative to how much compute they added into it. So it's getting increasingly difficult to fully utilize it. But the reason that happened is that they didn't know how large the models were going to get. It was very difficult to predict this on the timescale of chip development cycles. Right, ChatGPT is less than a year old, and one year in semiconductor design and manufacturing is nothing. So yeah, it's very difficult for anybody to predict this.

Ben Lloyd Pearson: 25:58

So I wanna transition a little bit into talking about how you approached scaling the engineering function at OpenAI as this was going on. The rapid success is no secret at this point, and your leadership has been very open about sharing the challenges of this sudden virality. It's one thing to deal with sustained growth over a long period, but when you deal with a sudden surge like this, it's an entirely different beast. Not only do you have to deal with potentially much higher peaks, but you don't know how much of that is going to stick around for the long term, right? So walk me through how that played out for your engineering organization.

Evan Morikawa: 26:38

Yeah. So, for instance, staying nimble has been a huge piece of this. One thing that has very much helped, by design, is doing everything we can to treat everything like a tiny early-stage startup. This was originally true in terms of raw headcount. Yes, a huge number of people contributed to the research and the model training, but at the end of the day, the product, engineering, and design parts, that is, Applied, were only several dozen people when ChatGPT launched. So it was still a much smaller group. Even so, we intentionally set up ChatGPT as a more vertically integrated, separate product team within Applied. If you think of Applied and the API as this three-year-old startup, ChatGPT looks, feels, and acts like a ten-month-old startup. Concretely, we intentionally started with a separate repo, separate clusters, and different controls, taking on a little bit of tech debt and duplication at the start to really optimize for iteration, whereas the API gradually started to optimize a little more for stability and SLAs and things like that. Now this is of course changing; ChatGPT also has huge stability and SLA concerns, and we are working to build out broader platform teams as well. Fast iteration was important. The other key piece was having the research teams deeply embedded. While I talk about Applied, because that's the group I'm in, in reality the ChatGPT effort heavily involved a huge chunk of researchers from various research teams. They were the ones constantly tweaking, updating, and fine-tuning the models based on end-user feedback. So keeping these as very vertically integrated teams, with product, engineering, design, and research together, was also super important.

Ben Lloyd Pearson: 28:44

Yeah, so what would you say is the biggest improvement that has come out of this from an engineering perspective, from just dealing with all of this scale?

Evan Morikawa: 28:53

The biggest improvement that's come out of it is actually our ability to work together as a single research-product group. There was an early fear that the worst-case scenario for us would be the kind of place where research trains a model, throws it over the wall, and says, go productize it, this one-way street. We spent a lot of effort making sure that was not how we developed anything. But, you know, that was all abstract until you are actually in the trenches, really working on a product. And of course the reality is that it's very messy to begin with; you just have to tweak it as you go along. But now that there's a much stronger focus around these clear products, this clear API product we need to build, this clear consumer app we're focused on, I think that has helped a lot to really integrate research and product and engineering and design into this one push.

Ben Lloyd Pearson: 29:54

push. And, and I imagine that probably gives your engineers an opportunity to learn more about how this stuff is created, right?

Evan Morikawa: 30:00

Yes, it has definitely been necessary. Though it has actually been the case that people in Applied engineering did not need to have a machine learning background to do this. I do not have a PhD in machine learning, and that's fine for now. Certainly the interest, and the willingness to pick up and learn a lot of things along the way, is important. But yeah, most of our problems are product problems. They're engineering problems. They're distributed systems problems. They're kind of classic in that way. At the same time, it has been really important for everybody to at least get a reasonable understanding of how all these models fit together, because a lot of the engineering considerations are deeply tied to the way the models are structured, the hardware we're using, the way things are deployed. Those all definitely matter.

Ben Lloyd Pearson: 30:49

Yeah, this might be a tough question to answer, but if you could go back in time to Evan a year ago and tell him, hey, your product is going to go viral someday, are there any changes you would have made in your approach to respond to, or to anticipate, that?

Evan Morikawa: 31:05

You know, maybe not, because I'm inherently a bit of a skeptic when it comes to things like this. I would not have believed my thing would go viral. Also because I believe in not prematurely optimizing the system. We have this deeply iterative model baked in here. So I actually would have been afraid that we would have spent this huge amount of time making sure the infrastructure was load tested up the wazoo, making sure the product was perfect, before it even got there.

Ben Lloyd Pearson: 31:44

So you're saying just swing for the fences and deal with what happens after the fact.

Evan Morikawa: 31:49

Now, I should note, though, that iterating on a product quickly, especially here, must not compromise the safety mission that we have to begin with. Not being happy enough with our safety systems, with the red teaming results that we're getting, that is the primary thing that will delay launches. That is the non-negotiable before we can ship something. So that is the place where we can, should, and would like to spend even more time iterating. But here again, a lot of the philosophy around this is that we see the safety systems and the red teaming coming in layers. We have a very active network of experienced red teamers who will go in and test stuff. We have very controlled rollouts through various stages to try and catch everything. But it's still the case that there's no way to catch all of it until you actually get it out into the world.

Ben Lloyd Pearson: 32:57


Evan Morikawa: 33:01

Yeah. Having a lot of people thinking and working and doing this is a big part of what it takes to make these things move forward.

Ben Lloyd Pearson: 33:14

So, I've got one more I want to talk about, and it kind of brings us back full circle to how this conversation started, and that's APIs. You know, the part of generative AI that really excites me is seeing it pop up in all the tools that I use every day. At LinearB, we've trained ChatGPT on multiple aspects of our platform, so we can do things like ask it questions, and it can write highly specialized configuration files for us. To me, those are the really fascinating use cases. But obviously, for that to happen, there's got to be a really strong API for developers to build functionality on top of. And since we've both worked at API companies, we understand that it really comes down to making sure the API is performant, and that it's built for real-world use cases rather than theoretical situations. Generally speaking, what role do you think the APIs for your products are going to play in the success of OpenAI and ChatGPT?

Evan Morikawa: 34:12

Yeah. There is absolutely no way, even though I think we have a very talented team, that our one team can out-innovate the vast sum of all the creativity and startup energy and company focus that is going into making this work right now. As I mentioned earlier, the ability to have AI deeply integrated with everything you're doing, somewhat seamlessly and transparently, that's where a lot of the real power is. And the API is one of the best places for us to try out new ideas first.

Ben Lloyd Pearson: 34:55

That's actually a great point, because I wanted to ask: what kind of feedback are you getting from developers in the field?

Evan Morikawa: 34:59

That's one of the most important pieces of it. With consumer apps, it's very difficult to get feedback unless you start doing really aggregate stuff. But if you sit down and talk with the developers building on the API, we really learn where the actual problems are in the system. The gap between a cool demo and something that's useful is massive. That is true in every industry, and it is especially true here: the more hype there is, the wider that gap becomes. And that gap can only be closed when you're really working with companies and developers, figuring things out the hard way and learning all the quirks. It is definitely the case that the API ecosystem has discovered far more quirks of the models than our own research teams have.

Ben Lloyd Pearson: 35:45

So what are some of the engineering challenges that have been unique to the API versus the more general-purpose tools?

Evan Morikawa: 35:51

We have lots of challenges similar to other API companies, you know, like database connection limits and networking shenanigans. The GPU constraints, though, I would say those are quite different. We've needed to jump immediately to tons of clusters all over the world, mostly because we were chasing GPUs wherever we could get them. So we suddenly found ourselves multi-cluster, multi-region. On the flip side, though, we've also spent a fairly large amount of effort making it so there really aren't that many special, unique things about our deployment stack. We've been using stock Azure Kubernetes Service, we use Datadog, we use a lot of just off-the-shelf tools. I think that's actually been really important. It's really helped our development team stay small, and it means that new hires come in and kind of know what they're doing. The more we can do to treat things as just another service that takes text in and spits text out the other side, the better; a Kube service is a Kube service. People know how to deal with that. At least it's a known unknown, and I'm trying to minimize the unknown unknowns here. But at the same time, the scaling characteristics of this are very strange. I mentioned that those utilization metrics were hard to figure out initially. And the scaling challenges, I think, are also going to get pretty nuts. The ambitions for the scale of capacity that we need to ramp up to: it's usage going up, the models getting bigger, the models gaining all these different modalities. That's all just getting started right now. Like, we found ourselves suddenly, when DALL·E came out, going, oh, we're not dealing with just text anymore.
Now it's image processing and image generation, and now we're in audio as well, with speech in and speech out. So yeah, it's become a lot more complex.
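As a rough illustration of the capacity framing Evan describes, where an efficiency win is spent on admitting more users at fixed cost and latency rather than on margins, here is a minimal sketch. All numbers and function names are hypothetical, not OpenAI's actual metrics.

```python
# Hypothetical sketch: translate fleet throughput into a user admission cap
# at a fixed per-user latency budget. A utilization improvement (more
# tokens/sec per GPU from better batching, say) directly raises the cap.

def max_concurrent_users(num_gpus: int,
                         tokens_per_gpu_per_sec: float,
                         tokens_per_user_per_sec: float) -> int:
    """Users the fleet can serve while keeping per-user throughput
    (and hence latency) fixed."""
    fleet_throughput = num_gpus * tokens_per_gpu_per_sec
    return int(fleet_throughput / tokens_per_user_per_sec)

# Baseline utilization vs. a 20% throughput improvement on the same fleet:
before = max_concurrent_users(1000, 1500.0, 30.0)
after = max_concurrent_users(1000, 1800.0, 30.0)
print(before, after)  # the 20% win becomes 20% more admitted users
```

The point of the sketch is the framing: cost (the fleet) and latency (per-user throughput) stay fixed, so the only output of an optimization is a larger user cap.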

Ben Lloyd Pearson: 37:57

Awesome. That's all very fascinating. I'm really happy we got to learn about all this from inside the organization. And I have a couple of quick questions, because they're things I think our audience is really going to be interested in. The first is: what is the most interesting or useful adaptation you've seen so far of one of your products?

Evan Morikawa: 38:17

So, I think this is still emerging, but there's a lot of work right now from people trying to build longer-form agents on the system. That's been a huge focus of a lot of startup activity. And as I mentioned, a lot of the law applications have huge potential; that industry should feel very different.

Ben Lloyd Pearson: 38:41

I would love to have a lawyer in my pocket, just saying.

Evan Morikawa: 38:43

The other one, of course, is the education side of things. We have a real effort to figure out how to use these tools there. To me it's kind of analogous to how math classes had to do something different when the calculator showed up everywhere. Figuring that out, and getting to a world where people start to use these as tools that help them be dramatically more impactful, I think that's a huge deal. I also really liked the feature Spotify launched recently using text to speech: they release podcasts in other languages with the voice of the original podcasters. So yes, you can listen to Lex Fridman in Spanish, but it is clearly his voice. And if you think about how much work or money it used to take to dub something, or to hire somebody to do that, that's insane. The other partnership I think is really cool is Be My Eyes. They've been using GPT-4 with vision capabilities to help visually impaired people. Take a photo of your closet: what should I be wearing today? Yeah, that's nuts. You're going to have a device in your pocket that can deeply, meaningfully and semantically describe what you're looking at. And if you're a visually impaired person, that's obviously a huge deal.

Ben Lloyd Pearson: 40:13

Awesome. Well, it has been really great learning the inside perspective from you on OpenAI and all the engineering work you're doing over there. If people want to learn more about you or the work that you're doing, where's the best place to send them?

Evan Morikawa: 40:26

Yeah, so I'm E zero M on Twitter. And I actually point a lot of people, including a lot of our new hires, to OpenAI's blog, which is a mix of the product and research releases. That whole arc gives a pretty good view of the state of what we're doing, but also the state of the industry.

Ben Lloyd Pearson: 40:47

Awesome. And personally, I've been silently watching your LinkedIn too, and seeing a lot of your posts about all the cool stuff that's happening. Well, it was really great having you here today. I'm glad we got a chance to catch up and talk about what you're doing. So thanks for joining us.

Evan Morikawa: 41:01

Likewise. Thank you.