The best model for your team? You haven’t invented it yet. | Ai2’s Tim Dettmers


By Tim Dettmers

Forget the massive GPU clusters. According to Tim Dettmers, research scientist at Ai2, you can build a state-of-the-art AI coding agent with what he calls a "hot plate and a frying pan." This week on Dev Interrupted, Andrew sits down with Tim to unpack how his resource-strapped team built the SERA model using a fraction of the compute power of major labs. They explore the tactical engineering behind synthesizing training data from private codebases without verification tests, proving that the open-source community is uniquely positioned to out-specialize frontier models. Finally, Tim shares his contrarian take on the future of token economics, explaining why the cost of AI might actually spike as compute efficiency hits a physical wall.

Show Notes

  • Building SERA: Read Tim's deep dive into his journey of building the Soft-Verified Efficient Repository Agents.
  • SERA-14B on Hugging Face: Explore the open-source coding agent repository and documentation.
  • Why AGI Will Not Happen: Read Tim's contrarian blog post regarding the physical realities of computation and superintelligence.
  • Use Agents or Be Left Behind: Tim's personal guide to cutting through the hype and automating your own work.
  • Tim Dettmers' Blog: Follow Tim's ongoing writing and research on deep learning and AI systems.
  • Mistral Small & Qwen: Learn more about the small open-weight models mentioned as benchmarks in the episode.

Transcript 

(Disclaimer: may contain unintentionally confusing, inaccurate and/or amusing transcription errors)

[00:00:00] Andrew Zigler: Tim, I wanted to ask, uh, just kind of like, before we start diving into today, like, I'm curious from a personal perspective: what is it like to sit in this kind of, like, swivel chair between academia and industry?

[00:00:12] Andrew Zigler: Like what kind of perspective, like has that been for you and like, you know, what makes you most excited about being in that, in that place?

[00:00:20] Tim Dettmers: Yeah. I mean, uh, between industry and academia, the divide feels quite extreme. Um, one divide is sort of resources. I think there's also the divide in perspectives. And then it's also, if you look at locations: most of AI happens in the Bay Area, and if you're in the Bay Area, like in this bubble, everything's super exciting.

[00:00:41] Tim Dettmers: Everything goes super fast and that sort of thing. And some of it is true, but some of it is not. And so as an academic, you can sort of lean back, take it in, and make up your own mind. And, um, with that you can actually carve out bits where you say, I can be competitive in this field, [00:01:00] even if they have all the resources, by pursuing sort of particular ideas that they might not be able to pursue.

[00:01:06] Tim Dettmers: And so that is sort of an interesting space. But the other sort of is just this trade-off between, uh, working with the most resources and being sort of a cog in a wheel, or being sort of, uh, resource-strapped but trying to make the most of it. But you have all the freedom, and you are working on big pieces where, you know, if I bring this out, that would be sort of my doing, instead of being like a tiny piece in a big team.

[00:01:37] Andrew Zigler: I, I love that framing, that you prefer working in this almost, like, resource-strapped environment where you're able to fully explore the, the realm of thought, and then not only explore it, but then figure out the methodologies and get there. Whereas in this other environment, it's, like, resource-rich, and there's like so much that you can be pulling on with each other, but the incentives are different. That's like [00:02:00] a, it's a capitalism-minded machine, where what you're trying to drive is production and output.

[00:02:04] Andrew Zigler: And so like you said, the perspectives are so different. And you are on the quest of, like, what is best, what are the foundational, uh, elements of this that we all need to understand? And that's like the really important gritty work that maybe sometimes folks that are more industry-minded, like, don't wanna pay much attention to.

[00:02:22] Andrew Zigler: Right. And so it can be like a little difficult there.

[00:02:25] Tim Dettmers: Yeah, maybe also a small story there that might be sort of quite interesting.

[00:02:30] Andrew Zigler: Yeah.

[00:02:30] Tim Dettmers: It is, um: during my PhD, I was doing like a sort of long-term internship at Meta. They had a lot of GPU resources, and it's like the dream of a PhD student to have, like, hundreds of GPUs. So I had these hundreds of GPUs and I was just running experiments and experiments and experiments, and I was making sort of progress, but not as much progress as I wanted to. At some point it was time to go back to the University of Washington, which had much fewer resources, but now each experiment needed to be very carefully chosen, very carefully [00:03:00] analyzed, and there I actually made some discoveries that I wouldn't have made if I just looked at experimental results. Now I dove deep into the data, found some curious things, and I was actually more productive.

[00:03:10] Tim Dettmers: So more resources don't necessarily mean better results. Um, you can get more insights with fewer resources if you dive deeper.

[00:03:18] Andrew Zigler: In fact, when I think you have to be resourceful, that's when you're pushed to make those really ingenious kinds of discoveries. And it challenges all of us to strip away complexity instead of adding it on. So, really great takeaway, and I'm, I'm really excited to explore all of this in our conversation, uh, because folks,

[00:03:35] Andrew Zigler: today my guest on the show is Tim Dettmers, a research scientist at Ai2 and an assistant professor at Carnegie Mellon who has built a career around finding the signal in the noise of high performance computing. And, and like we just talked about, while larger companies and industries might have

[00:03:52] Andrew Zigler: huge resources that cook up massive foundation models that can solve wide domain industry problems. [00:04:00] Tim and his small team recently built SERA, a state-of-the-art coding agent through what he calls a hot plate and a frying pan, like the complete opposite of being so full of those resources and working with a strappy team and strappy GPUs, and we're talking about the tactical engineering behind that.

[00:04:17] Andrew Zigler: today, the automation muscles that make it possible, but also the groundbreaking research that we've covered from Tim here on the show, about how this type of breakthrough can allow more teams to return to and embrace specialized models of their own, and not necessarily be beholden to foundational, off-the-shelf tools.

[00:04:37] Andrew Zigler: So we have a big conversation to dive into today. It's very academic minded, super excited for it. And Tim, welcome to Dev Interrupted.

[00:04:46] Tim Dettmers: Yeah. Thank you so, uh, so much for having me.

[00:04:48] Andrew Zigler: Of course. So I wanna start at the top with just the analogy I gave of a hot plate and a frying pan, from your recent report about how you and your team, uh, created SERA. You know, for [00:05:00] our audience, maybe we'll start at the top and you could walk us through a bit of what SERA is, and what you think it is able to unlock for engineering teams, uh, using AI coding tools.

[00:05:10] Tim Dettmers: Yeah. Yeah. So the big problem here is, um, there are like awesome coding agents out there. Can we reproduce them in, sort of, academia? And so, um, if you look at these coding agents, it seemingly takes endless resources. Like, um, if you look at good coding agents, usually only the big companies can produce 'em.

[00:05:31] Tim Dettmers: And so the question is, can we produce them sort of with fewer resources, in an academic setting? Um, at Ai2, the Allen Institute for AI, we are sort of a mix between sort of industry and academia. And the main goal that we have is both open-source models and open-source systems: bring all information out there so people can replicate things. And so the question was, can we replicate coding agent performance with, uh, few resources, and then basically give it to everyone. [00:06:00] That was sort of the main goal. And, um, yeah, if you look at sort of, uh, industry, they have like these sort of industrial kitchens, um, that are just lots and lots of people and lots and lots of sort of machines, GPUs, and everything ties together in sort of big reinforcement learning infrastructure. And we didn't have that. We had just a couple of people sort of working on this project. We started out with 32 GPUs. So that's the comparison, between like a hot plate or a frying pan that we had, against the industrial kitchen that sort of these big companies have. And so, yeah, that basically pushed us into this resourceful domain, to try sort of as efficiently as possible to get good performance.

[00:06:45] Tim Dettmers: We made certain trade-offs: fine-tuning instead of reinforcement learning. We were, um, very sort of careful with developing a synthetic data generation procedure that is highly efficient, very cost-efficient. And then again, [00:07:00] yeah, if you put it together with the right sort of pieces, we got extremely good results.

[00:07:04] Tim Dettmers: We could replicate small, basically closed-source models, like Mistral Small or Qwen 3 Coder. Yeah, so that, that was quite successful despite the, um, resource bottleneck.

[00:07:19] Andrew Zigler: So, to understand more clearly: the idea is that you were able to get almost comparable performance in a, in like a specialized or a preexisting, or what we call like a brownfield, codebase, right? Using a tool that's trained on specialized, like, synthetic data from that codebase. And, and you know, this is a realm that is familiar to folks who are, you know, working with these tools and have explored things like fine-tuning and have

[00:07:43] Andrew Zigler: looked at, how do I create these large datasets? And, uh, you know, part of your research is about the methodology that goes into creating that dataset, and how you could actually use something that's resourceful and low-compute in order to generate that kind of data. And, [00:08:00] and so, uh, I, I kind of wanna, before we talk a little bit more about

[00:08:03] Andrew Zigler: that, I, I wanna ask how you think this challenges the assumption that engineers and engineering leaders have had up until this point, that, you know, oh, if I want really great, top-of-the-line, competitive performance from my engineering team and their output, I have to use the OpenAIs and the Anthropics of the world.

[00:08:21] Andrew Zigler: How does that challenge that assumption?

[00:08:24] Tim Dettmers: Yeah. Yeah. I mean, the main assumption is basically you need what the big labs have. What the big labs have is lots and lots of sort of environments, and here an environment is basically a playground for an agent for a particular problem, where it can do all kinds of things. And then what you want to have is this sort of large-scale reinforcement learning infrastructure where you reward the agent if it does something right in the environment. And, um, because you don't have real training data, you don't have human data, like how, for example, human programmers program, you have very little data. Out of that, you [00:09:00] generate synthetic data. So you look at how the agent does it, put it in a dataset, then train a new agent on all of that data. And so the assumption was like, that's what you need. Everybody was trying to do it. But yeah, what we found is, um, you can be very precise in terms of how you generate the synthetic data and, um, basically how you interact with the environment. Um, with that you can sort of scale very easily, and that sort of challenges the idea that you need all this infrastructure, you need all the GPUs, you need all this complexity.

[00:09:33] Tim Dettmers: So we broke it down to the sort of simplest components. And even in the beginning we made it simple, but then we made it simpler and simpler and simpler. That made it also more efficient.

[00:09:43] Andrew Zigler: I love that you call out that you made it simpler and simpler and simpler. That's something that we call out a lot on the show, about how a lot of times unlocking that velocity, but also that just, like, raw performance from an, from an agent or from an LLM, can [00:10:00] be achieved by taking away as much as possible and creating really efficient, closed systems that both you and the model can perfectly understand at any given point.

[00:10:10] Andrew Zigler: But it's still, like, doing some kind of knowledge work, some kind of transformation, right? And if you kind of create that shared thinking space, uh, you can get some really effective kind of, uh, outputs. Yeah.

[00:10:23] Tim Dettmers: That's right. Yeah.

[00:10:25] Andrew Zigler: I also wanna know about this type of tool that a, a team might explore creating. Let's say they have a preexisting code base and maybe it's very specialized.

[00:10:35] Andrew Zigler: I think a, a large problem that some engineers experience is that, uh, the ability to adopt AI is not equally distributed across engineers in the domains in which they work, like front end engineers and folks who work with TypeScript or whatnot, like they're able to use these tools in a much, much, much faster way than somebody working on a very deep embedded language embedded systems and stuff.

[00:10:59] Andrew Zigler: So does [00:11:00] this, does this type of approach allow those teams to actually start unlocking gains within their own code bases?

[00:11:06] Tim Dettmers: Yeah, so, so that was one of the main sort of points that we made also with the SERA paper: um, you can quickly specialize on your private data. That is a very big advantage for open source. Like, if you, if you're OpenAI or Anthropic, you will not, uh, pick like random data from some company and put it in your training set. Um, but if you're the company, you can take an open-weight model and now try to train on your data. Uh, I think it's sort of common knowledge: frontier models work well, but if you work with less and less and less common data, they work worse and worse and worse. And if you have very sort of specialized data in your company codebases, but it can also be codebases related to documents, these setups are very common. Like, a lot of companies, they say like, frontier models don't work for us. And so what you want is to basically specialize, um, a coding agent based on the company data that you have, [00:12:00] and then it can often be actually more effective than a frontier sort of model.

[00:12:04] Tim Dettmers: We had the beginnings, and sort of SERA was the beginning of that. And so what we developed is basically a procedure that makes as few assumptions as possible about the new data that you have. A lot of methods previously required, for example, software tests to see that the synthetic data that you generate from a dataset is correct.

[00:12:24] Tim Dettmers: Like, you make some changes in the code, and you know that the code changes are correct. We threw out this assumption, and so that means we can take any codebase and, uh, generate synthetic data. And we don't make the assumption that the generated synthetic data code is correct. But if you do it just in the right way, that data is actually as good as sort of data that you verify. And so, um, what that means is you can just take a codebase, a private codebase from your company, and generate lots and lots of synthetic data quickly, 'cause you don't have this [00:13:00] assumption, you don't need to carefully verify, which is very expensive. And then you train on that data and you get very good performance. And so we show, um, we can get better performance than many closed-source models that have the same size. We needed just $700 to basically train such a model. So if you're a company, it, it's pretty efficient to do this. And, uh, right now these are beginnings. We wanna push it to very, very large models, like 350 billion or so.

[00:13:28] Tim Dettmers: I think then it becomes very interesting. Then you can actually exceed sort of frontier performance by training on this private data. I think this is the biggest advantage of open-source and open-weight models. Frontier labs cannot do this very well.
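The procedure Tim describes can be sketched at a structural level. Every function below is a hypothetical stand-in (the real system uses an LLM agent on a private repository; these names are not from the SERA paper), and the point is only the shape of the loop: derive tasks from a codebase, collect agent trajectories, and keep them without running verification tests.

```python
import random

def sample_task(codebase_files):
    # Hypothetical: derive an instruction from a random file,
    # e.g. "refactor this function" or "add a docstring".
    f = random.choice(codebase_files)
    return f"Improve the code in {f}"

def run_teacher_agent(task):
    # Hypothetical stand-in for an agent rollout; in practice this
    # would be an LLM producing an edit trajectory for the task.
    return {"task": task, "trajectory": ["read file", "edit file", "done"]}

def build_dataset(codebase_files, n_examples):
    # Soft verification: keep every trajectory instead of filtering
    # through software tests, which is what makes generation cheap.
    return [run_teacher_agent(sample_task(codebase_files))
            for _ in range(n_examples)]

dataset = build_dataset(["a.py", "b.py", "c.py"], n_examples=100)
print(len(dataset))  # 100 training examples, none test-verified
```

The design choice this illustrates is that the expensive step in earlier pipelines, running tests to filter trajectories, is simply absent, so dataset size is bounded by generation budget rather than by test coverage.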

[00:13:42] Andrew Zigler: Yeah, I agree. This is an opportunity, I think, for the open-source community, the open-weights community, to be able to kind of prove the efficacy of this research and this kind of approach. Because, you know, the, the way you just broke it down, and I kind of just wanna zoom in on it for a second: you know, you say that the, the synthetic data that's generated [00:14:00] on the codebase, or

[00:14:01] Andrew Zigler: the thing to be trained on, it doesn't have to be correct in this case. And that's the assumption that we're throwing away from inference training and fine-tuning from before: uh, we wanted to verify, which is computationally expensive, and it takes a long time then to assemble the data. But if you just assume that it doesn't have to be

[00:14:21] Andrew Zigler: correct. That's where the soft-verified in SERA comes from. Then it allows you to go a lot faster while still achieving, like, comparable results. And this is actually something we see across all knowledge-working domains. I think of any kind of training dataset that goes into any kind of specialized knowledge-working tool that I've worked with, that I've seen, that's been constructed.

[00:14:44] Andrew Zigler: A lot of the conversations and pairings that go in are not perfect, and there's not always a perfect, deterministic way to resolve a conversation or a query. So they exist already in this nebulous, kind of like, uh, gray-area realm. So if you [00:15:00] embrace that same kind of idea for something that we see as more deterministic, it actually unlocks an ability to get progress from these tools.

[00:15:09] Tim Dettmers: That's right. Yeah. And it's just very cheap, and you don't have any further assumptions. If you want to have correctness, you often need software tests, but then you can only generate as much data as you have software tests for. And for some parts of the codebase, you might not have good software tests. And now with this, you don't need them; you throw away all of this.

[00:15:30] Tim Dettmers: You just generate data. And it's actually a very common finding, even in reinforcement learning, where you just try to do the correct thing. A lot of people say, like, hey, if you use some of the incorrect data, it actually works: our model is better. And how to think about it is, the model doesn't necessarily need to learn what is correct, but it needs to learn how to map an instruction that is related to a codebase to basically the steps required to, um, translate this instruction [00:16:00] into, into an outcome. And the exact outcome is maybe not as important as the entire process. And for, for the weaker model, the process is actually more important, because the model has difficulty modeling the exact outcome. Um, but that's how you make quick progress. In the end, you might need reinforcement learning, but in the beginning it's just much more efficient to have the model basically learn this process: mapping an instruction to, uh, the process of how to do a task, a coding task.

[00:16:29] Andrew Zigler: Just to maybe package up the idea of SERA, and what our listeners can do with this kind of insight: what would be your call to action, your advice to an engineering leader, to somebody who is, uh, working with engineering teams that are using AI coding tools regularly on codebases to produce, you know, output for the business?

[00:16:51] Andrew Zigler: There's been a lot of experimentation and adoption around new tools, new workflows. If this was something that a team wanted to seriously explore, [00:17:00] how would you recommend they maybe even measure or understand the impact of that? Like, do you have, uh, advice for them in that situation?

[00:17:08] Tim Dettmers: Yeah, so, so that's actually a big gap in academia: you can evaluate on datasets, but almost certainly big models have been trained on this dataset. So even if you have small open-weight models, they're probably trained on this dataset. So in the company, you really have private data, and if you evaluate on that, you see sort of this real gap. For that, you need to create the evaluation benchmarks, and the models that we have there, where you can rapidly specialize, they're not quite there yet. But as a sort of engineering leader, um, what I would sort of recommend is: pay attention. This will move very quickly. And I think very quickly we actually will have models that are better than the frontier models, 'cause they're specialized to your data. And so there will be this transition point, and I think as an engineering leader, you should be aware when this transition point comes. 'Cause then you [00:18:00] want to quickly switch. Everything's very fast, and you can move faster if you basically transition exactly at this transition point.

[00:18:08] Andrew Zigler: It's really great advice. It's about understanding how it's moving and making the switch at the most, like, opportune, economic moment, when it makes the most sense to, to jump that bridge. And thanks to, you know, the foundational research that you and your team have done, now there's like a roadmap for teams to

[00:18:26] Andrew Zigler: be able to achieve that. And, um, you know, zooming out a little bit though, Tim, I, I wanna talk a bit more about your own, um, experiences as an academic and somebody who does research with, with AI, and, and around these, uh, these types of tools, applying, uh, AI to software engineering.

[00:18:46] Andrew Zigler: And behind that, there's obviously a lot of, you know, agentic workflows that you've picked up, adopted, explored, especially probably in the last year. I'd love to learn more from, like, your recent writings on your blog, but also, uh, how you approach [00:19:00] this, um, as someone who's more academically and research minded.

[00:19:03] Tim Dettmers: Yeah. Yeah. I mean, I wrote this sort of blog post about, uh, using agents or being left behind a couple weeks ago. I feel since then it, it has also changed, like, dramatically; like, everything is just going so fast. I mean, for me, um, it almost feels now, now, sort of, we're at a point where productivity can be measured in tokens.

[00:19:24] Tim Dettmers: The more tokens you generate, the more productive you are, sort of. It's not true for all jobs, but, um, it's, it's approximate. So, um, and I use agents both for research, for software projects, and then sort of as a professor for certain tasks. Um, and, um, yeah, sort of week by week, um, you improve, you generate more tokens, and right now it's exponential.

[00:19:49] Tim Dettmers: My doubling time is around 10 days, and so it just keeps growing, and it's just, like, um, crazy. So yeah, I think we're at a point in time where knowing how to [00:20:00] use agents, and doing that well, is probably the most important skill that you can learn. Yeah.
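As a back-of-the-envelope check on what a 10-day doubling time implies, treating Tim's figure as exact arithmetic:

```python
# Output grows as 2 ** (days / doubling_time) relative to today.
def growth_factor(days, doubling_time=10):
    return 2 ** (days / doubling_time)

print(growth_factor(30))   # 8.0 -> 8x output after one month
print(growth_factor(100))  # 1024.0 -> ~1000x after ~3 months, if it held
```

The second number is the reason exponential claims like this are usually hedged: a 10-day doubling time cannot persist for long before hitting some other bottleneck.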

[00:20:05] Andrew Zigler: And I think part of that skill as well is understanding when to build an automation, when to turn to an LLM or an agent to solve a regular, recurring problem, to get those compounding gains. Like, what is your math on that? How do you decide, like, how to approach, uh, automating things that keep you busy?

[00:20:23] Tim Dettmers: Yeah. Yeah. It, it's actually sort of this double-edged sword, and, um, there are like two perspectives. So I'm, I'm, I'm German, and I worked in the automation industry in Germany. There they have like a very sort of straightforward, um, calculation of, basically, what is your return on investment? And then there's, I would say, the more sort of scrappy approach that has been quite common in China, which is: just try things and see if they're better, and learn things and sort of improve them over time. The German perspective that I also basically learned is, you make this calculus of, you try to estimate how much more productive you are. So you say like, I do this task, uh, once a day, it takes me 10 minutes. And, uh, then you just think about, how long does it take me to automate? And then not only this automation step, but every day you probably are frustrated with your solution. You need to improve it. You need to think about it. If you add up all this time, will it be more than basically the time that you save?

[00:21:25] Tim Dettmers: So if instead of 10 minutes you need nine minutes, you save one minute a day. Uh, but if you develop this thing for like 10 hours, it might not be worth it; very sort of, um, the payoffs come very late. But, um, that perspective can also be deceiving, because, um, it doesn't account for learning rates. So when you learn to work with the task, then the next tasks that you do, you might be able to automate more quickly, and then the next one more quickly than that. And so that's sort of this more Chinese perspective: try some things, [00:22:00] learn some things. At some point you're so good at automating things that you can automate things very, very quickly and very, very efficiently. And so, um, the sort of rational calculus in terms of cost and payoff can be a helpful, good perspective, but one shouldn't be fooled sort of in the long term, 'cause it can also be a bad choice. I think these are sort of the trade-offs that give you a framework for how to think about problems, uh, in terms of automation.
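The break-even calculus Tim walks through is simple arithmetic; a minimal sketch, using his illustrative numbers (the function name is mine, not from the episode):

```python
def break_even_days(build_minutes, minutes_saved_per_day):
    # Days until the time invested in automating equals the time saved.
    return build_minutes / minutes_saved_per_day

# Tim's example: 10 hours (600 minutes) of automation work to turn a
# daily 10-minute task into a 9-minute one, saving 1 minute per day.
print(break_even_days(build_minutes=600, minutes_saved_per_day=1))  # 600.0 days
```

As Tim notes, this static view ignores learning effects: if building the first automation makes the next one faster to build, the real payoff can arrive much sooner than the naive break-even suggests.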

[00:22:28] Andrew Zigler: And if you're someone who's able to maybe achieve that and get that 10x, 100x kind of output that's compounding, or perhaps you're a team leader, a manager, and you have somebody on your team who is unlocking that, but others aren't: how do you, how do you, um, maybe advise them to help distribute those gains, educate and share that ability with others, but then also use these compounding gains to compound onto others?

[00:22:56] Andrew Zigler: Do you have strategies for being able to unlock that?

[00:22:58] Tim Dettmers: Yeah, I [00:23:00] mean, it's sort of quite interesting. Um, um, I work with my students and try to make them more productive with agents. Often we have discussions about what each other is doing and how we can benefit from each other. So, very similar to what you described, sort of this problem of, um, doing better ourselves, but then also sharing that knowledge, sharing infrastructure, and so forth. And one learning, for myself personally, is, um: if you build certain tools that your agents use, and you continue to build on top of them, at some point your productivity gains stack. But then if you share those tools, it might not quite work for others, because they work in a different manner. And so I don't know how to solve it.

[00:23:39] Tim Dettmers: Sort of maybe you standardize certain tools and people sort of share and at some point merge on a particular way of doing things. Maybe that's also what we did. Engineering, like the workflow, how you have GitHub, repositories, pull request and whatnot. It's standardized basically. And so I think it will evolve. Um, but I think, [00:24:00] um, the most important thing is just talking to each other, learning from each other. And, just being sort of also humble in a sense that certain approaches that might not look great, if you stack them a little bit, you might get great payoffs. And so just being open-minded and try a lot of things and, um, just talk about, bring them together. I think for now that's the best strategy. Standardization might make sense, but probably not now.

[00:24:28] Andrew Zigler: Do you see strategic differences between people that are in the software engineering industry versus those that are in academia, in what they choose to automate and how they tackle it? Like, what are some of the perspectives, being in that swivel chair like we talked about at the beginning? And what do you think each side can learn from the other?

[00:24:48] Tim Dettmers: Yeah. Yeah, so, so it's actually quite interesting. The first things that I automated were sort of related to my, uh, job as a professor. And, um, these were actually the hardest tasks to [00:25:00] automate, because it's very different from engineering; like, agents are really good at engineering. Um, but then the next step was, for, for me, basically research, which is sort of more engineering-heavy systems research.

[00:25:12] Tim Dettmers: And at the end it was just pure sort of engineering. And so if I look at those, um, you learn different skills if you do certain things. So for my job as a professor, it was like, for example, uh, creating proposals, doing literature research, um, sort of figuring out how pieces of published research fit with certain projects for certain students, brainstorming ideas, um, then just administrative stuff, like, I don't know, receipts, putting them together, and that sort of thing. Um, and so that was sort of the first sort of professor-level stuff. Then the next sort of stuff is sort of, um, research, for example. And the tools that were already built for literature research, they immediately become relevant to research, 'cause now it's just a couple extra [00:26:00] steps to figure out where the gaps in the literature are. The agent can figure it out itself. The agent can sort of brainstorm ideas very quickly, and then you can go down certain ideas very quickly. You build pipelines where they, um, automatically, uh, launch jobs on the cluster and can sort of just autonomously sort of do things. At the end, I was sort of moving through this engineering domain, and that is the simplest domain. And in the beginning I had trouble launching jobs in parallel, sort of in academic settings; like, not many tasks can be parallelized. An engineering setting is very easy: you have an issue here, a bug fix here and here, and another independent feature, and you just launch jobs at the same time. Academia, for academic tasks, was sort of more difficult. But what I later learned was, you can also do it in all kinds of tasks; it just looks a little bit different. You don't do it entirely in parallel. How you can best think [00:27:00] about it, and I think this is a very useful perspective, is: if you want to be most productive, what you should do is have an agent operate as long as possible autonomously, and reduce the time that you spend giving instructions.
And so if you do that, you can basically sequentially launch sort of problems, then come back basically to the beginning, and the agent will just finish and you give new instructions. So it's almost parallel, but sort of sequential, and that works well in any scenario. So now I do a lot of parallel things basically in this way. And so it's like, now these ideas are merging and everything becomes the same. It's not only engineering; it's now everywhere.
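The loop Tim describes, launching tasks, letting agents run autonomously, and cycling back to hand out new instructions, can be sketched with a thread pool. The agent call here is a simulated stand-in, not a real agent API:

```python
from concurrent.futures import ThreadPoolExecutor
import time

def run_agent(task):
    # Stand-in for a long autonomous agent run; in practice this is
    # where the agent works unattended on the task.
    time.sleep(0.1)
    return f"{task}: done"

tasks = ["fix bug", "add feature", "update docs"]

# Launch each task, then cycle back for results: the time spent giving
# instructions is minimized while agents run autonomously in parallel.
with ThreadPoolExecutor() as pool:
    futures = [pool.submit(run_agent, t) for t in tasks]
    results = [f.result() for f in futures]

print(results)  # ['fix bug: done', 'add feature: done', 'update docs: done']
```

The point of the pattern is the shape of the human's time, not the threading itself: you pay a short burst of instruction-giving per task, and the long autonomous stretches overlap.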

[00:27:47] Andrew Zigler: Yeah, exactly. It's like the ability to apply that technique applies to everything now. Like, the orchestrator pattern that you just described is something that engineers have a lot of luck and, and, and even like a lot of experience with, a few [00:28:00] months in at this point. You know, really since like November, the top of the year, is really when you started to see the orchestrator pattern start to take off: these long-running, top-level conversations where they use, you know, armies of sub-agents underneath.

[00:28:12] Andrew Zigler: And really this, this top layer, like you said, the goal is to have that long conversation that's enriched and it's powerful. But it also, uh, you know, is preserving its own context. And I think that's like a specialized kind of challenge for any kind of team we all deal with, like information bloat. So how do we create these like curated, specialized environments?

[00:28:33] Andrew Zigler: I think that's also part of what the opportunities of your research invite: how can we reduce that complexity? How can we make this world more streamlined? Um, I also like how you called out that, you know, engineering tasks with agents are easy, while research and academic tasks like that are harder.

[00:28:54] Andrew Zigler: That is a higher level of complexity, and I think that really calls out, again, the unequal [00:29:00] adoption ability of AI across a lot of industries, a lot of domains. A lot of us have seen the spider chart at this point of the domain expertise of Claude in the different areas, and there are some that are completely untouched, which intuitively

[00:29:13] Andrew Zigler: makes sense. But also, a large part of the problem is that unlocking those gains can only happen by putting this type of process of working in the hands of domain experts. Because the domain experts are the ones that are gonna be able to understand: this is the structure I need to be maximally efficient.

[00:29:33] Andrew Zigler: And to an engineer, that's, you know, we're engineers, we're gonna create the hooks and the skills and all of the piping to make these, like, you know, the gas towns of the world. But to other industries and to academics, they might employ those same techniques, but the forms that their agentic outputs take could be completely different.

[00:29:55] Tim Dettmers: Um, yeah, I feel like there's this sort of complexity now. [00:30:00] One way you can sort of think about it is, um, in any sort of job you have different tasks, and your time is sort of split up across those different tasks. And now with agents, certain tasks compress, then certain other skills become sort of more important, and it's sort of changing rapidly.

[00:30:19] Tim Dettmers: But that also means, sort of, if you're an engineer, it pays a lot if you know a little bit about different domains. Then you can very quickly integrate with other domain experts and work very effectively in teams. And so, yeah, it feels like everything is very dynamic, everything's very quickly changing. But yeah, it's an exciting time.

[00:30:41] Andrew Zigler: Indeed. And you know, at the top of this, we were talking about how maybe engineering teams should throw away yesterday's assumptions about, you know, what model should I use to code today. And, uh, that's actually an assumption that falls into a bucket of many assumptions.

[00:30:58] Andrew Zigler: Recently, all of us [00:31:00] have been throwing away. Like, I recently had the CTO of Wispr Flow on the show, and we talked about how, you know, voice to text is having a renaissance, and was a technology that we all, for a long time, just completely dismissed as ineffective and not the right thing to meet the moment.

[00:31:16] Andrew Zigler: But now voice to text, for many, myself included, is a key part of our velocity, and, like you said, getting this kind of almost token-measured output, uh, on a regular basis. So, you know, similarly to how voice to text is like, oh, maybe we should re-envision technologies from before that we considered constrained or not the right fit.

[00:31:36] Andrew Zigler: You know, perhaps turning to creating these specialized fine-tuned models that are, you know, built on open-source, open-weight research, and putting them on your codebases, that actual private data that you have that no OpenAI or Anthropic has ever been trained on. That requires us to throw a lot of assumptions away.

[00:31:56] Andrew Zigler: Uh, I'm curious, what are some other assumptions that maybe [00:32:00] come to mind for you that people should be throwing away as we continue to step into the year?

[00:32:04] Tim Dettmers: I mean, um, I think, uh, what could be quite related to that is like I wrote a blog post about, uh, why AGI will not happen. And

[00:32:13] Andrew Zigler: Yeah.

[00:32:14] Tim Dettmers: I think there, sort of, um, it was pretty contrarian to many ideas. A lot of people were sort of surprised, um, that basically they made certain assumptions that they thought to be realistic, and I'm more like, nope, that's not true. And I think one of the common assumptions is that compute will just get better and better, that models just get better and better. Uh, you might argue, okay, the problem is data, but, um, the big problem is also just compute. I mean, if more tokens means more productivity, we might just run out of tokens if our hardware doesn't get better. And, um, that seems to be, uh, the case. So [00:33:00] I mean, I have quite some background in sort of low-level programming of CUDA and, um, working with GPUs; I do machine learning systems research. So, um, I dive deep into the details to figure out: how can I get more efficiency? And what we are seeing is that efficiency runs out in many domains.

[00:33:20] Tim Dettmers: The more you try to make something efficient, the more you succeed, the more difficult it is to make the next success or the next improvement. And so we see that on the GPU level that's disappearing; you cannot make any gains anymore. That doesn't mean that gains will, uh, disappear completely. The game shifted to optimizing racks, so multiple computers at the same time that are networked. And so, um, there's still a lot of innovation happening, but innovation is quickly filling up, so to speak. So that means the landscape will change. We might come to a world [00:34:00] where productivity, or how effective a company can sort of operate, will not necessarily be measured by, uh, basically how many tokens, but more like a metric of: you want to get a certain quality per token, but then also, what is important is what the cost per token is, because the tokens will be limited. You can say that. And so with that, um, you need to consider the cost, or how many tokens you can generate. That's dependent either on energy, if you're energy-limited, or on cost, if tokens become really expensive because everybody wants them. And so yeah, if you take that into account, efficiency gets more important. But it's important to understand where efficiency still can be gained, and on the GPU level it's very exhausted. So that's an important assumption that will, I think, broadly shape the field, um, and trickle down to a lot of other ideas.
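To make the metric Tim sketches concrete — quality per token weighed against cost per token under a limited token supply — here is a toy calculation. The function name and all the numbers are hypothetical illustrations, not figures from the episode.

```python
def value_per_dollar(quality_per_token: float, cost_per_token: float) -> float:
    # Under a fixed token budget, what matters is quality delivered
    # per dollar spent, not raw token throughput.
    return quality_per_token / cost_per_token

# Hypothetical models: a frontier model vs. a cheaper specialized fine-tune.
frontier = value_per_dollar(quality_per_token=1.0, cost_per_token=15e-6)
specialist = value_per_dollar(quality_per_token=0.8, cost_per_token=2e-6)
print(specialist > frontier)  # → True: the cheaper specialist wins per dollar
```

The point of the sketch: once tokens are the scarce resource, a model with slightly lower quality but much lower cost per token can dominate on the metric that actually constrains you.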

[00:34:59] Andrew Zigler: [00:35:00] I really like, uh, your framing about how tokens, we don't really know which direction they will go in terms of their cost over time, you know? Uh, I'm happy to hear you say that, Tim, because that's actually something that I've echoed before here on the show, actually, to other guests and even my co-host Ben.

[00:35:16] Andrew Zigler: You know, I have actually found myself in the minority on that opinion. A lot of folks were telling me that, oh, the cost for tokens is just only gonna go down, the competitive nature of the market is just gonna drive them to the floor as the cost for creating those tokens goes down. And so to hear you, you know, call out the constraint on compute being a driver for why that cost could actually go up, uh, you know, I think it's really insightful.

[00:35:38] Andrew Zigler: I think that's something more to explore. Like, what are your thoughts on the diverging opinions about which way [00:36:00] token costs will go, and maybe even then, what engineering leaders could do to hedge their bets for that reality, like we've talked about already?

[00:35:52] Tim Dettmers: So I think there's just simple supply and demand, but then there's also a question: where are the bottlenecks? Like, [00:36:00] um, six to nine months ago, people were like, what is the bottleneck? Then people realized, oh wait, memory is a bottleneck. And then people realized, oh wait, silicon is a bottleneck. And so what does

[00:36:14] Andrew Zigler: Hmm.

[00:36:14] Tim Dettmers: mean? Um, all of these things are moving, but, um, something will be the bottleneck, and that will basically, um, determine how much resources we have, how much supply there is, and then it's just a question of how valuable tokens will be. Maybe people are more productive, but in the end it's also, for your company, a calculus. Like, let's say I spent a million dollars on tokens, how much more do I get out of this?

[00:36:42] Tim Dettmers: And if the answer is less than a million, you'll not do it. Um, but if there's some approach where basically the return on investment would be great, it's a no-brainer. And so it will be difficult to say where it goes. I think I can see that they will go a little bit down still, the token prices. [00:37:00] But then I could see them going up again, because firms and people basically, um, adopt these more and more. I mean, I see I have a certain doubling time. Maybe not everyone has it, but it will start happening. And if that starts happening, then people run out of tokens. And I've heard it already multiple times; they said, like, oh, I'm so jealous of you, you work at a company where you have infinite tokens. So, um, yeah, I think the reality lies somewhere in between, but I could imagine token prices go up. Not right now, but very soon.
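Tim's million-dollar calculus, together with his point that prices could swing, can be written down in a few lines. The workload size, prices, and value figures below are made-up illustrations.

```python
def roi(tokens_used: float, price_per_token: float, value_created: float) -> float:
    """Return on token spend: adopt only if this exceeds 1.0."""
    return value_created / (tokens_used * price_per_token)

# Same workload, same value created, at two hypothetical token prices:
# a price spike can flip a no-brainer investment into a losing one.
cheap = roi(tokens_used=1e9, price_per_token=2e-6, value_created=10_000)    # $2k spend
spiked = roi(tokens_used=1e9, price_per_token=20e-6, value_created=10_000)  # $20k spend
print(cheap > 1, spiked > 1)  # → True False
```

This is one way to read Tim's hedge: the decision isn't fixed, because the denominator (token price) is set by supply-and-demand forces that could move in either direction.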

[00:37:36] Andrew Zigler: Yeah, you're really speaking to the reality of the moment. Very recently, on a new segment here on the show, we talked about the idea of, uh, inference as part of a compensation package. Like the idea that, yeah, having access to infinite tokens, or having this kind of, uh, outside-of-work token budget, was a

[00:37:55] Andrew Zigler: benefit, or, like, something that you would advertise as part of a job listing, [00:38:00] you know, which is actually kind of wild to consider, especially when we're at a time right now when, you know, capital expenditures are way up. We're building the highways of what's gonna be the world of tomorrow.

[00:38:11] Andrew Zigler: We're all betting on what this agentic future's gonna look like, and also the infrastructure we need to get there. So it really puts all of us at this, like, breakneck pace. But, uh, it sounds like, from what you're describing, in a sense, the ability to unlock gains from this will hit a ceiling on GPUs, because there won't be an ability to get more gains out of what we have available.

[00:38:36] Andrew Zigler: Do you see a world where that would then be, uh, a plateau of agentic capability, or do you think that perhaps that becomes a new platform from which we find some new height that unlocks the ability to do that? Like, if it were to kind of, uh, even out, how long do you think it would take us as people?

[00:38:55] Andrew Zigler: I'm curious, like, you as a researcher, you've thought about this. How long will it take us as a society to [00:39:00] really process the jagged overhang, the jagged frontier, of all the capability already available and embedded into our world? Like, you know, maybe it hits a point where it doesn't even matter if we've hit a ceiling.

[00:39:12] Andrew Zigler: We got way too much work to do.

[00:39:14] Tim Dettmers: Yeah. Yeah. So I mean, I think what is quite instructive is the, uh, the adoption of computers. And, um, if you look at computers, a lot of companies got really excited and they invested in computers, and investment went up, but productivity actually went down, and it took quite some time until productivity went up. And that is, like, the paradox of productivity with computers. And with AI it's clearly not that; we are more productive. But, um, with computers it's not fully understood. The story was, um, if you have a computer, it's not immediately useful; you need to combine it with other digital tools. We might be there right now, that we see a lot of growth, but the growth is really unlocked if we combine [00:40:00] several AI tools together. So I think that is sort of the trajectory where we're headed. What it means sort of overall is not quite clear, but I think that's, uh, the broad direction. And, uh, with that there's then demand for, how do we deal with this situation? As I mentioned, hardware will still be a bottleneck. That's the question: can we do better at the other things that are sort of possible? If you look at the underlying fundamentals, it's basically, um, communication and computation.

[00:40:34] Tim Dettmers: That's what you do. Um, you load the weights from the memory, you do compute, sort of, um, matrix multiplication on top of 'em, and this is very efficient, and, um, there are not many more efficient ways to do this than we do it now. There might be brains, but the problem with brains is you can't copy them.

[00:40:52] Tim Dettmers: Like, if you build a product on a biological brain, you cannot copy that brain and make it a reliable product. It just doesn't work. So [00:41:00] digital computers, we need them, because we need something that, once you create it, you can copy it, put it somewhere else. And if you use digital computers, really, at the end, both in terms of physics but also efficiency, it's almost a geometric problem. You go a certain distance in memory, you do a certain amount of computation, in a certain space. That's what you need to do. And we are there. And so maybe there are some new levels; we need to find new levels that give us an advantage. Otherwise, if we expect sort of exponential growth, the limit will probably come soon. Uh, it will be difficult.
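The "geometric problem" Tim describes — moving weights a certain distance in memory to do a certain amount of computation — is commonly quantified as arithmetic intensity: FLOPs performed per byte moved. A rough sketch, assuming fp16 operands and a plain matrix multiply (the shapes are illustrative, not from the episode):

```python
def arithmetic_intensity(m: int, n: int, k: int, bytes_per_elem: int = 2) -> float:
    """FLOPs per byte moved for C[m,n] = A[m,k] @ B[k,n] with fp16 operands."""
    flops = 2 * m * n * k  # one multiply-add per (m, n, k) triple
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)  # read A, B; write C
    return flops / bytes_moved

# Batch-1 inference (m=1) moves roughly one byte per FLOP: memory-bound,
# limited by how fast the weights travel, not by compute.
print(arithmetic_intensity(1, 4096, 4096) < 2)        # → True
# A large square matmul does ~1000x more FLOPs per byte: compute-bound.
print(arithmetic_intensity(4096, 4096, 4096) > 100)   # → True
```

This is why Tim says the gains are nearly exhausted at the GPU level: once a kernel is limited by how far bytes must travel rather than by arithmetic, software cleverness runs out and the remaining levers are physical.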

[00:41:35] Andrew Zigler: Yeah. You know, you're definitely the only person ever on the show to talk about the idea of putting compute on a biological brain. I love how you're able to, you know, since you're unfettered, I guess, by the regular constraints that myself and a lot of our listeners exist within, of being a software engineer, of being a very productivity-forward kind of user of AI, actually challenge and throw away all of those [00:42:00] other assumptions about what that means for society as well.

[00:42:03] Andrew Zigler: And you know, I think we've had a really interesting, uh, journey in this conversation, because it's unlocked, I think, for our listeners, an assumption that they maybe had before that they should throw away, about approaching open-source, open-weight models on their private codebases and actually getting performance from them, for maybe the first time.

[00:42:22] Andrew Zigler: Thanks to your research and what your team has actually put out there. And as well, we've talked about the art of automation, how you build those skills, and what that looks like both in a software engineering domain and in an academic domain, you know, how we can cross-link those.

[00:42:38] Andrew Zigler: So it's been really great to get your research-minded perspective on our show. We have a lot of industry leaders, so it's really great to have somebody who's, you know, writing the white papers and the research that's actually powering the technology we're all using every day. But before we wrap up here, Tim, uh,

[00:42:54] Andrew Zigler: Where would you recommend people go to check out the work you're doing at Ai2 and your latest experiments?

[00:42:59] Tim Dettmers: [00:43:00] Yeah, so, um, Ai2, the main website. We have, like, uh, blog posts and everything that condenses the research that we have into pieces that are very easy to understand. And of course we link to our papers, so, um, please also read those. Um, yeah, and, um, you can go from my website; I have my blog, posts on Twitter, and so forth. So yeah, please check it out. I'm always sort of letting people know of my latest thoughts and latest research. So yeah.

[00:43:28] Andrew Zigler: Yeah, we'll definitely share those links in the show notes, and especially to your blog as well. Some of those articles, which we've touched on today in our conversation, we'll be sure to link. So folks, if anything today, you know, caught your attention, I really implore you to dig deeper into Tim and his team's research, which we've covered here before on the show in our news segment. Now we've had a really great opportunity to flesh it out further. You know, I think there's a lot to unpack from this conversation, and I really invite you to come find Tim and me on LinkedIn, and otherwise wherever you're listening or reading [00:44:00] the information here, because, you know, we would love to continue this conversation beyond our discussion here today.

[00:44:06] Andrew Zigler: But thanks again for tuning in, and we'll see you next time on Dev Interrupted. And Tim, thanks again for joining me.

[00:44:13] Tim Dettmers: Yeah, thank you so much for having me here.
