“Human nature is to solve a problem and then move on to the next dopamine hit, right? There's no dopamine hits to be had in writing tests.” - Animesh Mishra, Diffblue

The promise of Test Driven Development (or TDD) remains unfulfilled. Like many other forms of aspirational development, the practice has fallen victim to countless buzzword cycles. What if the answer is already in our toolbox?

This week, host Andrew Zigler sits down with Animesh Mishra, Senior Solutions Engineer at Diffblue, to unpack the gap between TDD's theoretical appeal and its practical challenges. 

Animesh draws from his extensive experience to explain how deterministic AI can address the key challenges of building trust in AI for testing. These aren't the LLMs of today, but foundational machine learning models that can evaluate all possible branches of a piece of code and write test coverage for it. Imagine writing two years' worth of tests for a legacy codebase… in two hours… with no errors!

If you enjoyed this conversation about the gaps between theory and execution in engineering culture, be sure to check out last week's chat with David Mytton about shift left adoption by engineering teams.

Show Notes

Check out:

Follow the hosts:

Follow today's guest(s):

Transcript

Andrew Zigler: 0:07

So welcome to Dev Interrupted. I'm your host, Andrew Zigler,

Ben Lloyd Pearson: 0:12

And I am your host, Ben Lloyd Pearson.

Andrew Zigler: 0:14

In today's news, we're talking about a few things: the sandwich generation in tech, Google striking back against ad blockers, and an AI-powered deepfake coding interview applicant that almost got hired and that the entire internet is talking about. Ben, what do you want to talk about first?

Ben Lloyd Pearson: 0:31

Yeah, I've seen that last one a ton, but maybe, maybe we'll save that for the end. And since I'm feeling like half a sandwich, maybe let's start with the sandwich generation.

Andrew Zigler: 0:41

The sandwich generation in tech is something that maybe you're already very familiar with, but you didn't know the term for it. It's the folks, and maybe you are one of them, who every day when they go to their job have people younger than them that they take care of and people older than them that they take care of. Oftentimes you're in a remote setting. Your time is split between being a caretaker for your kids, but increasingly people in the workforce are also taking care of their parents. So you end up in a situation where there's a large identity of folks who are dual caretakers. In fact, there's 11 million of them in the US as of last year. Millennials and Gen Z make up about a third of them, and most of them work full-time or part-time. But overwhelmingly, 90% of them or more in this article talked about how they have to make lifestyle or financial sacrifices and, you know, compromise on their own financial security and their future. It's not a temporary issue that goes away. And anyone who's a caretaker is very familiar with this. When someone depends on you, it's there, and you have to always keep it front of mind for yourself. And these people don't just work in tech, of course. They work all across different fields. But they also have to wear a bunch of different hats at home when they're solving problems for the older and younger generations. Maybe they're the power of attorney for someone less capable of making decisions for themselves. Maybe they're tech support for their parents and their kids. Because it turns out that millennials and Gen Z are the generations that really learned how to become really good tech support. Everyone else older than us and younger than us, you know, they just Google it or they don't know how to do it at all. Imagine having to do all of this context switching and wearing all of these different roles while still doing your job. So when I was reading this article about the sandwich generation, you know, I saw a lot of myself in it. I'm not fully in the sandwich. I'm more like a pita, you know, I only have one piece of bread, I guess. But I definitely feel the pressures of having to be responsible for them.

Ben Lloyd Pearson: 2:42

Yeah, I've got kids, so I'm definitely feeling one half of the sandwich very strongly. And you know, I guess the real question is what kind of sandwich, you know, because if it's like a panini that's been pressed, like, man, that's a, that's a pretty rough life, you know

Andrew Zigler: 2:56

Yeah, our listeners can probably tell we had not eaten lunch yet when this topic came across. But, you know, you too, Ben, you have kids and you work remotely, and I'm sure that takes a lot of your time.

Ben Lloyd Pearson: 3:06

I actually think it points out how much remote work can help mitigate the effects of this, because my mom worked in tech, and for a while she was also getting her master's degree. So I remember her working really late nights and not getting to see her all the time. And I often think about, you know, as a remote worker, the flexibility that brings me, and the fact that I'm way more involved in my kids' everyday life just because I'm around a lot more. I don't have a long commute and I don't have to sit in an office all day. It's definitely an interesting trend, or an interesting phenomenon. And I'm kind of asking myself, is this an HR problem? Is it an engineering leadership problem? Are we simply not hiring enough developers at some companies? Or are we pressuring developers to do too much? You know, there's always this endless drive for productivity, but when do we actually start to see a drive for more flexibility? The return to office movement is definitely not a positive trend for this.

Andrew Zigler: 4:06

No, not for the sandwich generation. It puts them in that hard position, like what you described. It makes me think of even myself. You know, growing up I was a latchkey kid and both my parents worked, and, you know, they weren't able to work at home. Remote work wasn't a reality when I was growing up, and that wasn't an accessible life for them. And so it really changes how, you know, you become your own independent person. But also just having that person available to support you is really, really great and helpful. So the fact that a generation now, like my generation, is able to work remotely and help raise their kids and be there for them when they come home from school, just those small things, I know that can make a lot of difference. But at the same time, it puts a lot of pressure on them, like what you said. At work they're strained between leadership decisions, between project deadlines, between things that have to get out the door, and everything right now in every industry is crunched and accelerated. There doesn't really seem to be much relief, and so that's why it's always important to stop and highlight these human moments and how we're all working together and having to support each other. I think it really calls out that, you know, the flexibility of remote work is really key for this generation to not only survive, but thrive, and be able to set up a future generation for that same kind of success. Otherwise you're gonna end up in kind of a brain drain situation. If you can't accommodate these very talented but obviously responsible, and in some scenarios strapped, leaders and professionals who are just in between life's situations, if we can't account for them, then they can't be in our industries, they won't be represented. And then that means that the products that we make and the businesses that we build are less understanding of them and are less able to serve their needs, and the situation becomes worse. And that's why it's always important to make sure everyone has a seat at the table, so that we can resolve those problems as a society.

Ben Lloyd Pearson: 6:02

So my favorite part of this article is the very end, where they brought up people who were responding to this by building technology to help them take care of their elders. You know, so I'm asking myself, how long until the pressure builds to a point where developers just start leaving their jobs and using AI to build the next generation of elder tech, you know?

Andrew Zigler: 6:24

Well, the next story is about privacy. A little bit of a different turn, but everyone's heard something about Google Chrome and privacy, and how the different browsers that we use to surf the internet all handle our personal data very differently. Well, there's been a recent shakeup, and it seems like Google is getting closer and closer to making Chrome something more like the AOL of the modern day. We're talking about getting rid of ad blockers, getting in the way of how users want to use and configure their browsers and ultimately protect their privacy online.

Ben Lloyd Pearson: 6:58

AOL 2.0. I'm already hearing the dial-up tones in my head. My goodness. This was sparked by a conversation we saw on Hacker News, a very heated conversation about uBlock Origin being removed from the Chrome Web Store. You know, and the real story here isn't anything to do with uBlock, because we all kind of know that uBlock is a good app. People really like using it. It's not malicious, it's not trying to harm you in any way. But it's being removed because of these new restrictions that Chrome is bringing into their browser, but also sort of using their position within the market to force standards to go in a direction that they want. There's a lot of ways to get around this stuff, like using a different browser, for example. I personally am a massive fan of Pi-hole, which, if you've never used it, is a DNS service that you can run yourself, and instead of serving you up ads, it'll just give you blank nothing but convince the websites that you actually did see the ad. And it removes ads across your entire network, which is pretty amazing. The real story here, I think, is that Google, through Chrome, has this long history of pushing standards that maybe other browsers don't really want to adopt, or maybe certain users don't feel comfortable with. And we're just kind of going further down this path. And of course, always in the background are things like the antitrust rulings that are coming down on Google, forcing them to make Chrome its own company. So I really feel like we're in the middle of this story right now. Like, we haven't seen the end yet.

Andrew Zigler: 8:35

I feel like we've been in the middle of this story for a long time. You know, Chrome has made lots of changes over the years, against and for privacy, that, you know, its users, its consumers, have not been happy about. I think you make a very salient point about how the antitrust case plays into this, because ultimately you have to remember, at the end of the day, Google is selling you ads, and that's where Google gets most of its income from. And so they have an incentive to make sure that they serve you ads. And as the creator of Google Chrome, you know, one of the largest and most adopted browsers, it puts them in the right position to not only serve you the ads, but to serve you the place where you're looking at those ads. That amount of control over our browsing situation is a lot for many people to tolerate. And I can certainly see people going their own ways, using different browsers, like what you said, maybe even using clever hacks like this Pi-hole setup you're describing. DNS engineering aside, I do think it highlights the importance of understanding how you're browsing the web, the websites that you're going to, and what they know about you. But really, now that we've been talking about AOL, I just really want us to have, like, AOL keyword Dev Interrupted. How cool would that be, to go back to the AOL keyword era?

Ben Lloyd Pearson: 9:45

I'm not on board with that, but, you know, I do kind of wonder, like, are we nostalgic for a web that never actually existed? Like, you know, analytics is something that's been sort of built into the web from the very early days. And at this point, it's almost like a fundamental component of the very fabric of the internet. Nothing can really be optimized without data and analytics, and that requires some degree of tracking that, you know, I think has really always existed. And I certainly do miss the, uh, profound lack of advertisements everywhere. So Andrew, let's talk about these AI fakers that are trying to enter your organization.

Andrew Zigler: 10:24

So if you haven't heard about this article, maybe you've been under a rock, maybe you haven't been on LinkedIn, but this has been blowing up the tech industry in the last week, week and a half.

Ben Lloyd Pearson: 10:33

I've seen at least 20 posts on LinkedIn about it.

Andrew Zigler: 10:36

Me too. I've seen so many. And so we had to talk about it here, because if you've been listening, you know, a few weeks ago we talked about how AI is disrupting the interview process in tech and about how it changes what we need to have in mind and what we need to be considering when we bring in candidates. This goes the whole other direction, about the dangers that AI can pose within your hiring process. Just to back up and give you some context, this whole scandal happened to a company. It's a Polish company called Vioc. And they were hiring for engineering roles, and they were doing this remotely. This is a remote work position. And so they were hiring candidates, and as they were evaluating one of them, you know, they passed the initial interviews. The resume hit the marks for what they needed. They even sailed through the technical interview. But then at the final stages there were some questions about their background, about some gaps on their resume, and ultimately an offer wasn't extended. A few weeks later, or during that same hiring process, another candidate came into the queue who managed to pass all of those same checks, talked to those same people, but was posing as someone else completely. So both of these were actors. These AI-powered actors were actually the same person applying for this role, or at least we assume they are the same person applying for this role. And they were using AI technology to basically fake their way through all of the hiring process, even getting all the way to the end. And the only reason they were debunked is from a now viral video, which is probably the reason you know this story, of the final interviewer asking the candidate to put their hand in front of their face in the interview. And as we all know, when you have photo filters, you can't do this, because your photo filters are gonna pop on and off. The candidate refused, and this is because it would break his filter. And imagine that he made it all the way through that process, not just once, but twice, and almost got the offer. And the only reason he didn't is because he used a slightly similar voice. He sounded slightly similar to that first candidate, and this hiring team, they were on it, right? Really, props to them for being in charge and on top of their hiring process the whole time. I've read their post-mortem on it, and I think they did everything right. I think this is something that could happen to any company, and it's a really big wake-up call about how hiring is changing.

Ben Lloyd Pearson: 12:58

Yeah, I mean, we just covered this recently in another episode, about how AI is fundamentally shifting the interview process for software developers. And this is one of the darker ways that's happening. But, you know, it makes me think, like, what is the way that you validate that somebody is real? I've been talking to some people I know in my personal life, and it's like, there's gonna be a day where AI impersonates me, so we need a code word that only we know that will validate that I am who I am when you hear me talking to you. And we almost need something for the interview process as well. It's like, in my next interview, I might have to request that the candidate turn off their background filters, put their left hand and their right foot in the frame of the camera, and then spin around in their chair three times or something, just to validate that they're actually a human being, you know? These impersonations are only gonna get better, so we've gotta continue to evolve our way of detecting and responding to them.

Andrew Zigler: 13:56

We have to sharpen our evaluative skills for candidates, but we can't let the fear of these AI deepfakes harm our hiring process for real humans either. We can't make it oppressively difficult for people to apply for and get roles, or to prove that they're real. We need to come up with intuitive and simple ways for people to prove that they're real, virtually, without having to add a bunch of byzantine process. You know, I really commend the engineer who interviewed this person, who had the idea in the moment to ask them to put their hand in front of their face to try to get the filter to fall off. I think that's one step in the right direction. I think there might be easier answers out there too that maybe a brighter mind in HR or in tech is already thinking about. And if that's you, if you are hiring and you're kind of experiencing this, and you already have some interesting stories from the AI world for the jobs you're hiring for, we'd love to hear about them.

Ben Lloyd Pearson: 14:53

Yeah, I would love to hear what our audience thinks about this. If you've encountered this, head over to our Substack, head over to LinkedIn, wherever you prefer, and leave us a comment that describes what you're seeing. You know, are you experiencing this, and have you had to address this? So, Andrew, let's talk about our guest today. Who do we have?

Andrew Zigler: 15:09

I am very excited for today's guest. After the break, we're bringing Animesh Mishra on the pod. He's a Senior Solutions Engineer at Diffblue, and Diffblue is an enterprise-grade solution for automating test suite generation. They make it really easy to create a massive amount of tests at scale. And when you stick around for this discussion, you're gonna learn about the realities of AI and how it can be used to do this, and how this is finally enabling test driven development for engineering organizations at scale. You really don't wanna miss this one. Are you struggling to explain developer experience to non-technical leadership? Join LinearB's upcoming workshop and learn how to translate DevEx into language the business cares about. We'll show you how to present data on developer productivity, AI performance, and engineering health in ways that drive alignment and investment. Plus, you'll get early access to our CTO board deck template, making it easy to connect engineering metrics to outcomes like faster time to market and cost savings. The link to sign up is in the show notes. We hope to see you there. Today we're talking about one of the biggest paradoxes in software engineering. Developers love the idea of test driven development, or TDD, but in practice, almost no one actually does it. And here's what we're going to unpack: Can AI finally make TDD practical? Developers struggle to trust AI in the last mile, such as for testing, so how do we build that trust for them? And AI isn't just LLMs, so if you're a software org that's investing in AI right now, how can you expand your approach beyond mainstream AI solutions? Animesh, welcome to the show.

Animesh Mishra: 16:57

Thank you, Andrew. I'm a software sales engineer at Diffblue, and I'm looking forward to the conversation.

Andrew Zigler: 17:04

Yes, we're really excited to have your expertise here. You have a lot of exposure to how people use testing out in the wild, and a lot of knowledge about how teams can be unpacking this kind of stuff today. But starting with the first topic at hand, test driven development: it's one of those things that people say they do, but, you know, let's be real, it doesn't really happen all that often. Maybe we could start by getting the definition. What is TDD? What is the goal of test driven development?

Animesh Mishra: 17:31

So the goal of test driven development, which is kind of right there in the name, is to make sure that no piece of software goes out the door untested. Now, that has always been the case with software engineering, but historically what people found was that the fun part of software engineering is in coming up with the code that solves the problem. All software engineers are good problem solvers. But once you've solved the problem, writing all of the testing code and all the harnessing to make sure that it is delivering every single piece of functionality required of it is not that interesting. And so, historically, people saw that the model of: you write your code, you see how it works, then you write the tests that validate all of the assumptions made about the code, and then, once all of the functional assumptions pass, you can say that this code is working perfectly and no bug or fault has sneaked in. That didn't work because, like I said, the human nature is to solve a problem and then move on to the next dopamine hit, right? So there's no dopamine hits to be had in writing tests. So test driven development emerged around the, uh, with the emergence of extreme programming. These were new ways of writing software which not only improved the productivity of the teams writing software, but also had lots of techniques on how to do existing things better. And one of the focuses was on testing. So they figured that a lot of the time software is badly written or has bugs because the functional requirements are poorly understood by the developers. So they assume that this is how it should work, then they write the code, then they write some tests, and then off they go. And because of that poor understanding of requirements, you don't get the right kind of code to go with it. So that's the first challenge that was spotted. To address that, the idea of test driven development was created. The idea being: instead of writing the code first, you would write unit tests that define the exact spec of the code you would be writing to satisfy that requirement. And in the process of defining that test script, your assumptions will be challenged about how this piece of code or method should work. You will refine those with your product team. And once you've understood, right down to the detail of a unit test, how this thing should work, then you move on to writing code, because by that time, having thought through all of the eventualities and all of the logic that this code needs to do, you would have a good design in your head anyway. So then you go from something on a piece of paper to something running, working, and deployed much faster. It does work. So it's not that this is all, you know, uh,

Andrew Zigler: 20:30

It does work. It does work. And maybe it's just that, to your point, it takes the unfun part, the part that's not the dopamine hit, and front-loads it onto what you do every day. When you're building things, you want to go start building; you don't want to start by validating what the thing you build will do. So it kind of becomes a showstopper, in a way. Not that it's not effective, as much as it's just unattractive to work that way.
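To make the test-first flow described above concrete, here is a minimal JUnit 5 sketch of the cycle: the test is written first and pins down the spec, and only then is the production code written to make it pass. The class and method names (AgeVerifier, isEligible) are hypothetical, invented purely for illustration.

```java
import static org.junit.jupiter.api.Assertions.assertFalse;
import static org.junit.jupiter.api.Assertions.assertTrue;

import org.junit.jupiter.api.Test;

// Written before AgeVerifier exists: at first it doesn't even compile,
// then it fails, and only then is the production code written to make it pass.
class AgeVerifierTest {

    @Test
    void rejectsApplicantsUnder18() {
        assertFalse(new AgeVerifier().isEligible(17));
    }

    @Test
    void acceptsApplicantsAt18OrOlder() {
        assertTrue(new AgeVerifier().isEligible(18));
        assertTrue(new AgeVerifier().isEligible(45));
    }
}

// The production code that follows from the spec pinned down above.
class AgeVerifier {
    boolean isEligible(int age) {
        return age >= 18;
    }
}
```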

Animesh Mishra: 21:00

That's correct. Yeah. When the problem is fun, then it's actually really cool and challenging to figure out, okay, how many ways can this go wrong, or how many different permutations do I need to account for and enable? And that's pretty cool. Unfortunately, a lot of software is quite mundane, and then it just becomes dull work, because you're just writing tests for the sake of it so that you can move on to writing the code. That's where the rubber meets the road, where you get teams which aren't fully committed or motivated to do TDD. They start cutting corners. So they say, okay, we'll do TDD on the business logic bit, we'll leave the others, we'll kind of see as we go.

Andrew Zigler: 21:46

When we talk about that work, and it's, like, not sexy, it's not fun to do, how does AI now fit into that conversation? Because we've traditionally talked about AI as something that fills in and does that mundane work.

Animesh Mishra: 21:58

It takes me back to the first thing I said: the purpose of TDD was to make sure that no piece of code goes untested, because what you're trying to reduce is production incidents, and what you're trying to improve is your code quality and your delivery rate. So the fewer faults there are in the code that you're pushing out, the faster your entire team works. If that is the goal, the paths you take to reach that goal can be many, right? TDD was invented, I think, 25 years ago, before my time for sure, because I've always worked in a TDD environment. And that was the time when the only way to write a unit test was to get a developer to write a unit test. There were no other options, which means there was only a singular path from: I have something, and I want something tested. Now we have options. Now we have AI tools which can do software testing. And I mean, it's all pretty new. The whole field is new, so we're still figuring out what works and what doesn't and where to use which specific tool. But now more paths have opened up. The reason we're having this discussion today, and I'm going to be in California next week talking about this during Developer Week too, is the fact that now that we have more paths, it would be foolish not to try to traverse them. So let's at least walk down the path and see what we find, right? I understand lots of people swear by TDD, because it has delivered, and if you can stick to it, it does deliver. The challenge is, it takes a lot of time, development time that you could utilize working on better features, better design. Because the thing with software engineering is, there's always a lot more to do than the developers have time to do. So with AI, what you can do is take away this mundane work, automate it away, give the developer back the 20 to 30 percent of their time they spend doing TDD at the moment, and let them focus on other, higher value targets.

Andrew Zigler: 23:56

And I think that's the core takeaway here: there's an opportunity to free up the time that you might spend writing unit tests and spend it doing more productive and impactful things. It frees up developer time to focus on those higher level problems that you can't automate.

Animesh Mishra: 24:15

Definitely. And the other benefit, which is kind of unsaid, that people are only now starting to see, and you get that with certain tools but not others, is that the human mind is brilliant, but one thing it's not is consistent. This is why you always have these arguments in development teams about standardization: this is how we write unit tests, you'll get linters involved, so on and so forth. So the benefit you might get with an AI tool which is deterministic, which always produces the same kind of code and the same kind of style, is that you can standardize the style of tests you write. So regardless of whether it's my code or your code or someone else's code, the testing follows a standard pattern that you have approved and adopted. And that takes away a lot of the cognitive load that you're put under as a developer. The first time you see a piece of code you've never seen, you go to the test script to see, okay, what is it doing? And every test script is written in a different way. All of those problems go away.

Andrew Zigler: 25:18

And when we talk about the way that developers can use this tool, you know, part of it is that consistency, so they know what to expect. And I think consistency is a core part of trust. There's a recent Stack Overflow developer survey that found that only 43 percent of developers right now feel that AI is accurate. And when we talk about a process like the last mile, which is testing, you know, doing coverage on an application, or in some cases the first mile, if you're doing that up front before you develop anything, how can we build that trust in developers for using a tool like this?

Animesh Mishra: 25:56

It has to come from being transparent about what the technology can and can't do, and how it works. What we're seeing currently is a Cambrian explosion of AI tools. Everyone's making one, right? Everyone and their mother and grandmother. But a lot of the time they're just what's sometimes pejoratively called LLM wrappers. I think they add a lot more value than just LLM wrappers, but that's what people call them. What's important from a developer's perspective: so I work with a lot of teams which are evaluating different AI tools for different problems. I don't think the software development lifecycle has a single tool that covers it all currently, and there's a good reason for that. It is very hard to use one tool that does it all. I don't think we're ever going to get to a point in software engineering where you have one tool that's taking care of every single problem. And the reason for that is there are multiple people involved in the chain, and they have different problems to solve. If we just focus on the testing part of it: even in testing, you often have two teams involved in many companies. You have the developer writing the code, and there's a testing team that's completely separate, often working off a Cucumber script or just, you know, a JIRA ticket. The developer will just give them the code and they'll test it.

Andrew Zigler: 27:11

right.

Animesh Mishra: 27:11

Because of these reasons, I don't think people should try to go and find one tool that does everything. And luckily, we haven't seen a company yet that's trying to do that. So then we come to the trust part. You can only trust something once you know how it works and whether it delivers, right? So you trust your toaster not to burn your bread every morning because you did some trial and error when you first bought it. You set it to three, then it burnt it, so then you set it to two. So yeah, that's about what I want. With every new technology there's going to be some trial and error. People need to be willing to experiment and see what works for them. Because the company I work for, Diffblue, for example, we have a tool that works extremely well for certain companies, but is completely useless to other teams solving other types of problems. So, unfortunately, a very roundabout way of saying there's no single standard answer to how to build trust. It's going to have to be built slowly, with trial and error, with people finding out which tools and which technologies solve a given problem better. Within testing, I think there is room for LLMs and other technologies as well. I also think that in testing, what you want is less creativity and more determinism, because you don't want your tests to work differently on a Tuesday than they do on a Friday.

Andrew Zigler: 28:36

Right.

Animesh Mishra: 28:37

That is something where testing is very different from when you're actually writing code, trying to, you know, deliver a ticket.

Andrew Zigler: 28:46

As part of that too, there's a connection, you know, between how much code coverage you might get and how much quality your software can ultimately reach. But is that an actual connection? Like, if you have an AI that's able to provide that full test coverage, what do you gain that you maybe don't if you had a human taking a non-standard approach to doing that same task?

Animesh Mishra: 29:14

Well, the first thing you get is predictability, right? Once you've seen how a tool works, particularly AIs which are more deterministic, once you've seen it working on one application and the kind of test suite it's writing, its style, how it creates its assertions, the strategies it uses to test a given piece of business logic, if you like that, and it's more deterministic than probabilistic, then you can trust it to work elsewhere as well. And this is where we see a lot of POCs happening currently across the technology industry, and that's what they're trying to find out: which part of a POC result can we deem to be repeatable, and which was a fluke?

Andrew Zigler: 29:59

I see. So that's how engineering leaders are really evaluating that. They're looking for that repeatability. And is there anything that they measure in particular when they're evaluating a tool like this to see if it's

Animesh Mishra: 30:09

So I can speak for testing, because I'm very close to that currently. In testing, we've long used unit test line coverage as a measure of how good a test suite is and how well your code is tested. I think we need to get a bit beyond that, particularly in an AI world. I'll give you a very good example. When developers write tests, they tend to write about eight to ten tests at a time, they'll create a pull request, you can review it, somebody else can come and review it. AI has the capability to go into an application and write 50,000 tests in one go. Now, nobody's sitting through and validating those 50,000 tests, right? We do POCs with companies who have these large legacy applications which haven't been touched in, say, 10 years, because nobody knows how they work. They don't want to break them, but they do want to modernize them. And to be able to modernize them, they need a test suite in place so they can validate that the new is the same as the old. For these kinds of problems, AI is excellent, because what it can do is say, okay, fine, give me your 10 million lines of code and I will go and write all the tests you need. So far, so good. It's done two years' worth of work in two hours. But how do I know that it's good? That's the question engineering leaders ask. That's the question developers ask, right? Okay, 50,000 tests you've written. So the first answer usually is to say, oh, let's look at line coverage. So you can look at line coverage, but developers listening to this will know, and I've done this in my life as a developer as well, you can game line coverage. Line coverage is the easiest thing to game, and if we know it, then AI knows it too. And I've seen examples of AI just gaming line coverage, writing an excellent test that does nothing, but does give you very good coverage. We need to get one step beyond. Now, when humans are writing unit tests and their tests are being reviewed, and the review is really good, then these problems do get caught. With AI, we're going to have to get smarter. And there's this new technique that I think is gaining ground, which is good. It's not a silver bullet, but it's better than line coverage alone. It's called mutation testing. And the way that works, it says: okay, you've written a test suite. Whether you've written it or AI has written it, it doesn't matter. The purpose of a unit test is to catch unwanted changes, regressions, and any business logic faults. So what I'm going to do is take your test suite that you've written, keep it as is, and then jump into your code and start making changes. Complete chaos. So if you have a check in there which says if age is less than 50, I'm going to flip it. I'll say if age is greater than 50, and run your tests against it. If your test still passes, I hope that's not somewhere in some pension calculator app, because it is not looking at the age at all. That's the sort of thing mutation testing tries to do. Logic flipping is only one of the many techniques it uses. But the idea is that I'm going to change your code six ways to Sunday, and I'm going to see how many of your test cases catch those faults. If I make 100 changes and your tests catch all of them, then your mutation coverage is 100%. Your tests are actually really good, because even the smallest of changes gets picked up.
If you're only catching 20 of the faults I'm introducing in the code, and some of those faults are logic changes like I told you about, so instead of checking for equality I start checking for inequality and your tests are still passing, then that test suite is not preventing anything out there, right? So mutation testing is brilliant at scoring your unit test suite at scale. It gives you two numbers: it gives you test strength, which is a measure of how good a specific unit test is, and it gives you mutation coverage, which tells you how much of this goodness is spread across your application.
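As a rough illustration of the two ideas above, here is a hedged Java sketch; the PensionCalculator class and its threshold are invented for illustration, not taken from any real system. The first test executes the line and earns full line coverage while asserting nothing, so a flipped comparison would survive it. The second test pins down the expected behaviour, so the mutant gets caught, which is what mutation coverage counts.

```java
import static org.junit.jupiter.api.Assertions.assertFalse;
import static org.junit.jupiter.api.Assertions.assertTrue;

import org.junit.jupiter.api.Test;

// Hypothetical production code, invented purely for illustration.
class PensionCalculator {
    boolean qualifiesForEarlyAccess(int age) {
        return age < 50;   // a mutation tool might flip this to: age > 50
    }
}

class PensionCalculatorTest {

    // Games line coverage: the line is executed, so coverage looks perfect,
    // but with no assertion the flipped mutant still "passes".
    @Test
    void coverageOnlyNoAssertions() {
        new PensionCalculator().qualifiesForEarlyAccess(30);
    }

    // Kills the mutant: if "< 50" becomes "> 50", this test fails,
    // which is exactly the kind of fault mutation coverage counts.
    @Test
    void under50QualifiesAndOver50DoesNot() {
        PensionCalculator calc = new PensionCalculator();
        assertTrue(calc.qualifiesForEarlyAccess(30));
        assertFalse(calc.qualifiesForEarlyAccess(60));
    }
}
```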

Andrew Zigler: 34:06

You get all these different actual metrics for figuring out if your code is, you know, good to go, and if changes to it have modified things downstream. And this is actually really impactful, I can imagine, for modernizing, like you said, having those tests in place beforehand. Imagine a really, really old legacy system inside a government or an enterprise somewhere, and they have to turn it into something modern. Think of how impactful that project could be, but think of how disastrously it could go if they don't have a way of knowing whether the new machine does the same as the old one. So that's a really big unlock for modernizing stuff. When you talked earlier about how it works, the thing that stuck with me was it being deterministic, doing the same thing over and over and over again with great success. You know, that really works against the narrative for me of how I envision AI and LLMs working today, because we all know that they're random, they're stochastic, you know, they want to be different every time, they don't like to repeat themselves. How do you approach using AI to solve something like testing if you want a standardized approach? Um, and what's the unlock there?

Animesh Mishra: 35:12

Great question. So we've been in business for about six years now. We're a company based in Oxford, England; we spun out of the university. And when we were starting out, there were no LLMs, so they hadn't sucked all the mindshare out of the industry. AI and machine learning used to mean more than just LLMs. Now, thanks to the success of ChatGPT and OpenAI, AI has been completely consumed by LLMs. So if you say AI, you must mean LLM. And this is something I actually struggle with in my job, because when we are selling into the large companies, banks, et cetera, they have these model risk reviews, and they will ask you questions like, okay, what kind of LLM model are you using? And then I'll answer, we don't. They say, oh, so you don't have a local LLM, do you have it in the backend? It's like, we don't have an LLM. All right, so where is the LLM? We don't have an LLM. That's the challenge, because ChatGPT is the most successful AI product, right? So that's how I would approach this problem: there are many ways to do AI. People looking at doing testing often start with Copilot because they already have Copilot. They're already using Copilot for testing, and Microsoft has a stellar sales force, credit to them, they've already pushed it everywhere. So, like we say in Diffblue internally, we need to basically treat Copilot like weather. It's going to be there. We are going to have to prove that there are other ways to do testing and there are better ways to do testing. And so recently, towards that, we did this benchmarking study against Copilot. Just to give a quick, brief overview: Diffblue uses a reinforcement learning model to understand how your code works and then write unit tests for it. We don't look at just the plain text source code; we also look at the compiled bytecode of the application, which gives us a computational understanding of what's going on. So when we're not able to write a test, we will also leave behind testability insights, like: hey, do you know what, this thing is missing a getter, and because this property is missing a getter, I can't write a good test for it. Go add a package-private getter, come back, run it again, and you're going to get a better test. And all of this works completely deterministically. So what we wanted to understand was, okay, how do we compare against all of these LLM-based tools? What we're comparing is an agentic system like Diffblue, which is completely hands off. There are no LLMs, there are no prompts to be done, there's no, you know, back and forth. You just click a button, walk away, make yourself a cup of tea, come back, and you've got your test suite. So we're comparing this agentic system versus something that is more collaborative, more prompt based, like Copilot, but other LLM tools are similar. And what we found was that when using Diffblue, the developer was 26 times more productive than when using Copilot, which is easy to understand, because you don't have to engage with it. You run a command, you turn around, you do other things, then you come back and you've got your job done. Whereas with Copilot, you know, you're fiddling with the prompt. There's this whole practice of prompt engineering coming up, which I think will be quite short-lived, because companies will get better at doing prompts.
So then you don't need to become an expert. It's basically like the transition from command line to GUI: you can learn all the commands, but then somebody builds a GUI and you just click your buttons, and it's much better. So that's the one thing that I found, which was actually expected; it didn't surprise me. What did surprise me was that we achieved significantly higher test coverage than Copilot every single time. And so when you put those two together, the fact that it's more productive, 26 times more productive, and it's giving you better coverage, over a year this translates into covering exponentially more code, without breaks, than something like Copilot. So the challenge for companies to figure out will be: does this testing require a human in the middle, or can it be completely automated, a truly autonomous operation? We believe, because we've done studies, that unit testing is a problem that can be completely automated, because it's a very well defined, specific problem that you can train an AI engine to do predictably, in a deterministic fashion. So as long as you have the same code and you use the same version of Diffblue, you'll get the same tests, but more importantly, at a higher quality. Because, again, you take the probabilistic element away from it, and then that allows you to train the model better and do better tests with every release as well.
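Here is a hypothetical sketch of the kind of testability insight described above, not Diffblue's actual output: a field that is set internally but never exposed can't be asserted on, and adding a package-private getter (visible to tests in the same package, but not part of the public API) makes the behaviour testable. The OrderProcessor class and its names are invented for illustration.

```java
import static org.junit.jupiter.api.Assertions.assertTrue;

import org.junit.jupiter.api.Test;

// Hypothetical class invented for illustration.
class OrderProcessor {
    private boolean flagged;          // set internally, previously unobservable

    void process(int amountInCents) {
        if (amountInCents > 1_000_000) {
            flagged = true;           // a test can't verify this without a getter
        }
    }

    // The suggested fix: a package-private getter, visible to tests in the
    // same package but not exposed in the public API.
    boolean isFlagged() {
        return flagged;
    }
}

class OrderProcessorTest {

    @Test
    void flagsVeryLargeOrders() {
        OrderProcessor processor = new OrderProcessor();
        processor.process(2_000_000);
        assertTrue(processor.isFlagged());   // only assertable thanks to the getter
    }
}
```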

Andrew Zigler: 40:08

When I hear you talk about the way that teams can be using AI and thinking about it beyond LLMs: if you are a team right now and you are building AI resources and AI enablement within your org and you're trying out AI, what's a good practice or a good habit that you would tell them to adopt to be successful?

Animesh Mishra: 40:29

I would ask them to ask themselves if they need AI. I don't think everybody needs AI. Think, for example, if you're a small startup team writing some microservices, do you need AI? Because the problem is going to be, and this is actually going to become a bigger problem, I've noticed this myself: AI-written code is harder to debug, and it's not because it's AI-written code. If you give me a job to do, I write code, and then I ask you to debug it, it will be harder for you to debug because you've not written it. As developers, we know this. We don't like debugging other people's code. AI is just other people's code, right? It's somebody else writing the code. So for a lot of developers, this is your copilot, this is your copilot doing the job for you, but it's not you. So it is adding friction into the process when things break and when there are bugs. I'm seeing this, and I'm hearing it from companies as well, which is why they come to us in the first place, because most people come to us after having tried Copilot and not liking it, right? It's taking more time. So the whole promise of AI making you more productive goes out the window. Like, it's definitely not making you more productive. It's maybe making your job more interesting; you're not just writing code in an IDE, you're having experiments. But that's the first thing I would ask people looking at AI in software engineering: identify a good problem, and then ask yourself, do you really need AI there? And do you really need large language models there? Because there are techniques out there where you don't even need AI, right? Simplest example ever: if you want just some way to make it easy for your developers to create microservices, there are two ways you can go about it. There's the cowboy way, which is to roll out some kind of an LLM CLI tool which will create this for you. And the second way is to create a GitHub template. The template is a one-off effort, but it's predictable: every single repository created from it will always come out the same, so then it's easy to debug, and you solve the problem once and you've solved it everywhere, that sort of thing. Where I'm seeing people using AI for these sorts of things, I think it's actually going to crash and burn, it's going to cause a lot of pain, because we moved away from doing crazy things to more of a standard DevOps model in software engineering, and now we're going back again to doing some crazy things, and then it's going to iterate and get us to a better place. I think we're in this middle period where there's a lot of churn, people figuring out what to do. Without a good problem, you are not going to find AI useful. So my only recommendation, and the first thing I ask people, is: what problem are you trying to solve?

Andrew Zigler: 43:24

Do you need AI? And if you do, make sure you understand how AI works and what types of AI need to be used in what circumstances. Today we learned about a more traditional kind of AI that's beyond LLMs: deterministic, and potentially really useful for engineering leaders trying to standardize and scale up solutions within their organizations. Especially if they're already investing and putting mindshare into an AI-driven world, it's definitely a card to play and something to keep in mind. It's also something that allows you to unlock what test driven development was always meant to give us, and maybe bring us closer to having, you know, secure software that runs our world. So this has been a really insightful one for me. Animesh, it's been great having you on the show. It's really been fascinating to learn from your expertise. Before we wrap up, though, where can our audience go to learn more about you and to follow your work?

Animesh Mishra: 44:18

To learn more about my company, Diffblue, you can go to www.diffblue.com. We are based, like I said, in the UK, but we have customers all over the world. Our sales pitch is pretty straightforward: we believe that balancing quality and speed is crucial for sustaining reliable and maintainable software products, and the way to do that is to have reliable, maintainable, and predictable development tools. Diffblue is one such tool that we believe you should have in your arsenal to take advantage of the advancements that AI has produced for us. You can also follow us on X; our handle is @diffbluehq, I think I've got that right. And you can connect with us on LinkedIn as well. If you'd like to follow me, I would love to connect with you personally. You'll find me on LinkedIn. My name is Animesh Mishra; you can search by username. I'm SirAnimesh on LinkedIn.

Andrew Zigler: 45:13

Oh, we'll definitely get your links in the show notes. Be sure to subscribe if you haven't already, and share this with your teammates if you found it insightful. And also be sure to check out our Substack. Our Substack is a weekly newsletter where we release our podcast as well as a roundup of some of the stuff we've discussed today. And I'll also be including the notes from today's guest in the Substack newsletter and in the show notes. Like Animesh said as well, we'd love to hear from you on socials, so please come find us on LinkedIn. I'll make sure we're both linked. We'd love to hear your thoughts on test driven development. Is your organization doing it? And what do you think of this kind of solution? And that's it for this week's Dev Interrupted. See you next time.