Continuous Delivery, the action of having a short feedback loop when shipping software to production, is the best way for engineering teams to operate. In this incredible episode, I bring in Charity Majors, CTO of Honeycomb.io, to explain how to implement continuous delivery, and why you should walk out if your organization doesn’t support the practice.
Why we need Continuous Delivery
The benefits of continuous delivery are enormous, and almost every engineer you ask would agree. But almost no organizations are shipping software in under 15 minutes. So what’s stopping us?
In this episode of Dev Interrupted, Charity and I discuss:
- How to break out of the “Software Delivery Death Cycle”
- The overwhelming benefits of having a short feedback cycle
- The greatest failure of our generation of software developers
- How to implement true Continuous Delivery at your enterprise
- What Observability Driven Development means at the developer level
Asynchronous Episode Recap
Dan Lines: Yeah. That’s actually the first topic that I kind of wanted to dig in with you here. You know, I think you’re a known advocate of CD. That’s the CD part of the CI/CD and also kind of how in general it relates to shortening feedback loops. You know, I hear you talk about, you know, maybe some smaller PRs that are more easily able to be reviewed and, you know, making sure we’re all animated to get good work shipped out into Prod. And the quicker you can get the code out, then you can shorten the feedback cycle. So, you know, kind of just wanted to start by asking you, why is a short feedback loop so important and how long should it take?
Charity Majors: Well, there are two reasons that it’s important. The one that we always talk about is our customers: it’s better for our customers if we’re able to find bugs quickly, and find them ourselves instead of letting customers find them months later. We can get our features out faster, and all this stuff. But the other one is that it’s important for the life of the engineer. If you want to find joy in your work, you need to be able to experience its impact on the world. Shortening the feedback loop is just a way of making it feel more natural. It’s a way of plugging into your body’s natural reward systems, like dopamine and adrenaline and all that stuff, and training yourself to have a good gut feel for what you’re about. You know, if I call someone a senior engineer, to me that says that I can trust their gut instincts. Right? Sometimes it takes a while to unpack why you have an intuition that this thing you’re doing is about to be terrible. But you have an intuition. And the only way to train that gut feeling to be intuitively correct is to train your internal data corpus, so to speak, on production. If all you ever interact with is your code and staging, your intuitive gut feeling is not going to be worth much. So I feel like shortening those feedback loops is important. And how short should it be? As short as possible. I think that fifteen minutes is a good upper bound. If you can rely on the fact that any time you merge your code back to main or master, within fifteen minutes your code will be in production, you can kind of build it into muscle memory: while you’re writing the code, you instrument it, because you know what you’re trying to do. You’ll never know it better than you know it right now.
You write the instrumentation that will help you tell: is it doing what I expected it to do? And then you flip over and you ask: is it doing what I meant it to do, and does anything else look weird? That “does anything look weird” part is kind of an irreducible amount of complexity, because it’s about those unknown unknowns, the things that just strike you as off. If it’s fifteen minutes or less, you can build up muscle memory, and you can expect engineers to always look at their code after it’s been shipped. That creates a feedback loop where, I feel like, eighty percent of bugs are usually found at that moment, when the person who just wrote the code is looking at it through the lens of the instrumentation that they just wrote, in production, while users are using it right now. And if you don’t look at it, those bugs don’t get found then; they become part of the background noise, or someone else has to find those bugs days, weeks, months later. And the longer it sits, the harder it will be.
Dan Lines: Yeah, I mean, if we all had that 15-minute feedback loop, that would be unbelievable. Right? I’m a developer, I’m creating code, and I could actually see that released in the enterprise. I mean, that’s where we’re trying to get to, right?
Charity Majors: It connects you with the fruits of your labor. And that’s intrinsically so motivating. It’s so exciting. It’s so fulfilling. I feel like a lot of people who aren’t connected to their labor are just kind of checking in and checking out, because they never really get to witness its impact on people’s lives.
Dan Lines: Right. And, you know, one of the things that we’re doing at LinearB is providing that visibility into the CI/CD pipeline. Our customers come into LinearB and they see cycle time and lead time automatically. And unfortunately, correct me if I’m wrong, but most companies are not at that 15-minute feedback loop yet. We’ve made a lot of progress on CI as a community, but not so much on CD. Is that kind of what you’re seeing as well in the community?
Charity Majors: Yeah, absolutely. We talk about CI/CD all day long, but when you drill down into it, mostly people are just talking about CI, and CD is not even in the conversation. I think this is so unfortunate, because the entire purpose of CI is to prepare for CD, and CD is what actually changes your life. It’s like the payoff for that labor. What is it all for if you’re not taking that last vital step? It’s super important. It’s not an afterthought. It is the purpose of all this work that we’re doing.
Dan Lines: Yes, yes, it’s so unfortunate, right? We have made a lot of great progress with CI. Some examples of that progress: I think it’s small pieces of code, automatically tested. You are getting feedback as a developer while you’re developing, because you’re running the tests. OK, which is a great first step.
Charity Majors: But it’s like, OK, is my logic correct? Sure. But that’s only the beginning of the question, right? Code isn’t real until it’s interacting with actual users and actual production systems. That’s where the majority of the question marks sit.
Dan Lines: All right, I read a blog post that you wrote that was on Stack Overflow. That was a pretty cool post, and you came up with some terminology that talked about a death spiral for organizations, right? It’s kind of a negative spiral that engineering orgs can get into, and then worse and worse things happen. Can you explain that?
Charity Majors: It’s the interval between when you’ve written the code and when the code is live in production. Software ages in that interval like fine milk: it ages very, very poorly. The longer that interval gets, the more pathologies creep into your organization. You know, things just get longer: reviews take longer, and people are stuck waiting on each other a lot more. And then you get to a place where you have to start hiring specialists. Now you need an SRE team just to run your deploys, because they’re batching up changes, so they’re all breaky. So any time you deploy, it pulls in five to ten people, and that was all their work weeks, because you have to git bisect to try to figure out which one of these fifty-one diffs is actually at fault for what just broke. And then you need, like... My back-of-the-envelope estimation, which is not based on any real science but is based on many people’s observations, is this: say you need X engineers to write and maintain a piece of code. Well, that’s if you’re shipping in 15 minutes or less. If your deploy time interval is on the order of hours, you need 2X; you need to double the number of developers. If your deploy time is on the order of days, you need to double it again. If it’s on the order of weeks, you need to double it again. And this is where you get, like, Honeycomb for the past two years has had eight people writing code for everything from the storage engine to the API, the SDKs, the app, everything, and our deploys are always under 15 minutes. And if our deploys were on the order of days, we would have needed about 60 people.
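As a rough sketch of the back-of-the-envelope rule Charity describes here, the doubling can be written down as a small function. The exact bucket boundaries (for example, treating anything over 15 minutes but under a day as “hours”) and the function names are assumptions made for illustration; the rule itself is, in her words, not based on any real science.

```python
from datetime import timedelta


def staffing_multiplier(deploy_interval: timedelta) -> int:
    """Rule of thumb from the conversation: 1x at 15 minutes or less,
    doubling for hours, again for days, and again for weeks.
    The bucket boundaries below are illustrative assumptions."""
    if deploy_interval <= timedelta(minutes=15):
        return 1
    if deploy_interval < timedelta(days=1):   # "on the order of hours"
        return 2
    if deploy_interval < timedelta(weeks=1):  # "on the order of days"
        return 4
    return 8                                  # "on the order of weeks"


def engineers_needed(baseline: int, deploy_interval: timedelta) -> int:
    """Headcount implied by the rule, given the headcount at <= 15 minutes."""
    return baseline * staffing_multiplier(deploy_interval)
```

For a baseline of eight engineers, this rule gives 16 at hours, 32 at days, and 64 at weeks.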
Dan Lines: Right. I mean, I relate to that so well. I don’t know if there’s terminology to coin for that, but it’s kind of like, OK, my software delivery pipeline, all the way out into prod, is a system. And you’re having these negative symptoms, and one solution to a negative symptom is, OK, we’re having issues in prod, I’d better build out an SRE team to take care of those issues. That’s stacking more people, more complexity, more things in the way for a short-term gain. And I need to be honest: when I was a VP of engineering, I built the SRE team because we were having issues in prod, when I should have asked, OK, what is the root cause? You know, on this pod we do try to talk about metrics, or bringing visibility or observability. What we’ve found in our community is that it is easy to say these things, like, OK, I want a fifteen-minute feedback loop. I think if you said that to a bunch of engineers, nobody’s going to disagree. So one thing that I’ve seen is that if you unlock some of the visibility and just show where you’re at today, that can have a very positive influence. And I saw that you write about observability, or observability-driven development. Is that the same thing, or can you break that down for us?
Charity Majors: Yeah, observability-driven development is exactly what I was just talking about. It’s instrumenting as you write your code and then watching your code in production, and asking yourself: is it doing what I expected it to do, and does anything else look weird? You know, closing that loop. It’s absolutely the same thing. And most teams don’t have the ability to do that because they don’t have observability. There’s been kind of a bandwagon effect where we started talking about observability and defining it and stuff, and then everybody out there who does a monitoring tool, or, you know...
Dan Lines: Yeah, I saw that. Everything’s observability. Yeah.
Charity Majors: There’s a very specific technical meaning for observability in my mind. It means: can you understand the inner workings of a system, any state it has gotten itself into, just by asking questions from the outside, without shipping new code to handle that state? Because that would imply that you knew in advance what to expect. And the technical criteria that you need in order to achieve this start with how you instrument. When the request enters a service, you initialize a new event and populate it with everything you know going in about the context, the parameters, etc. Then, while the request is executing in that service, you stuff in anything else that might be useful in the future. Any IDs, right? High-cardinality data is super useful for observability: shopping cart ID, user ID, anything about the environment, language, anything. Stuff it in. And then when the request is ready to exit or error, you ship it off as one arbitrarily wide structured data blob to a service like Honeycomb. And then you exit. And then on the service side, what matters is that you can’t rely on indexes, because indexes, again, imply that you can predict in advance what’s going to be needed to query on. You need to use something like a columnar store or database or whatever, anything with the ability to slice and dice in real time on arbitrary, high-dimensionality, high-cardinality data. But once you have that, you can slice and dice. Say you see a spike in requests: are they erroring? Well, what is different about these requests compared to the baseline? Like, these specific requests are different in these five ways. Well, that’s going to tell you a lot, based on what is different about them.
And if you have the ability to practice observability, then you can do things like breaking down by build ID and watching as your code is deployed, looking at all of the metrics that are different for your build versus the previous one as it’s rolling out in production or something. I’m also really big on feature flags. You need to decouple deploys and releases, because just shipping your code to production shouldn’t mean everybody automatically sees it. That’s a pretty dangerous thing. Frankly, right now there’s a gravitational shift going on away from preproduction and staging environments. We’ve invested so much time, engineering cycles, and dollars into staging over the last decade. And right now, I think you’re seeing this massive shift to focusing on production systems. You see that with startups from the last five years like us and LaunchDarkly and Gremlin and all of these. It’s long overdue in my estimation, because code is worthless until it’s in production. Code is dead code until...
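The instrumentation pattern Charity describes (one wide event per request, stuffed with high-cardinality fields and shipped on exit or error) and the “what is different about these requests” question can be sketched in a few lines. This is a minimal illustration and not Honeycomb’s SDK: `WideEvent`, `handle_checkout`, `what_is_different`, the field names, and the `sink` callable are all hypothetical names chosen for the example.

```python
import json
import time
from collections import Counter


class WideEvent:
    """One arbitrarily wide structured event per request (hypothetical helper)."""

    def __init__(self, **context):
        # Populate with everything known when the request enters the service.
        self.fields = dict(context)
        self._start = time.monotonic()

    def add(self, **fields):
        # Stuff in anything that might be useful later: IDs, environment, etc.
        self.fields.update(fields)

    def finish(self, sink, error=None):
        # Ship the whole event as a single structured blob on exit or error.
        self.fields["duration_ms"] = round((time.monotonic() - self._start) * 1000, 3)
        self.fields["error"] = error
        sink(json.dumps(self.fields, default=str))


def handle_checkout(request, sink):
    """Example request handler; `request` is a plain dict in this sketch."""
    event = WideEvent(service="checkout", build_id=request["build_id"], path=request["path"])
    try:
        event.add(user_id=request["user_id"], cart_id=request["cart_id"],
                  language=request.get("language"))
        if not request.get("items"):
            raise ValueError("empty cart")
        event.add(item_count=len(request["items"]))
        event.finish(sink)
    except ValueError as exc:
        event.finish(sink, error=str(exc))


def what_is_different(events, is_outlier, fields):
    """For each (field, value) seen among outliers, compare its share among the
    outliers with its share in the baseline; biggest differences come first."""
    outliers = [e for e in events if is_outlier(e)]
    baseline = [e for e in events if not is_outlier(e)]
    if not outliers:
        return []
    diffs = []
    for field in fields:
        out_counts = Counter(e.get(field) for e in outliers)
        base_counts = Counter(e.get(field) for e in baseline)
        for value, count in out_counts.items():
            out_share = count / len(outliers)
            base_share = base_counts[value] / len(baseline) if baseline else 0.0
            diffs.append((out_share - base_share, field, value))
    return sorted(diffs, key=lambda d: d[0], reverse=True)
```

Calling `what_is_different(events, lambda e: e["error"] is not None, ["build_id", "user_id"])` on the parsed events is one way to break erroring requests down by build ID, in the spirit of comparing a new build against the previous one.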
Dan Lines: And this is going to really get me to the real thing, the real show.
Charity Majors: High impact. Yeah.
Dan Lines: I think you wrote something in some of the blogs that I checked out. You said pay attention to the process, because again, if we tell developers, hey, you want a 15-minute cycle time? Yeah, yeah, of course. OK, how are we going to get there? And a lot of what you were saying about observability, it sounded like to me, is making sure that we have the right information, so when we are observing, we can see things like, as you’re saying, build IDs or whatever is important to affect our process.
Charity Majors: Yeah. I think, like I say, this is the greatest failure of our generation of technical leadership. It’s straight up like you said: find me a room of 150 developers, and find me a few of them that don’t believe in this. Right? Now find me one hundred and fifty teams that have continuous delivery. You fucking can’t. Right? Every one of those developers knows that it should be this way, and almost no engineering teams run this way. And that’s because, I think, it is kind of a hard sell if you’re an engineering manager or a VP or director or whatever. It’s always kind of a hard sell to tell your higher-ups that you’re going to be using engineering time not to work on features, but to work on the pipeline or your deploy process. It seems like navel-gazing, right? And you have to really forcefully make the case to them, in dollars and cents and in headcount, that this is valuable and we just need it. It’s a hard sell to tell your higher-ups that in the short term, we’re going to take this stable-ish system, something that works-ish, and we’re going to destabilize it in these ways so that in the longer term it’ll be better. We promise. That’s a hard sell. But on the other hand, you have one job as a technical leader. You’ve got one job, which is to make sure that your team is spending their time on the highest-impact work that they can, for their own sakes, for their own mental health, for their own joy in their craft and profession, as well as for your users. And almost no tech leaders have done this job.
Dan Lines: Well, that brings it back to, I think, a core question, right? We’ve established that everybody wants CD, but we’ve also established that most engineering organizations are not practicing CD, especially something in the 15-minute range, which probably correlates to true CD. So are we running into a people issue or a technology issue? What is preventing us?
Charity Majors: This is a failure of will. This is a failure of will and a failure of courage on the part of our technical leadership.
Dan Lines: So how do we make a change as an organization? That’s what I’m asking myself.
Charity Majors: We as engineers, if we can’t get our leaders to invest in this, we walk. We go someplace that will. Ultimately, you have the power to create, and that is your superpower as an engineer. You have the power in your hands to build things. Or you go rogue: you just do it yourself. You know, I’m a big proponent of ICs being leaders, of not just treating managers as leaders. If you’re an engineer and you want to be a technical leader, fucking do it, take it on. You know, take the heat. You have the power in your two hands to make this better. So, you know, I think it’s a failure of leadership. But I also think that creates a real vacuum where anyone can step up and make it better. The results will pay off. Right? We have all this data. Read the book Accelerate. There’s so much data. It is uncontroversial at this point that this will make your organization better. So do it. Yeah, go out on a limb.
Dan Lines: I want to share one thing that I have seen work, because I think it’s a big jump to say, hey, let’s just walk, or let’s go find new jobs. I do think there’s kind of a step in between that extreme and where we are today. But yeah, at the end of the day, if we’re still getting pushback, right, you have to try to make the situation better.
Charity Majors: You should. But there’s a point at which we’re just rewarding poor leadership and poor behavior if we stick around.
Dan Lines: One thing that I’ve seen kind of help with leadership or kind of influencing is to bring data into the conversation.
Charity Majors: Absolutely. And, you know, speak their language. Give them a chance, right? If you’re just talking about it in terms of how cool the technology is, that’s not going to do it. You have to convert it into the language of dollars and cents, headcount, and so forth.
Dan Lines: That’s the thing. I’ve seen engineering leaders come to an executive team. You know, a lot of the time, not all the time, a CEO does not come from a technical background or something like that. But these are smart people. If you come to them and say, listen, our number one job as an engineering organization is to deliver value to customers, I think CEOs would like to hear that. OK, good. But we have an issue with our cycle time, and I want to show it to you right now: from when coding starts to deployment, it’s taking us days. And you know how we can ship more value to our customers? And we can, by the way, recruit better engineers, who want to work in an environment that gets code out faster. And we can fix bugs faster if we do, for example, have them in production. It’s getting that down to 15 minutes. Here’s where we are today, and here’s a plan to do that. I’ve seen that conversation actually go well. Where it doesn’t go well is if you don’t bring some type of data. Have you seen that, or would you agree?
Charity Majors: Yes, absolutely. And, you know, if you’re an engineering leader who wants to make this kind of pitch and you’re nervous, or you want someone to proofread and give you a thumbs up, I’m happy to be that person for you. There are other people out there, other engineering leaders. Intercom is a great organization; they’re quite large now, and their ship times are, I think, ten minutes. They’re a Rails shop, and you know how much code that is. So, you know, there are engineering organizations out there, and we’re always looking for buddies. You’re in good company and we’re happy to help. We’re happy to give tips on pitching and whatnot. But yeah, don’t not try, I guess, is my biggest thing. This could be a career maker for a good engineering leader. You could make a personal goal of doing this. Do it once and you’ll get promoted, because it makes a huge impact. The people who worked with you will follow you places for the rest of your career. You could get writing out of this. You can give talks about this. You go on to bigger and better positions, doing this at company after company. This is a huge opportunity for people who want to strike out and make a name for themselves.
Dan Lines: Yeah, absolutely. I mean, you can make a name for yourself because of the incredible impact that you have. The impact is going to be huge, both from an engineering perspective and from the business perspective. Right. Impact like that is a career changer.
Charity Majors: And this is not controversial. This is not a risky proposal in my mind; it’s like the obvious proposal. So.
Dan Lines: Right. Sure. What I’ve seen is that the metrics that work in this type of conversation are usually the lead time and cycle time metrics, the four metrics that the DORA report recommends.
Charity Majors: Right, the ones that everyone should really be tracking, which are lead time to deploy, how long it takes to deploy, how often deploys fail, and your time to recovery. Everyone should make sure that they start there.
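As a concrete illustration, the metrics discussed here line up with the four measures popularized by Accelerate: deployment frequency, lead time for changes, change failure rate, and time to restore service. The sketch below computes them from a list of deploy records. The `Deploy` record shape and the `delivery_metrics` function are hypothetical, and measuring lead time from merge to deploy is an assumption made to match the “merge to production in fifteen minutes” framing earlier in the conversation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deploy:
    """Hypothetical record of a single production deploy."""
    merged_at: datetime
    deployed_at: datetime
    failed: bool = False
    restored_at: Optional[datetime] = None  # set once a failed deploy is recovered


def _minutes(delta: timedelta) -> float:
    return delta.total_seconds() / 60


def delivery_metrics(deploys: list[Deploy], window_days: int) -> dict:
    """Summarize the four delivery metrics over a reporting window."""
    if not deploys or window_days <= 0:
        raise ValueError("need at least one deploy and a positive window")
    lead_times = [_minutes(d.deployed_at - d.merged_at) for d in deploys]
    failures = [d for d in deploys if d.failed]
    restores = [_minutes(d.restored_at - d.deployed_at)
                for d in failures if d.restored_at is not None]
    return {
        "deploys_per_day": len(deploys) / window_days,
        "median_lead_time_minutes": median(lead_times),
        "change_failure_rate": len(failures) / len(deploys),
        "median_time_to_restore_minutes": median(restores) if restores else None,
    }
```

A median lead time above 15 minutes is the number to bring to the conversation with leadership, alongside where it stands today and the plan to bring it down.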
Dan Lines: Is there anything else interesting that you’ve tracked or is it kind of just like laser focus on those?
Charity Majors: I mean, that’s a great place to start. If there’s anything else, well, one of the things that I recommend is just tracking how often your folks get alerted outside of business hours. I’m a big proponent of the idea that every engineer who works on a highly available 24/7 service should be on call for their stuff. But I think that the other side of that handshake is that on-call can’t suck. It can’t be something people have to plan their lives around or, you know, lose a lot of sleep over. I think it’s reasonable to ask any engineer to be woken up once or twice a year for their code, for their service. But that’s it. We can’t ask people to be on call if it’s going to kill them or burn them out. So tracking that is, I think, important just from a humane perspective. And there might be another one or two, but they will probably be very, very custom to your org. Right? Something like the critical path of accepting payments or something like that.
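Tracking out-of-hours alerts can start very simply. The sketch below counts pages per engineer that land outside business hours and flags anyone above a yearly budget. It assumes timestamps are already in each engineer’s local time, that business hours are 09:00 to 18:00 on weekdays, and that the budget is two pages per year, echoing the “once or twice a year” figure above; the function names are made up for the example.

```python
from collections import Counter
from datetime import datetime, time


def is_outside_business_hours(ts: datetime,
                              start: time = time(9),
                              end: time = time(18)) -> bool:
    """True on weekends, or on weekdays before `start` or at/after `end`.
    Assumes `ts` is expressed in the paged engineer's local time."""
    if ts.weekday() >= 5:  # Saturday = 5, Sunday = 6
        return True
    return not (start <= ts.time() < end)


def out_of_hours_pages(pages: list[tuple[str, datetime]]) -> Counter:
    """Count out-of-hours pages per engineer from (engineer, timestamp) pairs."""
    return Counter(who for who, ts in pages if is_outside_business_hours(ts))


def over_budget(counts: Counter, yearly_budget: int = 2) -> dict[str, int]:
    """Engineers paged out of hours more often than the yearly budget allows."""
    return {who: n for who, n in counts.items() if n > yearly_budget}
```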
Dan Lines: Yeah, and by the way, let’s not kid ourselves: at the executive table, there are conversations around employee productivity. It’s very easy to make the case: if our engineers are getting paged to wake up in the middle of the night to come fix issues in production, are they going to be more or less productive next week at work? They’re going to be less productive, grumpy, and less innovative. Always tie these important metrics back to business use cases, from what I’ve seen.
Dan Lines: Absolutely. OK, awesome. Lots of great topics here. One last topic that I wanted to bring up with you relates to: how can I make a change? You know, my organization is not at 15 minutes. I caught this, I think, in either something you said or something you wrote: a lot of us do work at bigger companies, and the thought of taking hundreds of engineers and projects and all of a sudden making them into a 15-minute cycle time is daunting. You don’t have to do that, right? Just start with the new stuff, like a new project.
Charity Majors: It’s so easy if you start with a 15-minute deploy time, or just start with continuous delivery. It is so easy to just stay there. It’s much harder to dig yourself out of a hole with a mature, stable project, but you can at least start fresh with new stuff. And the difference that you will experience, just the joy of developing in that environment versus the old stuff, tends to have a ripple effect of its own. People get hooked on it and they can’t imagine going back.
Dan Lines: I thought that was a really big point to make sure that you said and that we brought up here because, again, if you are trying to make that impact, or you’re trying to get buy-in from managers of managers and executives, you don’t have to come with, OK, we’re at a 15-day cycle time and I’m just going to make everything better and get to 15 minutes in this amount of time. That’s not even believable. What you could say is: this is a journey for us. We’re not going to completely stop all of our customer commitments and next week’s feature delivery. But in these three services, what I’m asking for is a few months, or whatever time you think you need, to get to that 15-minute deploy. I’m going to show you with data, and we’re going to do this progressively. That is a very reasonable way to get buy-in. And if you show that improvement, from what I’ve seen, you’ll find that you’ll get buy-in for the next time you ask: OK, you saw this improvement; now we need to attack this next project.
Charity Majors: Yeah. You’ll earn some credibility. And honestly, with this stuff, any improvement is improvement. It will be felt. If you bring it from two weeks down to days, oh my God, life will get better. From days down to hours? Oh my God. It’s not like you have to reach the promised land in order to see the improvements. Improving this stuff shows right away. It makes itself felt right away.
Dan Lines: Awesome. So, you know, Charity, this has been a great conversation.