John Willis is a legendary DevOps speaker, mentor and co-author of The DevOps Handbook. He even helped coin the term DevSecOps.
But DevSecOps is one of those buzzwords that can mean a lot of things or nothing at all.
In this special 2-part series, John takes us on a DevSecOps journey from the origin of the term all the way to how software developers at the team level can implement the practice.
Episode Highlights include:
- What DevSecOps means at the team level
- First steps for implementing DevSecOps strategies at your organization
- Why it’s critical to start thinking about security more holistically
- The difference between Security, Compliance, Governance & Risk
- What is coming next for software development security
Join the Dev Interrupted Community
With over 2500 members, the Dev Interrupted Discord Community is the best place for Engineering Leaders to engage in daily conversation. No salespeople allowed.
Episode Recap and Transcription
Dan Lines: Let’s just dive right into it. Can you tell us what DevSecOps is to you?
John Willis: Yeah, it’s funny. There’s always been, obviously, discussion around security in DevOps, so we’re going to assume that. And I get pushback from friends who say, “Well, John, don’t you remember 2013, DevOpsDays Austin? I talked about security and operations and dev.” But Shannon Lietz, who’s over at Intuit and is one of my primary mentors on all things related to DevSecOps and security, is just insanely awesome, and you should get her on the podcast. She coined the term like it’s no big deal, but it is a big deal. She literally created a website for this thing called DevSecOps, and everybody got all upset and mad. I didn’t get upset myself, but it wasn’t until I got to meet her that I could say, “Go argue with her, not me.” You wouldn’t want to argue with her. She just put a stake in the ground on it.
And obviously a lot of us fought it, because I’d been part of this before, when Adrian Cockcroft, back when he was at Netflix, started this NoOps nonsense. Adrian’s a beautiful, great man, but NoOps was nonsense. We’ve always had this, “Why can’t we call it that?” And I’m like, no, stop it. It’s Dev and Ops; it’s a metaphor for collaboration and change. And then all of a sudden it’s DevSecOps. Come on. I’m like, here we go again, somebody’s trying to rename the name. But when you got to meet Shannon and read the literature, and then went back and looked at the problem, we had gone maybe eight or nine years sort of ignoring security. All of us DevOps people were patting ourselves on the back, like, we’re so awesome: I wrote this book. No, I wrote that book.
I used to do a presentation where I’d take the mic, take my hat, throw it on the floor and say, we forgot security. A lot of people argue that we didn’t. But in the aggregate, if you go back and look at almost every presentation prior to about 2017, there was very little discussion of it. One more piece: I was talking to a high-level executive with a really large entertainment company that everybody knows, someone who was very savvy on DevOps and speaks around the world. It was probably around 2017. I mentioned the word DevSecOps and he leaped out of his chair: “Now, I want to hear more about this.” And I thought, you know what, if you want to argue about the word, meet me after my session on the left-hand side of the room, because I won’t be there. I’ll be on the right-hand side of the room talking about how we move forward. So I think the word will lose its efficacy over time, and we can talk about that. But I think there was a point in time when the word was necessary to shock the system.
Dan Lines: So, John, a lot of the listeners of our podcast are engineering team leaders or directors of engineering, people who are close to the day-to-day development. How would you describe how DevSecOps relates back to the team level?
John Willis: Yeah, I like this question. In the early days of DevSecOps, people would say, “John, come and look at my pipeline,” and they’d show me that they were doing vulnerability scanning, and that’s it. “Isn’t that DevSecOps?” Absolutely, yeah, but. There’s no clear definition of DevOps and there’s no clear definition of DevSecOps. But to pass my smell test, it would be a holistic approach. You have DevSecOps in the IDE, so get a plug-in for security in the IDE. You have something constantly scanning your source control. You’ve got all sorts of vulnerability scanning. You’re using things like configuration scanning for containers, and on and on, all the way through. And you’re doing that with a holistic view. We’ll talk later about shift left, but that allows you not only to shift left on the delivery of the things you’re developing; in the aggregate, it’s now basically how you do software development and how that ultimately affects your audit and governance performance.
Dan Lines: Well, let’s talk about shifting left. What I’ve seen in the industry is that most of the time when we shift left, it means empowering developers: putting more decision-making control with them, boosting developers up to make decisions while they’re coding. With DevSecOps, how are developers actually affected in the model that you describe?
John Willis: I love this one. So, a short history. Most of what we’ve gotten for shift left really comes from Lean, from the Toyota Production System. Think about the Andon cord. Why do we do that? We are amplifying feedback loops. Gene Kim talks about it, and we talked about it in The DevOps Handbook, which I co-authored: the Three Ways of DevOps. The first way is left to right; the second way is right to left. That’s shift left: amplifying the feedback loop so you find things quicker. A couple of months ago I was listening to Charity Majors, and I think she’s a brilliant woman. I’ve just enjoyed watching her career. I saw the first time she ever spoke, and it’s fun to watch her do what she’s doing in the industry. She had this really good way of saying something. We all have our way of telling stories, and then we hear somebody tell a story in a way that’s like, oh, that’s just the perfect way to tell it. So, attribution to her. She asked, what’s the difference between Company X and Company Y? Company X has a couple of thousand developers and they deploy 100 times a day, 10 or 20 hundred times a day. They’re continuous, on demand; they’re what we all aspire to be. They got the memo. And then there’s Company Y, which has a couple of thousand developers and still releases quarterly. The question she asked is, what’s the difference between those two companies? And the answer is that Company X learns almost two orders of magnitude faster. So it’s about learning; feedback loops are about learning. Developing code is a hypothesis, and the only real test of that hypothesis is when it’s in production.
So now fast forward to what we do in security. We do audits once a year on all of the hypotheses. So we are terribly waterfall in security, because basically once a year we have audits. Auditors come in and waste everybody’s time, and it’s a waste not because of what we’re trying to do; it’s a waste because the system is so broken and so subjective. Everybody is in turmoil for about a month, with people exchanging emails and ServiceNow tickets: “Can you explain why the Nexus output data is different from SonarQube?” And it’s like, wow, you don’t understand. “Why did you even let them have that log data? They would never have asked this question in the first place.” But all of those control regs and things are, in the end, hypotheses. And when we’re in a waterfall, we are as bad as we were ten years ago in software development: we have to wait a year to find out whether all the things we did over the year have high efficacy for those control regs, and we get one window. Imagine back in the day when we did waterfall development. Why was it terrible? First off, we had written the code six or seven months before it went in, and then we were just constantly firefighting. We had to stretch to remember what the code was actually doing, and we weren’t really learning anything about the mistakes or the bad habits that we had, because we were fighting fires. Well, that’s where we are with security.
So this is why one of the most interesting things I’ve been working on, and I can give you links for the show notes, is a set of research projects, working-group projects with large corporations, on something we’ve been calling DevOps automated governance. We’re trying to change those 30-day audits into, really, the same thing we did with development: sort of zero day. It’s audit on demand. It’s compliance on demand. At any given time in the pipeline, those attestations are happening without human intervention. They’re being stored in an attestation store, and anybody can check the feedback loop or be notified by it.
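To make "audit on demand" a little more concrete, here is a minimal Python sketch of the idea described above: pipeline steps record attestations automatically, and anyone can ask for the compliance state of a commit at any time. It is an illustration under stated assumptions, not the working group's design. The `Attestation` schema, the `AttestationStore` class, and the control names are all hypothetical, and the store is a simple in-memory list.

```python
import hashlib
import json
import time
from dataclasses import dataclass, field, asdict


@dataclass(frozen=True)
class Attestation:
    """One machine-generated statement about a pipeline step (hypothetical schema)."""
    commit: str    # commit the pipeline ran against
    control: str   # illustrative control name, e.g. "unit_tests"
    passed: bool
    recorded_at: float = field(default_factory=time.time)

    def digest(self) -> str:
        # Content hash, so a stored record can later be checked for tampering.
        payload = json.dumps(asdict(self), sort_keys=True).encode()
        return hashlib.sha256(payload).hexdigest()


class AttestationStore:
    """In-memory, append-only stand-in for a real attestation store."""

    def __init__(self) -> None:
        self._records = []  # list of (digest, Attestation) pairs

    def record(self, attestation: Attestation) -> str:
        digest = attestation.digest()
        self._records.append((digest, attestation))
        return digest

    def audit(self, commit: str, required: set) -> dict:
        """Report compliance for a commit at any time, with no human in the loop."""
        passed = {a.control for _, a in self._records
                  if a.commit == commit and a.passed}
        missing = sorted(required - passed)
        return {"commit": commit, "compliant": not missing, "missing": missing}


store = AttestationStore()
store.record(Attestation("abc123", "unit_tests", True))
store.record(Attestation("abc123", "dependency_scan", True))
store.record(Attestation("abc123", "peer_review", False))

report = store.audit("abc123", {"unit_tests", "dependency_scan", "peer_review"})
print(report)
```

In this made-up run the report flags `peer_review` as missing, which is the kind of answer a yearly audit would otherwise take weeks of email to produce.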
Dan Lines: This is interesting. Let’s dive in here. I’m a developer. I’m developing code. And let’s say I do have CI/CD set up at the organization I’m working at, but I’m not necessarily DevSecOps yet. What would be some of the first things that I, as a developer, would receive back from this DevSecOps? Is it that my code is getting highlighted with security risks? As a developer, what do I get?
John Willis: Well, yes. There are a lot of ways to answer this question, and I’ll try to remember them all. One of the other big mistakes we’ve made over the years is to give developers this arcane, security-only-interpretable information and say, “Hey, your build broke.” OK, that’s great, it broke. What the hell does that mean? And then it will point to some security document, NIST or something, where it’s all written in security’s language. One of the problems is the signal-to-noise ratio. That’s always the case with security. If I’m getting so much noise, I tend to think everything’s noise and I start ignoring everything, and the needle in the haystack is the one that burns you. So one of the things I like about automated governance is that if I can get quicker feedback, and it’s part of a feedback loop, I can at least understand what they’re trying to tell me at the time I did this. Then I can arbitrate a little bit better. The classic one I love hearing from people is, “John, I get SQL injection notices and I don’t have any database calls.” If I can also see how a finding relates to audit, or why it’s important for audit, what actually happens is that you move away from reactive, you know, “we passed this, we didn’t do this, we got rid of all of those,” to proactive. People are able to argue better against things like, “Why do I have to put an NFR in about business continuity for S3?” This is a true story.
I had to put an NFR in a story about business continuity for data at rest in S3. It’s like thirty-three nines. I mean, I think the solar system would have to disappear before you lose the data. So now, because it’s tighter, it’s there, it’s immediately on demand, just like everything else, I can have that conversation quicker. And what happens then is you start seeing emerging patterns where all those terrible 15- or 20-year-old control definitions start going away, or they can be better explained, because you have a tight feedback loop.
Dan Lines: Yeah, you don’t need that translation. I always remember that when we did our pen tests, it’s in your face.
John Willis: And then you combine that with risk people coming to design requirements. They’re saying, “I think the service needs these 10 things,” and you can sit there and say, “I don’t need that. We’re using cloud; we don’t really need an electronic attestation for the server. There are no servers.” So you throw that all together. And one last thing I think is interesting: people who start adopting this automated governance model not only move away from reactive, they start filtering things out. You see this almost like a vault of institutionalized control regs that nobody can ever even have a discussion about, and you start shaving them down because you have an immediate feedback loop. But then you also start moving away from how people talked before automated governance, where they’d say, “Oh, I don’t tell auditors things they don’t really need to know.” Instead it’s, “Oh, that looks like a self-identified control opportunity.” People actually get proactive about looking at things that might affect the brand and start thinking, maybe I should submit this as a self-identified control risk. And you see those numbers go up. That’s beautiful. I was with a client the other day, and they’re sort of old school, like, “I don’t know about this stuff, I’m not sure.” And I said, if you don’t think a dashboard showing increasing self-identified control risks from the people doing development is a beautiful thing, then you honestly shouldn’t be in this business. And I said that to a client.
Dan Lines: We like to talk a lot about metrics on this pod. Is that one of the things that you look for in a DevOps, let’s just call it, transformation? That we’re going to see self-identified control risks increase over time?
John Willis: I think that’s a beautiful metric, and I have actually seen it used. But here’s the thing: there are two things. One is that at some point, DevOps and DevSecOps become meaningless differentiations, because it’s all the same. What’s a terrible latency problem? Yeah, that’s a bug. James Wickett, who’s over at Verica, I love the quote he said years ago: a bug is a bug is a bug. There aren’t security bugs and non-security bugs; they’re bugs. And one more thing. Who is it, Mary Poppendieck? They would ask, how long does it take you to get one line of code through your system? Well, I would say that could be a patch or it could be a security update. One side of the equation, which I probably wouldn’t call metrics, is the ability to show continuous compliance or continuous audit. Can I hit enter at any time and see the state of everything that went through the pipeline, along with all the attestations? There’s another component to talk about, which is policy as code, and that fits in directly. Everybody talks about policy as code. The model that we’re working on is a YAML file that drives the automated attestations. So at any time you hit enter and, think blockchain, although we don’t use blockchain, imagine that’s now your evidence. In that sense, monitoring is that I get to see digitally signed events, sort of chained like a blockchain, that map to what actually happened. Yes, it had a clean build. Yes, there was a pairing on a merge. Yes, there was this. And that’s all one immutable signature.
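The combination of a policy file and signed, chained evidence can be sketched in a few lines of Python. This is only an illustration of the general technique (an HMAC-signed hash chain checked against a policy), not the model John's group built. The policy is a plain dictionary standing in for a YAML file, the event names mirror the examples in the conversation, the signing key is a demo value, and all function names are hypothetical.

```python
import hashlib
import hmac
import json

# Stand-in for the policy file; a real setup would load this from YAML.
POLICY = {"required_events": ["clean_build", "pairing_on_merge"]}

# Demo key only; a real pipeline would use managed keys or asymmetric signatures.
SIGNING_KEY = b"demo-key-not-for-production"


def sign(payload: bytes) -> str:
    return hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()


def append_event(chain: list, name: str, passed: bool) -> None:
    """Append an event whose signature also covers the previous event's signature."""
    prev = chain[-1]["signature"] if chain else ""
    body = {"name": name, "passed": passed, "prev": prev}
    payload = json.dumps(body, sort_keys=True).encode()
    chain.append({**body, "signature": sign(payload)})


def verify_chain(chain: list) -> bool:
    """Recompute every signature and link; editing an earlier event breaks the check."""
    prev = ""
    for event in chain:
        body = {"name": event["name"], "passed": event["passed"], "prev": event["prev"]}
        payload = json.dumps(body, sort_keys=True).encode()
        if event["prev"] != prev:
            return False
        if not hmac.compare_digest(event["signature"], sign(payload)):
            return False
        prev = event["signature"]
    return True


def satisfies_policy(chain: list, policy: dict) -> bool:
    """Evidence must be intact and must contain every event the policy requires."""
    passed = {e["name"] for e in chain if e["passed"]}
    return verify_chain(chain) and set(policy["required_events"]) <= passed


evidence = []
append_event(evidence, "clean_build", True)
append_event(evidence, "pairing_on_merge", True)
print(satisfies_policy(evidence, POLICY))  # True: intact chain, all required events

evidence[0]["passed"] = False              # tamper with stored evidence
print(verify_chain(evidence))              # False: signature no longer matches
```

Because each signature covers the previous one, nobody can quietly rewrite an earlier event, which is the property the blockchain comparison is pointing at.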
And it’s attached to a policy. So to me, that’s the perfect, if we want to call it that, monitoring. But it’s really continuous compliance. That’s one side. Then there’s the other side, where we just go back to the metrics that we know, the DORA four: deploys, change failure rate or change success rate, lead time. But that’s not enough. I’ll tell you this: the DORA four, or Accelerate, or whatever you want to call them, have become very popular and famous for all the right reasons, and they should be everywhere. You should have those. You should have a pipeline. And I know that’s what you provide at LinearB. But here’s the thing: those are lagging indicators. This is why I like the idea of flow metrics. Flow metrics dissect it: instead of just telling me lead time, they break it down into wait time and work time. It shows me the wait, it shows me the intervals, and now I get a much finer view. That’s a more leading indicator of what’s going on. So, flow metrics. And I think you do some really good publications on things like breaking change success down by team. So to me, those all become evidence of possible opportunities, of things that we would normally consider DevSecOps problems.
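The difference between a single lead-time number and a flow breakdown is easy to show with a small worked example in Python. The timeline, the state names, and the `flow_breakdown` function below are made up for illustration; real flow-metrics tools define their states differently.

```python
from datetime import datetime, timedelta

# Hypothetical timeline for one change: (state, time the change entered that state).
timeline = [
    ("coding",             datetime(2024, 1, 1, 9, 0)),
    ("waiting_for_review", datetime(2024, 1, 1, 15, 0)),
    ("in_review",          datetime(2024, 1, 2, 11, 0)),
    ("waiting_for_deploy", datetime(2024, 1, 2, 12, 0)),
    ("deployed",           datetime(2024, 1, 2, 16, 0)),
]

# States in which nobody is actively working on the change (an assumption of this sketch).
WAIT_STATES = {"waiting_for_review", "waiting_for_deploy"}


def flow_breakdown(events):
    """Split total lead time into time spent working and time spent waiting."""
    wait = timedelta()
    work = timedelta()
    # Pair each state with the next one to get the interval spent in it.
    for (state, start), (_, end) in zip(events, events[1:]):
        if state in WAIT_STATES:
            wait += end - start
        else:
            work += end - start
    lead = events[-1][1] - events[0][1]
    return {
        "lead_time": lead,
        "work_time": work,
        "wait_time": wait,
        "flow_efficiency": work / lead,  # share of lead time spent on actual work
    }


result = flow_breakdown(timeline)
for key, value in result.items():
    print(key, value)
```

In this invented timeline the lead time is 31 hours, but only 7 of them are work; the other 24 are waiting. A lead-time number alone would hide that, which is why the breakdown acts as an earlier signal.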
Dan Lines: Yeah. As a listener to this pod, I wouldn’t say that most companies are fully DevSecOps’d today. But for a lot of the team leaders and VPs of engineering listening, the question is: how do I take a step in that direction, and how do I measure that I’m doing a good job or making progress?
John Willis: So, I’ve thought about this too. Again, no disrespect to any vendors, but I remember when I was trying to sell Chef to Ernest Mueller, who was at National Instruments way back when. They had built their own internal PaaS before anybody had an internal PaaS. I was trying to sell them Chef, and I remember Ernest telling me, “John, I think Chef is amazing. We’re probably going to buy it in the future, but we’ve got to figure out how to do this ourselves first.” He said, “We’re not egotistical; we don’t really believe that we should be building our own software. But this is a new area for us, and we need to learn it. Nobody’s running PaaSes. You can’t go out and Google ‘internally built PaaS.’” I mean, this was 2008 or 2009. In fact, they were using Azure with the understanding that Azure actually was a PaaS when it first came out; the real meat behind it was sort of hidden, and they’d figured that out. So they built their own PaaS around it. And I love that explanation. He said, “We don’t know this, so we think the best way for us is to build something, and then we’ll really know what to ask all the vendors.” And I think that works for a company that wants to have some success. You can look at a list of stuff like OpenSCAP for container stuff, SonarQube, Nexus. These are all open source tools. You’ve got Ansible, obviously.
So I think one approach is to not just go out to some large vendor. With large vendors and DevSecOps, what they’ve done is like everybody else in every industry. Ten years ago there were no DevOps vendors; a year later there were 30 vendors, and now there are a hundred. And really they say, “We do this, therefore we’re DevSecOps.” So what you might get is what Ernest Mueller was afraid would happen: a partial solution, or a product you don’t completely understand, because you picked the vendor to solve it. There are a number of really good open source tools that may not, in the long run, be the best solution for you. They may. But you have this incredible opportunity to look at them, and there’s so much written about how to use these tools. Get OpenSCAP or SonarQube or Nexus. SonarQube by itself is insane, the data that you get from it: code quality, unit test percentage, cyclomatic complexity, linting. Just that thing alone. And then from there, go have a conversation with the big guys, the big gals, the big companies, and figure out, oh, maybe I should be using your products. But yeah.
Dan Lines: So your recommendation really is to start with the open source that’s available, bring that into your pipeline, and that’s a good place to start.
John Willis: There’s an incredible amount written about this too. You can’t Google and not stumble over SonarQube or OpenSCAP examples. So that’s a good way to start thinking about a holistic approach: figure out what this looks like, learn a little about automated governance, and not only gate things, which is good, you must gate things. But also start thinking about storing that as evidence, and have deeper conversations with the risk teams to say, “Hey, if we gave you data like this, in a very objective, non-human-created format, would that suffice?” You sort of know the answer: it’s better than what they’re getting, which is somebody filling out a sentence or a checkbox in a ServiceNow ticket. But you just say, “Hey, wouldn’t it be better if you could see data like this?” And almost always they’ll say, “Yes, I would love to see data like that.”
Dan Lines: Yeah, absolutely. And now let’s say I’ve brought in some of these open source tools, I’ve started using them at the developer level and maybe at the system level, and these tools are reporting pieces of information back to me. I don’t know if there are any, but what are some of the metrics that would tell me, OK, I am doing a good job? Is it that I’m outputting data to my security team? Is it that we’re not having security bugs found in production? What is the indicator that I might be doing this right?
John Willis: Well, again, I think it goes back to this: at some point, a bug is a bug is a bug. And you can separate what kind of change it is. Is it a bug fix? Is it a security fix? You can have those categories. But then again, the evidence is: what’s your lead time? What’s your wait time? And we can have a long discussion about how I think you should think about MTTR; thinking about it as an average is a terrible way. But how do we get better at restoring a service? When it’s all flattened, whether it’s a vulnerability or a software bug, it’s all the same thing. So you’re going to see the value of those kinds of scorecards or metrics increase. If everything’s in the system and you’re gating for security things, you’re going to see those latencies in lead time or wait time, because now you’ve gated it. So the number of deploys and all of those numbers are still going to be there, with data that now includes the security stuff. One other thing, though: there are two points where I think we don’t do a good job in security. One is, when we do send information about the security of something that’s been created in the pipeline, I was once told: just hire a summer intern. Instead of saying, “Hey, this is a Struts 2 Jakarta blah, blah, blah,” get a summer intern to come in and say, “Oh, by the way, this is the vulnerability that took Equifax down.” Put it in more human context. Don’t take the lazy route of “here’s the CVE, and here’s the NIST definition of that vulnerability.”
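A small Python sketch shows what "put it in more human context" could look like in a pipeline message. The lookup table and the `explain_finding` function are hypothetical; the one real entry is CVE-2017-5638, the Apache Struts 2 flaw exploited in the 2017 Equifax breach, and the second ID in the usage lines is a made-up placeholder.

```python
# Hypothetical, hand-curated notes keyed by finding ID; a team would maintain its own.
PLAIN_LANGUAGE_NOTES = {
    "CVE-2017-5638": (
        "Remote code execution in Apache Struts 2 (Jakarta Multipart parser). "
        "This is the flaw exploited in the 2017 Equifax breach."
    ),
}


def explain_finding(finding_id: str, component: str) -> str:
    """Turn a bare scanner finding into a message a developer can act on."""
    note = PLAIN_LANGUAGE_NOTES.get(finding_id)
    if note is None:
        # No curated note yet: say so plainly instead of dumping security jargon.
        return f"{finding_id} in {component}: no plain-language note yet, ask the security team."
    return f"{finding_id} in {component}: {note}"


print(explain_finding("CVE-2017-5638", "struts2-core"))
print(explain_finding("CVE-0000-0000", "some-library"))  # placeholder ID, not a real CVE
```

The design choice is deliberately low-tech: a curated table that someone can keep up to date is enough to replace a bare identifier with a sentence a developer recognizes.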
For developers, it’s like, I didn’t spend two years learning security. I don’t read that, so it doesn’t make sense. So do a better job of creating your own. There are a couple of companies I’ve seen that have actually built their own internal GitHub project to show not only the vulnerability, and I’ll just use a simple example like an injection, but they’ll also point you to the GitHub repository that has the best-practice way to code it in their organization. One last thing I think is important. There are many things I like about what Shannon Lietz talks about. When I first met her, she was talking about how “we don’t break the build, we don’t gate stuff.” And I was like, oh, wow. And she said, “What we do, though, is we have a very large red team, and we know that you put a vulnerability in there; it’s in the process. So when it gets into production, we emulate the breach.” So, and I’m taking a little literary license here, imagine that you get an email that says, “Hey, by the way, that code you just put in production five seconds ago? Here’s an emulated dump of 140 million credit cards. You’ve got another five minutes to remediate it.”
Dan Lines: Well, that’s actually really interesting. So in that situation, you’re actually letting the issue get to production?
John Willis: You literally actualize it: here’s what just happened, my friend. And by the way, the email went up the chain, not because it’s a retribution or a blameful organization; it’s just that we want to make sure everybody knows how serious what just went into production is, and why we have to react so quickly to it.