Raising the Bar for Security at AWS and Beyond
Hear from Eric Brandwine, Vice President and Distinguished Engineer at AWS, about building a culture of security within an organization and how his team integrates across the entire company to establish and implement operating standards to ensure that security is paramount to the customer experience.
AWS Enterprise Strategist Clarke Rodgers spoke to Eric about how security is job zero at Amazon and how solutions are built, maintained, measured and tested in a way that optimizes the customer experience.
Conversation in detail
I would imagine all of the developers that work at AWS security don't necessarily start off with a strong security engineering background. What kind of mechanisms do you have in place to get those developers up to the bar that you set for AWS security? Especially in the builder tools or even perhaps customer facing security products?
Well, there's a couple of ways to think about that question. On the one hand, security at Amazon is a builder's discipline. We can't do all of the things that we need to do. We cannot scale with the business if we're doing so solely by adding engineers to the team. And so, a tremendous amount of what we do is building tools. It's straight up software development, just like you would do on any other team anywhere in Amazon, except you're building security tooling. And so, many of our engineers are not security engineers, they don't have a security background. They don't need to have any particular security expertise to join the team. Some of them have a passion for security, some of them just like the job and the team that they're working on and they're very happy doing what they're doing, but they don't have a particular interest in security, and that's fine.
The whole realm of security is in figuring out what assumptions some builder made and then figuring out how to violate those assumptions to push the system into an unexpected state. What happens when I jam a whole hard drive's worth of data into this field? What happens when I turn the dial up to 11? What happens when I put a negative number in this field? And the people that think like that tend, in my experience, to inherently think like that, and it's really hard to teach that mentality.
And so, when we find someone with that mentality, all of the specific security knowledge can be taught, all of the day to day technologies, we can ramp people up on them very easily. None of the things that I'm doing on a day to day basis now existed when I was in college, and so the fundamentals that I learned in college still apply, but none of the specific technologies do. And so, we actually have a really robust program for ramping people up that have that particular security bent.
So, the answer to your question is twofold. There's a set of people where we don't need to ramp them up on security. It's a standard development job. It's a standard builder role. And they excel. And some of them do express an interest in security, and that's great, we encourage that. But it's not necessary. And then on the other side, we find the people that are inclined to figure out how do I break things? And asking, "How do I break things?" leads naturally to, "And how do I fix them so they can't be broken again?" And all of the job specific skills there, we can teach people.
So, to that point, or I guess the reverse of that point, many of our customers, as they are trying to build out a security expertise throughout their own development communities, one of the tracks that they take is they'll take a security expert and put them with a "regular development team" to help build up the security bar within that team and basically have a security champion mindset. So, the idea is we have a finite amount of security professionals so we try to spread them out through the development teams and eventually all boats rise. Is that something similar at AWS and Amazon, how we look at things, or since security is built into our ethos, that everybody realizes, "I have that certain responsibility for security for my application, and therefore, I'm going to follow the process"? Could you talk a little bit how that might work?
Sure. Security is job zero. We fundamentally believe that just like scalability, availability, low latency, low jitter, security is part of the customer experience. It is a part of everything that we build. That said, security expertise is uncommon. We don't have nearly as many security engineers as we wish we had. I mean, I would love it if every single developer that we had were also a security expert. I'm not holding my breath. And so, I think it's interesting that you used the term security champion because that is literally the name of the program that we have internally, where we find security minded people embedded within the service teams and we have training for them and we have support for them to raise their security skills so they can help influence across their service team.
And then when a big decision happens, they don't have to take the lonely position and say, "I as the sole security champion here feel that this isn't ready for launch, or that we need to do this extra work." There's this whole security organization that they can draw on and we can work together from a position of customer obsession to figure out what is the right thing for customers. If we ship an insecure service, that's the wrong result for customers, but if we don't ship the service... Like a service that doesn't exist delights zero customers, and so we have to strike the right balance. And by having security expertise embedded within the service team, deeply sympathetic to the service team, and then security expertise in the AWS security team still sympathetic to the service team, but acting as more of an auditor role, we make much better decisions and we make them much more quickly.
And I would imagine that that alleviates... Again, many customer conversations that I have, the security department is looked at as the department of no, or it's the department I need to avoid in order to get my application launched. With this model that you just talked about, it seems that that bridge of trust is extended and everybody realizes, "Well, security is just part of the job and this is how we do things at Amazon," in our case. And then the end result is a more secure product that comes out the door.
So, a while ago we were trying to come up with a mission statement. Like how do you say, what does AWS security do? And the best that we came up with was ship securely. It's two words and I think it does a great job of capturing why we exist. The company doesn't exist to do security. It's called Amazon Web Services, it's not called Amazon Security. And so, we're here to ship these web services to deliver for customers. If we don't ship, we are not executing. We are not doing what the business is here to do. And so, we are here to enable the business. We are not the reason for the business. And you can't have this business if you don't have exemplary security, but we're just one facet of the business.
And the most important thing for us is the executive sponsorship that we get. Clearly, security is incredibly important and that rolls down from above. And so, because we're not saying, "We're the security team, you have to listen to us. Please, please pay attention..." I don't have that problem. From the CEO on down, the tone has been set. Everyone knows that security is important.
Thank you for that. So, measurement of a security program, specifically with developers and others that are writing code within AWS, what are some key metrics that you use to measure the efficacy of the security program throughout the development community in AWS?
We have two places that we apply metrics.
You don't want to measure time to close the ticket. The ticket takes as long to close as the ticket takes to close. But the things that we do measure are our responsiveness. So, we have SLAs on first engagement. For example, if you email AWS-security, we've publicly stated you will get a response from a human in 24 hours. And we measure that. We actually have graphs and charts that I look at every week showing, "This is how long it took us to get back to the people that are emailing us." And so, responsiveness really matters. One, because if you're not responsive, you lose trust with people. And two, because if you are responsive, it tends to mean that other good things are happening.
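As an illustration of the first-engagement measurement described above, here is a minimal sketch. It is not AWS's tooling: the function name, the input shape, and the report fields are all hypothetical, and only the 24-hour figure comes from the conversation.

```python
from datetime import datetime, timedelta

# Assumption for illustration: the 24-hour first-response target mentioned above.
FIRST_RESPONSE_SLA = timedelta(hours=24)


def first_response_report(threads):
    """Summarize first-response times.

    `threads` is a list of (received_at, first_reply_at) pairs (hypothetical
    shape); first_reply_at is None when nobody has replied yet.
    """
    answered = [(r, a) for r, a in threads if a is not None]
    within_sla = sum(1 for r, a in answered if a - r <= FIRST_RESPONSE_SLA)
    return {
        "total": len(threads),
        "answered": len(answered),
        "within_sla": within_sla,
        # Longest wait among answered threads, or None if none were answered.
        "slowest": max((a - r for r, a in answered), default=None),
    }
```

Note that the sketch tracks time to first reply only, not time to close, which matches the distinction drawn in the answer.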
Another place that we apply similar thinking is in ticket staleness.
Everything is a ticket. And so, we have a ton of automation built into the ticketing system to catch tickets that go stale. We measure both the amount of time between correspondence from the service team and the amount of time between correspondence from the security team, and so we know when tickets are languishing, we know when we're blocked on the service team or the service team is blocked on us, and we can very quickly surface the tickets that need immediate attention. But that also gives us the data to go and introspect and retrospect and figure out which of our processes aren't working, where we need to change staffing, where we need to invest in better tooling. And so, we measure the processes around security and we found that actually drives the right security outcomes.
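The staleness idea above can be sketched roughly as follows. This is an assumption-laden illustration, not the internal ticketing automation: the `Ticket` fields, the seven-day threshold, and both function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

STALE_AFTER = timedelta(days=7)  # hypothetical threshold, not from the interview


@dataclass
class Ticket:
    ticket_id: str
    last_service_team_reply: datetime
    last_security_team_reply: datetime


def waiting_on(ticket):
    """The side that spoke least recently is the side the ticket is waiting on."""
    if ticket.last_service_team_reply < ticket.last_security_team_reply:
        return "service team"
    return "security team"


def stale_tickets(tickets, now):
    """Return (ticket_id, waiting_on, idle_time) for stale tickets, most idle first."""
    stale = []
    for t in tickets:
        # Idle time is measured from the most recent message by either side.
        idle = now - max(t.last_service_team_reply, t.last_security_team_reply)
        if idle > STALE_AFTER:
            stale.append((t.ticket_id, waiting_on(t), idle))
    return sorted(stale, key=lambda row: row[2], reverse=True)
```

Tracking the two sides separately is what makes it possible to say who is blocked on whom, rather than only that a ticket is old.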
The other thing that we measure aggressively, it's not about getting it right. We spend a tremendous amount of time in application security to design a good service, but Amazon never launches anything and leaves it alone. We're constantly adding features, we're constantly responding to customer feedback, and our services change rapidly based on that feedback. And so, our goal is not to launch securely, it is to keep it secure through the life of the service, and that means that the things that you do during the initial application security review quickly age and lose their value. And so, part of the application security process is determining which invariants, which statements we always want to be true about the service, and then figuring out how we're going to verify those invariants in code.
And so, if a service should always deny a request that's formatted this way, there should be a canary that's calling that service live in production with that particularly formatted request and making sure that it gets denied. And then we measure our canaries. How much of the surface area of the service are they covering? How often are they running? How often are they failing? How often are they getting anomalous results? And we measure those processes, validating our security stance. It's not measuring the delivered security. That's hard to measure. But it's measuring regressions in the security bar that we've already established. That is incredibly important because there's always going to be another security issue. Our teams are innovative, they keep coming up with new and exciting services that we haven't had to secure before. It's another one of the things that keeps me coming to work every day.
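A canary of the kind described above might look something like this minimal sketch. Everything named here is an assumption for illustration: `Canary`, `run_canaries`, the 403 status used to mean "denied", and `fake_service`, which stands in for a real endpoint so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Canary:
    name: str
    probe: Callable[[], int]    # sends one request, returns a status code
    expected_status: int = 403  # the invariant: this request must be denied


def run_canaries(canaries):
    """Run every canary once and summarize failures and anomalous results."""
    failures, errors = [], []
    for c in canaries:
        try:
            status = c.probe()
        except Exception as exc:  # the probe itself broke: an anomalous result
            errors.append((c.name, repr(exc)))
            continue
        if status != c.expected_status:
            failures.append((c.name, status))  # the invariant regressed
    return {"ran": len(canaries), "failures": failures, "errors": errors}


def fake_service(payload):
    """Hypothetical stand-in for a service: denies any payload containing '..'."""
    return 403 if ".." in payload else 200


report = run_canaries([
    Canary("path-traversal-denied", lambda: fake_service("../secret")),
])
```

In a real deployment the probe would call the live service on a schedule, and the counts of runs, failures, and errors would feed the kinds of questions listed above: how often the canaries run, how often they fail, and how often they return anomalous results.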
That's fantastic. So, in a related question, but from a customer perspective, we have some customers that are very, very advanced, they're doing infrastructure as code through a CI/CD pipeline into production. On the other end, we have customers who are solely in the console doing point and click activities. Just from my experience, the majority of our customers are somewhere in the middle trying to get to that infrastructure as code "nirvana." What advice would you give to customer leadership to really encourage more of a focus on engineering versus the operational point and click aspects of running an infrastructure?
So, I have never built anything beautiful. I have built things that I'm incredibly proud of, things that have done very well in the marketplace, whether it's the public marketplace of AWS services or the internal marketplace where our customers are our service teams and other Amazonians. But they're all systems that started off with a kernel of an idea and we built what we thought was the smallest thing we could build that would delight customers, and then we iterated as quickly as we could. And they grew over time. It's that iteration that gets you the magnificent tools. The people that are close to them think of them as Frankenstein's monster. It's that piece of garbage, like it's all baling wire and duct tape, looks like MacGyver built it. But the reality is, they're magnificent tools. They're stunningly effective. And because they were built for the job they're doing, incrementally step by step, they actually do the job that they need to do.
And so, when someone comes in, whether they're joining the team or whether we're talking to a customer about how we do things internally, they see this array of tooling that we have, all of these mechanisms that we built, and it's overwhelming. It seems like there's no way I could replicate that. One, you don't need to replicate it. This is to handle our specific security issues. But two, we didn't build these things, we grew them over time, and all of them started small. And so, it's that incremental approach. When we were just talking about metrics, I talked about no regressions, about not having to solve the same problem twice. So, get better every day. Every day, you incrementally raise the security bar and exponential growth kicks in.
So, for the customers that are viewing this, the idea is to, essentially, start small with your engineering efforts and then just grow over time and make them better and better and better, versus I need to change my approach to everything all at once?
Absolutely. That incremental mindset always pays dividends. And it has to be married on the other side by security professionals that are not Chicken Little. We are all surrounded by risk every day. Crossing the street is a risk, driving your car is a risk, plugging your laptop into the network is a risk. And so, we have to become comfortable with taking appropriate risks. And so, security is the art, and I wish it were more of a science, but I think it is an art, of managing those risks, of understanding which risks are acceptable, which risks can be mitigated, and which risks are flat out unacceptable. And so, as a security professional, in any role, anywhere in security, you have to be able to talk about how serious this risk is.
In the security organization, when talking about security, a phrase we use all the time is clinical and precise. If you say, "This is the worst security vulnerability ever, and there's nothing we can do to fix it," you just lost a tremendous amount of credibility. You've closed off all areas of discussion. We're no longer negotiating on the path forward. You've just shut it down. If instead you say, "This is a really concerning issue. I am worried about this specific impact. Here are three possible paths forward. I like the first one, it's more expensive, but it also provides this benefit. Let's talk about what we need to do here." It's clinical, it's precise, and it opens the conversation. I'm bringing you my expertise so we can have a conversation about the business. And so, the engineer that's building this security tooling needs to have that in mind. They need to be thinking, "How am I making the business better?" not, "Oh my gosh, everything is on fire and it's all terrible."
About the Leaders
Eric Brandwine
AWS Vice President and Distinguished Engineer
By day, Eric helps teams figure out how to cloud. By night, Eric stalks the streets of Gotham, keeping it safe for customers. I am marginally competent at: AWS, Networking, Distributed Systems, Security, Photography, and Sarcasm. I am also an amateur parent and husband.
Clarke Rodgers
AWS Enterprise Strategist
As an AWS Enterprise Security Strategist, Clarke is passionate about helping executives explore how the cloud can transform security and working with them to find the right enterprise solutions. Clarke joined AWS in 2016, but his experience with the advantages of AWS security started well before he became part of the team. In his role as CISO for a multinational life reinsurance provider, he oversaw a strategic division’s all-in migration to AWS.