Vulnerability Management in a Zero Day Security Scenario

A conversation with CJ Moses, AWS CISO and VP of Security Engineering

When a “zero day” security situation occurs, how will your organization respond? Hear how AWS responded to Log4j, a real-life vulnerability management scenario. Don’t miss the final part of our three-part Security Leaders conversation with CJ Moses, AWS CISO and VP of Security Engineering.

In the conclusion of our conversation with CJ Moses, Clarke Rodgers, Director of AWS Enterprise Strategy, asks CJ for a play-by-play account of how AWS handled the Log4j vulnerability of 2021. Read their conversation in detail below to discover which basic cyber-defense strategies helped the AWS security organization quickly react to and mitigate Log4j.

Be sure to watch part 1 and part 2 of this series to catch our full conversation with CJ Moses.

A CISO’s perspective on vulnerability management

CJ Moses (00:22):
A lot of the issues today still go back to some very basic hygiene, care and feeding of patching the stuff. Everybody freaks out about ransomware. That is the big buzzword, still is. Everybody's, "What can I do about ransomware?" My basic answer is, "Basic hygiene. Patch your stuff."

Clarke Rodgers (00:40):
That's right.

CJ Moses (00:41):
Because ransomware itself, malicious software, only finds its way into your system through various ... a few different ways. A lot of that is lack of patching. Then we can kind of focus over to the other side is phishing or social engineering, one of the two. But phishing, clicking, having your employees clicking on stuff, pulling it into your environment. Even then, some of the basic hygiene of having defense-in-depth, least privilege for the employee, so that even if they click on it, it's isolated to their box and a few things they may have access to. Not have it be able to move sideways or off of the systems. Using EDR [Endpoint Detection and Response] and other types of technologies in order to prevent and protect.

So, all of these things kind of line up. It sounds like a lot, but quite honestly, it's a basic cost of doing business. Unless you want to be, as we've seen the pipeline, the oil pipeline provider that ultimately couldn't provide gas to the east coast for a month. Those types of things are very real, but could be very easily mitigated if you pay attention to the basics of hygiene. Those are the types of things where, when they occur, you see, "Oh, well, the CISO got fired." No, no, no, no, no. This is part of the business. Single threaded leader. Who owns the business as a whole? It’s the CEO. The CEO needs to understand the risks that they have. It's the CISO's job to make sure they're aware of them and articulate for those risks to be mitigated.

CJ’s advice for communicating risks to the board

CJ Moses (02:13):
When talking with board of directors or executives, C-suite executives, metrics are great. In Amazon, they work better than in most companies. It's because that's the kind of culture that we really have. But even board of directors or otherwise, they remember stories, anecdotes, issues that they can relate with and remember. If you go in with a whole bunch of metrics, how many of those metrics do you think that a board of directors member is going to remember two days later? Probably none. Maybe one or two that kind of really stood out.

But if you tell them a story about what had happened, and the people involved and how it impacted them. And use cases of other types of businesses that have been in a similar situation and the cost to that business. There's plenty of examples. But you can see the actual monetary loss and the opportunity cost to the businesses. That will impact the company for years to come, if not forever. So, the idea of being able to get in front of those things.

Log4j was a big one for us. Explaining how we had 1.2 million things that needed ... or 1.2 million actions that needed to be taken by 17,000 employees in order to patch all of the things. And that we want to do that in a more streamlined, automated way going forward, so you don't have to have 17,000 people doing that. They might remember the numbers, but what they're going to remember more is the automation that's coming forward and that we're doing. Those are the types of things that you try to explain to the board, or to the CEO, to get them on board with your security culture and-

Clarke Rodgers (03:47):
And then frame it, I imagine, in profit and loss and business terms, and risk business, versus this library of this piece of software needs to get patched or this bad thing happens, right?

CJ Moses (04:00):
Obviously there’s a friction, there's a monetary, financial friction. Slowing down the business potentially, or using money from other things in order to do security related things. How much security tech debt does the company have?

What was Log4j and how did AWS respond?

Clarke Rodgers (04:13):
CJ, I'm glad you brought up Log4j because it's something that a lot of our customers faced December of last year, with the Log4j vulnerability, and of course, the Log4Shell exploit. Can you walk us through what that looked like inside of AWS?

Would love to hear a little bit about what AWS Security kind of did, and sort of ran and then what the engineering teams and the service teams had to do from a security responsibility perspective.

CJ Moses (04:46):
Yeah, absolutely. So, I think it was December 9th when things started to break. It had been a while at that point in time since we'd had a large scale “zero day,” as we call them, kind of drop. Where you have millions, literally, of computers that need to be patched. Obviously, we have a process for doing so, and we do it every day, just not at the timeliness of this one that needed, essentially, to have all of those things patched immediately, would be the preference.

From a security perspective, obviously, a lot of our responsibility was to rally the troops. Pull together the issue response, if you will, to identify where. Number one thing you want to know is visibility. Where are these vulnerable systems that need to be patched? How many of them are there? What are the defense-in-depths that are in front of some of them so that you know the prioritization of what you need to patch first. Because obviously, things that are facing the internet directly, even with some mitigations on them, are the things that are more vulnerable from a risk perspective than those things that you have many more defenses on top of.

A lot of that triage was started immediately. That is done in direct cooperation with all of the service teams. Because, once again, they're responsible for the security of their services.

Clarke Rodgers (06:03):
They own it, right?

CJ Moses (06:05):
And they own the patching of their systems too. Last thing you want is, traditionally, for the security team to be patching their stuff. In this case, we did the “triage,” as we'll call it. Figured out what the opportunities were. Started down the path of figuring out what the patch would look like — because “zero day,” meaning normally there's no patches. Working with the industry, and in this case, a lot of the Java providers out there to look for their versions of the patch for anything that was third party. But we have a lot of implementations internally that we can also patch ourselves. That's the nice thing about having an engineering culture, and engineers. You don't have to wait for anybody else. You look at it yourself and figure out, "Well, we can do these things."

So, from the engineering side of the house, a few of the teams really dug really deep, figured out two ways. Number one, an internal patch, if you will, software patch that would be applied to the many millions of systems that we have. But I think the biggest and most innovative thing that the team came up with is the ability to hot patch. Essentially, a hot patch is a changing, essentially, of the Java configuration in a very automated fashion to allow us to turn off the thing that was broken that caused the vulnerability. In this case, the print buffer capability [for Log4Shell, the JNDI lookup feature] that was in the software. And then the ability to push that out broadly.

To get it as broad as we wanted it, this is where we kind of went off script from what traditionally we would do, because service teams own the patching of all their systems, security just advises and assists. But going back to the enablement perspective, one of the things we did, we had a means by which to get to all of those systems, with their blessing, to be able to push the hot patch, which we knew was a low risk, versus the risk that you would face from having the vulnerability, and we're able to do the hot patch.
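The triage step CJ describes in this section (find the vulnerable systems, then patch the most exposed ones first) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not AWS's actual tooling: the `System` record, its `internet_facing` and `mitigation_layers` fields, and the example fleet are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class System:
    # All fields are hypothetical, chosen only to mirror the interview.
    name: str
    internet_facing: bool      # directly reachable from the internet?
    mitigation_layers: int     # number of defenses already in front of it


def triage_order(systems):
    """Return systems ordered so the most exposed are patched first.

    Internet-facing systems come first; within each group, systems with
    fewer mitigation layers come first. Name is used only as a tie-breaker.
    """
    return sorted(
        systems,
        key=lambda s: (not s.internet_facing, s.mitigation_layers, s.name),
    )


fleet = [
    System("internal-batch", internet_facing=False, mitigation_layers=3),
    System("public-api", internet_facing=True, mitigation_layers=1),
    System("public-site", internet_facing=True, mitigation_layers=2),
    System("internal-tool", internet_facing=False, mitigation_layers=1),
]

for system in triage_order(fleet):
    print(system.name)
```

A real prioritization would likely weigh more signals (data sensitivity, exploitability, blast radius), but the ordering idea is the same: exposure first, then depth of existing defenses.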

Why was the ability to hot patch such an important innovation?

CJ Moses (07:55):
Hot patch was created in a very short period of time. Of course, didn't go through our normal long process of all the vulnerability management and stuff like that, or all the AppSec review type stuff that we would normally do. But one of the things was they took our state from needing to patch everything immediately, software patch everything immediately, to having everything patched virtually in a very short period of time, which bought us time to continue to patch everything else.

A hot patch for us is a defense-in-depth mechanism — it helps you to buy time in order to be able to patch all the other things that we have in due time. That's part of that creation of defense-in-depth that we believe in. Obviously, we had other mitigations, WAF and other types of things that we’re blocking and tackling ahead of time. But having that allowed us the time to make sure everything else was patched securely. As you know with Log4j, it wasn't one patch. Then a few days later they realized, "Oh, there's another issue." We actually helped identify some of these additional things. There were multiple patches that needed to be done. That allowed us to not have to physically patch all of these things a bunch of times. Because we had the hot patch that, for the first three, I believe it is, three issues that popped up, we were able to use the hot patch to protect ourselves against, while then we were able to push the longer tail. Even then, we did so at a high rate of speed for many millions of systems and be able to do that.

The cooperation isn't out of the norm for us to work together with engineering teams. That's the thing, is day to day, that's what we do. We're part of their teams. We have security guardians that are part of the service teams themselves. They're trained for a few weeks by us to understand a lot of the normal security requirements. Also become ambassadors into the service teams. And those security guardians then allow us to operate as one, so that it's not an “us and them.”

The adversarial thing that sometimes appears when you have a security team, normally comes out of a security team that is the land of “no,” — always trying to block and tackle and keep you from doing things. We want to be the inverse of that, and have been, to where we're enabling them. In this case, that was some of the benefit that arose from that is we worked together with the engineering team. They came up with the hot patch. We came up with the means by which to deploy it broadly, very quickly to our scale, then buying us all time. When I say “buying us time,” we all worked through the holidays making sure everything was patched –

Clarke Rodgers (10:24):
Sure.

CJ Moses (10:25):
– as our security bulletins advised on. That's the thing is, a lot of our messaging going out wasn't messaging that we had hot patched everything, we actually messaged when we software patched. Because the thing that you don't want to do is have a software ... or a hot patch, and then not software patch it. And then have something go wrong with the one layer of defense, the hot patch, and then the vulnerability that you originally were trying to protect against is still there. So, hot patch is a defense-in-depth mechanism to block and tackle while you do the other.

We were reporting on when we got software patching done. That way it kept us, A, transparent to our customers, so they understood what was going on. And B, that we were transparent to ourselves. That we weren't fooling ourselves that we actually mitigated the threat completely by doing the hot patch, then software patching. That's when we called done, done, is when software patch was broadly deployed and everything was in good stead.

Quite a challenge. We learned so much from it. Like I was saying, it was 1.2 million actions by 17,000 engineers. We talked about security being the path of least resistance — having 17,000 engineers working at breakneck speed is not least friction. Following on behind that, we've worked very closely to try to make sure that we have automation across the totality of all the different platform ... or different networks that we run on, to be able to patch things in a timely fashion. Not through the holidays with people having to make heroic efforts. I think that was a heroic effort. Every time we have to have a heroic effort, we have to figure out a way to make it normal course of business, during normal business hours for people around the world, so that we don't have to have that.
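The "done, done" rule CJ describes, where a hot patch counts only as a mitigation and a system is reported as done only after the software patch lands, can be expressed as a small status function. This is a hedged sketch: the `PatchState` names and the example fleet are assumptions made for illustration, not AWS's reporting system.

```python
from enum import Enum


class PatchState(Enum):
    # Hypothetical status labels for illustration.
    VULNERABLE = "vulnerable"
    HOT_PATCHED_ONLY = "mitigated, not done"
    DONE = "done"


def patch_state(hot_patched: bool, software_patched: bool) -> PatchState:
    """Classify one system; only the software patch counts as done."""
    if software_patched:
        return PatchState.DONE
    if hot_patched:
        return PatchState.HOT_PATCHED_ONLY
    return PatchState.VULNERABLE


def fleet_is_done(fleet: dict) -> bool:
    """The fleet is 'done, done' only when every system is software patched."""
    return all(
        patch_state(hot, soft) is PatchState.DONE
        for hot, soft in fleet.values()
    )


# Maps a hypothetical service name to (hot_patched, software_patched).
fleet = {
    "service-a": (True, True),
    "service-b": (True, False),
    "service-c": (False, False),
}

for name, (hot, soft) in fleet.items():
    print(name, "->", patch_state(hot, soft).value)
print("fleet done:", fleet_is_done(fleet))
```

Keeping "hot patched only" as its own state is the point of the design: it prevents a mitigated system from being reported as fixed, which matches the transparency goal described above.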

Communicating with the C-suite and stakeholders during Log4j

Clarke Rodgers (12:12):
To bring it back to our earlier conversation with the weekly CISO, CEO meeting, right? Log4j was out of the norm, right? Because you're patching all the time for everything else that comes out. But this was different, and had to dedicate a whole bunch of resources. Did you wait for the Friday meeting to discuss this?

CJ Moses (12:33):
Absolutely not.

Clarke Rodgers (12:34):
How often was the-

CJ Moses (12:35):
Three times a day.

Clarke Rodgers (12:37):
... the CEO and the CISO, or representatives from each, were-

CJ Moses (12:39):
Yeah, we had regular meetings at different layers that were going on all the time. But three times a day, at a minimum, we were doing updates in the live. Or in meetings, virtual meetings, three times a day. With good questions and prioritizations and pushes, and, "Why is it taking this long for this team?" Normally there was a good answer within the room because of the focus by everyone on it. It was pretty much a "drop everything else and get that done" type of thing.

Clarke Rodgers (13:06):
That's fantastic.

CJ Moses (13:08):
It was understood. That's the nice thing is, it was understood, once we explained what needed to happen and what the risk was, there wasn't any pushback across the totality of the business. Everybody was like, "Got it. We know what we need to do. This is it. This is what we're going to focus on. We're going to get it done." So, there wasn't that fight. And it wasn't because Adam or somebody was saying we had to do it. Before it even got to Adam, everybody was already doing the things.

The first update to Adam after giving him the heads up, that, "Hey, this is what we've got going down," was one that, "Here's the plan. This is what we're doing. We're going to iterate on it, may get better, but this is what we're doing for now." Everyone was all in. So, that mindset. Again, the Friday meeting isn't the, "There's an emergent thing I need to tell you about now." That's the nice thing, is that our business circles around the leadership, single threaded leaders, actually interacting.

So, Adam knows about what's going on. He had questions, we had answers. It was one of those discussions like that. And if something pops up, he'll get a text message or a Chime or a Slack, or one of our many different messaging capabilities (Wickr, primarily, these days) to say, "Hey, this is what's going on. Just a heads up so you don't get blindsided. More details to come." Whether it be a meeting or an impromptu meeting that we call, like we had for Log4j, on an ongoing basis. That meeting didn't end until we were patched, software patched. It wasn't three times a day, through all the time, but it was continued to make sure we were done, done.

Don’t wait for a zero day — Earn trust with your stakeholders now

Clarke Rodgers (14:42):
So, it sounds like it's extremely important to have those business and, actually, personal relationships forged well before anything like that happens, right?

CJ Moses (14:51):
Oh, absolutely. If you're cold calling-

Clarke Rodgers (14:53):
And earning the trust, et cetera.

CJ Moses (14:55):
Yeah, if you're cold calling either your executive team or even, in our case, service team leaders on things like this, you're way behind the curve. That's the thing is trust is normally built between humans, therefore, it takes time. Normally it's done mostly face to face, which has been a challenge over these last few years. But in my case, I'm lucky to have been here for coming up on 15 years here.

So, you get to know people and they get to know you. Because everyone has a different user interface. Different people have different ways of interacting. So, understanding how they like to receive information also is helpful. I think Adam and I have worked that out, so I know what he likes to see.

So, there's a lot of things that are challenging, but in the end, it really comes down to ... The biggest advice that I give to any CISO or candidate for CISO is understand that you as the chief security representative for the company — regardless of the title, if it's Chief Information Security Officer, Chief Security Officer.

The reality is that if you're Chief Information Security Officer, you have to be not only responsible or an advocate for all security — doesn't matter what type, personnel, physical security. Just because it's not in your title or in the name doesn't mean it's not your responsibility. Because in the end, if the data is somehow leaked or otherwise breached, or whatever, it's still partially your fault. And I say partially because of the shared model between the service owners, in our case, and the security team. But at the same time, when it all comes down to it, it's my responsibility, and those business owners, to make sure that we're maintaining the security across the platform, across our environments. It's not one we take lightly, that's for sure.

Clarke Rodgers (16:47):
Awesome. Well, CJ, thank you so much for your time today. I really appreciate your insights.

CJ Moses (16:50):
No, thank you.

About the leaders

CJ Moses
AWS Chief Information Security Officer and Vice President of Security Engineering

In his role, CJ leads secure product design, management, and development efforts focused on bringing the competitive, economic, and security benefits of cloud computing to customers. Prior to joining Amazon in 2007, CJ led the technical analysis of computer and network intrusion efforts at the U.S. Federal Bureau of Investigation Cyber Division. CJ also served as a Special Agent with the U.S. Air Force Office of Special Investigations (AFOSI). CJ led several computer intrusion investigations seen as foundational to the information security industry today.

Clarke Rodgers
AWS Enterprise Strategist

As an AWS Enterprise Security Strategist, Clarke is passionate about helping executives explore how the cloud can transform security and working with them to find the right enterprise solutions. Clarke joined AWS in 2016, but his experience with the advantages of AWS security started well before he became part of the team. In his role as CISO for a multinational life reinsurance provider, he oversaw a strategic division’s all-in migration to AWS.
