AWS Architecture Blog

BBVA: Architecture for Large-Scale Macie Implementation

This post was co-written by Andrew Alaniz , Director of Technology Infrastructure, and Brady Pratt, Cloud Security Engineer, both at BBVA USA.

Introduction

Data Loss Prevention (DLP) is a common topic among companies that work with any type of sensitive data. One of the challenges is that many people either don’t fully understand what DLP is, or rather, have their own definition of what it is. Regardless of one’s interpretation of DLP, one thing is certain: before you can control data loss, you need to find the data sources.

If an organization can’t identify its data, it can’t protect it. BBVA USA, a bank holding company, turned to AWS for advice, and decided to use Amazon Macie to accomplish this in Amazon Simple Storage Service (Amazon S3). Macie is a fully managed data security and data privacy service that uses machine learning and pattern matching to discover and protect your sensitive data in AWS. This blog post will share some of the design and architecture we used to deploy Macie using services such as AWS Lambda and Amazon CloudWatch.

Data challenges in Amazon S3

Although all S3 buckets are private by default, everyone is aware of the challenges that can result if sensitive data in S3 buckets are exposed publicly. Amazon has provided a way to prevent that by removing the ability to make buckets public. With Macie, one can classify the data stored in S3, centrally, and through AWS Organizations.

Recommended architecture

We can break the Macie architecture into two main parts: S3 discovery and evaluation, and S3 sensitive data discovery:

Macie architecture

The setup of discovery and evaluation is simple and straightforward, and should be enabled through Amazon Organizations and across all accounts. The cost of this piece is minimal, and it provides valuable insights into the compliance state of S3 buckets.

Once setup of discovery and evaluation is completed, we are ready to move to the next step: configuring discovery jobs for our S3 buckets. The architecture includes the use of S3, Amazon CloudWatch Events, Amazon EventBridge, and Lambda. All of the execution should happen in a centralized account, but the event triggers should come from each individual account.

Architectural considerations

When determining the architectural design of the solution, consider a few main components:

Centralization

Utilize AWS Organizations: Macie allows native integration with AWS Organizations. This is a significant advantage for Macie. Additionally, within AWS Organizations, it allows the delegation of the Macie master account to a subordinate account. The benefit of this is that it allows centralized management while allowing for the compartmentalization of roles.

Ease of management

One of the most challenging things to manage is non-conforming configurations. It’s much easier to manage a standard way to create, name, and configure settings. Once we were ready to create the classification jobs, we had to take into consideration the following when deploying Macie for our use case:

  1. Macie classifies content in a single job across one account.
  2. If you submit multiple jobs that contain the same bucket, Macie will scan the objects multiple times.
  3. Macie jobs are immutable.

Due to these considerations, we decided to create one job per S3 bucket. This allows administrators to search more easily for jobs related to findings.

Cost considerations

Macie plays an essential role, not only in identifying data and improving data collection, but also in compliance. We needed to make a decision about how to determine if an S3 bucket would be included in a classification job. Initially, we considered including all buckets no matter what. The logic here was that even if we assumed that a bucket would never have sensitive data in it, an entity with the right role could always add something later.

Finally, we implemented a solution to tag specific buckets known to have immutable properties and which would never allow sensitive data to be added. We could do this because we knew exactly what data was in the bucket, who or what created the bucket, and exactly who or what had access to the bucket.

An example of this type of bucket is the S3 bucket used to store VPC Flow Logs. We know that this bucket is only created by provisioning scripts and is only going to store VPC flow logs that contain no sensitive data based on data classification standards. Also, only VPC services and specific security services can access this bucket for anything other than READ. This is controlled organizationally and can be tagged with a simple ignore key/value pair upon creation.

Deploying Macie at BBVA USA

BBVA USA developed an approach to working within AWS that allows guardrails to be applied as accounts are created. One of those guardrails identifies if developers have stored sensitive data in an account. BBVA needed to be able to do this, and do it at scale. If there is a roadblock or a challenge with AWS services, the first place BBVA looks is to support, but the second place is the Technical Account Manager.

After initiating conversations with its account team, BBVA determined that AWS Macie was the tool to help them with this challenge.

With the help of its technical account manager (TAM), BBVA was able to meet with the Macie Product team and discuss the best options for deploying at scale. Through these conversations, they were even able to influence the Macie product roadmap.

Getting Macie ready to deploy at scale was actually quite simple once we designed the architectural pattern.

Initial job creation

In order to set up jobs for each existing bucket in the organization, it’s a matter of scripting the job creation and adding each bucket from each account into its own job, which is straightforward.

Job creation for new buckets

The recommended architecture and implementation for existing buckets:

  1. Whenever a new S3 bucket is added to Organization accounts, trigger a CloudWatch Event in the target account.
  2. Set up a cross account EventBridge to consume the Event. Using the EventBridge allows for a simpler configuration and centralized management of both Events and Lambda.
  3. Trigger a Lambda function in a delegated Macie admin account, which creates classification jobs to apply Macie to all the newly created S3 buckets.
  4. Repeat the same process when a bucket is deleted by triggering a cancel job.

Evaluate the state of S3 buckets

To evaluate the S3 accounts, turn on Macie at the organization master account and delegate administration to a subordinate account used for Macie. This enables management consolidation of security features into a centralized security account. This helps further restrict access from those that may need access to the master billing account. Finally, enable Macie by default on all organization accounts.

Evaluate the state of S3 buckets

Conclusion

BBVA USA worked directly with the Macie product team by leveraging its relationship with the AWS account team and Enterprise Support. This allowed the company to eventually deploy Macie quickly and at scale. Through Macie, the company is able to track any changes to configurations on buckets that allow a bucket to be public, shared, or replicated with external accounts and if the encryption policies are disabled. Using Macie, BBVA was able to identify buckets that contained sensitive information and put in another control to bolster its AWS governance profile.

Neel Sendas

Neel Sendas

Neel Sendas is a Senior Technical Account Manager at Amazon Web Services. Neel works with enterprise customers to design, deploy, and scale cloud applications to achieve their business goals. He is also a machine learning enthusiast. When he is not helping customers, he dabbles in golf and salsa dance.