AWS Security Blog
How to use Amazon Macie to preview sensitive data in S3 buckets
Security teams use Amazon Macie to discover and protect sensitive data, such as names, payment card data, and AWS credentials, in Amazon Simple Storage Service (Amazon S3). When Macie discovers sensitive data, these teams will want to see examples of the actual sensitive data found. Reviewing a sampling of the discovered data helps them quickly confirm that the object is truly sensitive according to their data protection and privacy policies.
In this post, we walk you through how your data security teams are able to use a new capability in Amazon Macie to retrieve up to 10 examples of sensitive data found in your S3 objects, so that you are able to confirm the nature of the data at a glance. Additionally, we will discuss how you are able to control who is able to use this capability, so that only authorized personnel have permissions to view these examples.
The challenge customers face
After a Macie sensitive data discovery job is run, security teams start their work. The security team will review the Macie findings to investigate the discovered sensitive data and decide what actions to take to protect such data. The findings provide details that include the severity of the finding, information on the affected S3 object, and a summary of the type, location, and amount of sensitive data found. However, Macie findings only contain pointers to data that Macie found in the object. In order to complete their investigation, customers in the past had to do additional work to extract the contents of a sensitive object, such as navigating to a different AWS account where the object is located, downloading and manually searching for keywords in a file editor, or writing and refining SQL queries by using Amazon S3 Select. The investigations are further slowed down when the object type is one that is not easily readable without additional tooling, such as big-data file types like Avro and Parquet. By using the Macie capability to retrieve sensitive data samples, you are able to review the discovered data and make decisions concerning the finding remediation.
To implement the ability to retrieve and reveal samples of sensitive data, you’ll need the following prerequisites:
- Enable Amazon Macie in your AWS account. For instructions, see Getting started with Amazon Macie.
- Set your account as the delegated Macie administrator account and enable Macie in at least one member account by using AWS Organizations. In this post, we will refer to the delegated administrator account as Account A and the member account as Account B.
- Configure Macie detailed classification results in Account A.
Note: The detailed classification results contain a record for each Amazon S3 object that you configure the job to analyze, and include the location of up to 1,000 occurrences of each type of sensitive data that Macie found in an object. Macie uses the location information in the detailed classification results to retrieve the examples of sensitive data. The detailed classification results are stored in an S3 bucket of your choice. In this post, we will refer to this bucket as DOC-EXAMPLE-BUCKET1.
- Create an S3 bucket that contains sensitive data in Account B. In this post, we will refer to this bucket as DOC-EXAMPLE-BUCKET2.
Note: You should enable server-side encryption on this bucket by using customer managed AWS Key Management Service (AWS KMS) keys (a type of encryption known as SSE-KMS).
- (Optional) Add sensitive data to DOC-EXAMPLE-BUCKET2. This post uses a sample dataset that contains fake sensitive data. You are able to download this sample dataset, unarchive the .zip folder, and follow these steps to upload the objects to S3. This is a synthetic dataset generated by AWS that we will use for the examples in this post. All data in this blog post has been artificially created by AWS for demonstration purposes and has not been collected from any individual person. Similarly, such data does not relate back to any individual person, nor is it intended to.
- Create and run a sensitive data discovery job from Account A to analyze the contents of DOC-EXAMPLE-BUCKET2.
- (Optional) Set up the AWS Command Line Interface (AWS CLI).
Configure Macie to retrieve and reveal examples of sensitive data
In this section, we’ll describe how to configure Macie so that you are able to retrieve and view examples of sensitive data from Macie findings.
To configure Macie (console)
- In the AWS Management Console, in the Macie delegated administrator account (Account A), follow these steps from the Amazon Macie User Guide.
To configure Macie (AWS CLI)
- Confirm that you have Macie enabled.
- Confirm that you have configured the detailed classification results bucket.
- Create a new KMS key to encrypt the retrieved examples of sensitive data. Make sure that the key is created in the same AWS Region where you are operating Macie.
- Give this key the alias REVEAL-KMS-KEY.
- Enable the feature in Macie and configure it to encrypt the data by using REVEAL-KMS-KEY. You do not specify a key policy for your new KMS key in this step. The key policy will be discussed later in the post.
Control access to read sensitive data and protect data displayed in Macie
This new Macie capability uses the AWS Identity and Access Management (IAM) policies, S3 bucket policies, and AWS KMS key policies that you have defined in your accounts. This means that in order to see examples through the Macie console or by invoking the Macie API, the IAM principal needs to have read access to the S3 object and to decrypt the object if it is server-side encrypted. It’s important to note that Macie uses the IAM permissions of the AWS principal to locate, retrieve, and reveal the samples and does not use the Macie service-linked role to perform these tasks.
Using the setup discussed in the previous section, you will walk through how to control access to the ability to retrieve and reveal sensitive data examples. To recap, you created and ran a discovery job from the Amazon Macie delegated administrator account (Account A) to analyze the contents of DOC-EXAMPLE-BUCKET2 in a member account (Account B). You configured Macie to retrieve examples and to encrypt the examples of sensitive data with the REVEAL-KMS-KEY.
The next step is to create and use an IAM role that will be assumed by other users in Account A to retrieve and reveal examples of sensitive data discovered by Macie. In this post, we’ll refer to this role as MACIE-REVEAL-ROLE.
To apply the principle of least privilege and allow only authorized personnel to view the sensitive data samples, grant the following permissions so that Macie users who assume MACIE-REVEAL-ROLE will be able to successfully retrieve and reveal examples of sensitive data:
- Step 1 – Update the IAM policy for MACIE-REVEAL-ROLE.
- Step 2 – Update the KMS key policy for REVEAL-KMS-KEY.
- Step 3 – Update the S3 bucket policy for DOC-EXAMPLE-BUCKET2 and the KMS key policy used for its server-side encryption in Account B.
After you grant these permissions, MACIE-REVEAL-ROLE is succcesfully able to retrieve and reveal examples of sensitive data in DOC-EXAMPLE-BUCKET2, as shown in Figure 1.
Step 1: Update the IAM policy
Provide the following required permissions to MACIE-REVEAL-ROLE:
- Allow GetObject from DOC-EXAMPLE-BUCKET2 in Account B.
- Allow decryption of DOC-EXAMPLE-BUCKET2 if it is server-side encrypted with a customer managed key (SSE-KMS).
- Allow GetObject from DOC-EXAMPLE-BUCKET1.
- Allow decryption of the Macie discovery results.
- Allow the necessary Macie actions to retrieve and reveal sensitive data examples.
To set up the required permissions
- Use the following commands to provide the permissions. Make sure to replace the placeholders with your own data.
Step 2: Update the KMS key policy
Next, update the KMS key policy that is used to encrypt sensitive data samples that you retrieve and reveal in your delegated administrator account.
To update the key policy
- Allow the MACIE-REVEAL-ROLE access to the KMS key that you created for protecting the retrieved sensitive data, using the following commands. Make sure to replace the placeholders with your own data.
Step 3: Update the bucket policy of the S3 bucket
Finally, update the bucket policy of the S3 bucket in member accounts, and update the key policy of the key used for SSE-KMS.
To update the S3 bucket policy and KMS key policy
- Use the following commands to update key policy for the KMS key used for server-side encryption of the DOC-EXAMPLE-BUCKET2 bucket in Account B.
- Use the following commands to update the bucket policy of DOC-EXAMPLE-BUCKET2 to allow cross-account access for MACIE-REVEAL-ROLE to get objects from this bucket.
Retrieve and reveal sensitive data samples
Now that you’ve put in place the necessary permissions, users who assume MACIE-REVEAL-ROLE will be able to conveniently retrieve and reveal sensitive data samples.
To retrieve and reveal sensitive data samples
- In the Macie console, in the left navigation pane, choose Findings, and select a specific finding. Under Sensitive Data, choose Review.
- On the Reveal sensitive data page, choose Reveal samples.
- Under Sensitive data, you will be able to view up to 10 examples of the sensitive data found by Amazon Macie.
You are able to find additional information on setting up the Macie Reveal function in the Amazon Macie User Guide.
In this post, we showed how you are to retrieve and review examples of sensitive data that were found in Amazon S3 using Amazon Macie. This capability will make it easier for your data protection teams to review the sensitive contents found in S3 buckets across the accounts in your AWS environment. With this information, security teams are able to quickly take remediation actions, such as updating the configuration of sensitive buckets, quarantining files with sensitive information, or sending a notification to the owner of the account where the sensitive data resides. In certain cases, you are able to add the examples to an allow list in Macie if you don’t want Macie to report those as sensitive data (for example, corporate addresses or sample data that is used for testing).
The following are links to additional resources that you will be able to use to expand your knowledge of Amazon Macie capabilities and features:
- Data Discovery and Classification with Amazon Macie workshop
- Discover sensitive data by using custom data identifiers with Amazon Macie
- Use Security Hub custom actions to remediate S3 resources based on Macie discovery results
- Use Macie to discover sensitive data as part of automated data pipelines
- Amazon Macie documentation
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on Amazon Macie re:Post.
Want more AWS Security news? Follow us on Twitter.