AWS Security Blog

How to use Amazon Macie to preview sensitive data in S3 buckets

February 13, 2024: We’ve updated this post to show you how to configure Macie to assume an IAM role when it retrieves sensitive data samples for findings.


Security teams use Amazon Macie to discover and protect sensitive data, such as names, payment card data, and AWS credentials, in Amazon Simple Storage Service (Amazon S3). When Macie discovers sensitive data, these teams will want to see examples of the actual sensitive data found. Reviewing a sampling of the discovered data helps them quickly confirm that the object is truly sensitive according to their data protection and privacy policies.

In this post, we walk you through how your data security teams can use a new capability in Amazon Macie to retrieve up to 10 examples of sensitive data found in your S3 objects, so that you can confirm the nature of the data at a glance. We also discuss how you can control who can use this capability, so that only authorized personnel have permissions to view these examples.

The challenge customers face

After a Macie sensitive data discovery job runs, the security team's work begins. The team reviews the Macie findings to investigate the discovered sensitive data and decide what actions to take to protect it. The findings include the severity of the finding, information about the affected S3 object, and a summary of the type, location, and amount of sensitive data found. However, Macie findings contain only pointers to the data that Macie found in the object. To complete their investigation, customers in the past had to do additional work to extract the contents of a sensitive object, such as navigating to a different AWS account where the object is located, downloading the object and manually searching for keywords in a file editor, or writing and refining SQL queries by using Amazon S3 Select. Investigations slow down further when the object type is not easily readable without additional tooling, as is the case for big-data file types like Avro and Parquet. By using the Macie capability to retrieve sensitive data samples, you can review the discovered data directly and decide how to remediate the finding.

Prerequisites

To implement the ability to retrieve and reveal samples of sensitive data, you’ll need the following prerequisites:

  • Enable Amazon Macie in your AWS account. For instructions, see Getting started with Amazon Macie.
  • Set your account as the delegated Macie administrator account and enable Macie in at least one member account by using AWS Organizations. In this post, we will refer to the delegated administrator account as Account A and the member account as Account B.
  • Configure Macie detailed classification results in Account A.

    Note: The detailed classification results contain a record for each Amazon S3 object that you configure the job to analyze. Each record includes the location of up to 1,000 occurrences of each type of sensitive data that Macie found in an object. Macie uses the location information in the detailed classification results to retrieve the examples of sensitive data. The detailed classification results are stored in an S3 bucket of your choice. In this post, we will refer to this bucket as DOC-EXAMPLE-BUCKET1.

  • Create an S3 bucket that contains sensitive data in Account B. In this post, we will refer to this bucket as DOC-EXAMPLE-BUCKET2.

    Note: You should enable server-side encryption on this bucket by using customer managed AWS Key Management Service (AWS KMS) keys (a type of encryption known as SSE-KMS).

  • (Optional) Add sensitive data to DOC-EXAMPLE-BUCKET2. This post uses a sample dataset that contains fake sensitive data. You can download this sample dataset, unarchive the .zip folder, and follow these steps to upload the objects to S3. This is a synthetic dataset that AWS generated for the examples in this post. All data in this blog post has been artificially created by AWS for demonstration purposes and has not been collected from any individual person.
  • Create and run a sensitive data discovery job from Account A to analyze the contents of DOC-EXAMPLE-BUCKET2.
  • (Optional) Set up the AWS Command Line Interface (AWS CLI).

Configure Macie to retrieve and reveal examples of sensitive data

In this section, we’ll describe how to configure Macie so that you can retrieve and view examples of sensitive data from Macie findings.

To configure Macie (console)

  • In the AWS Management Console, in the Macie delegated administrator account (Account A), follow these steps from the Amazon Macie User Guide.

To configure Macie (AWS CLI)

  1. Confirm that you have Macie enabled.
    $ aws macie2 get-macie-session --query 'status'
    # The expected response is "ENABLED"
  2. Confirm that you have configured the detailed classification results bucket.
    $ aws macie2 get-classification-export-configuration

    # The expected response is:
    {
        "configuration": {
            "s3Destination": {
                "bucketName": "DOC-EXAMPLE-BUCKET1",
                "kmsKeyArn": "arn:aws:kms:<YOUR-REGION>:<YOUR-ACCOUNT-ID>:key/<KEY-USED-TO-ENCRYPT-DOC-EXAMPLE-BUCKET1>"
            }
        }
    }
  3. Create a new single-Region, symmetric KMS key to encrypt the retrieved examples of sensitive data. Make sure that the key is created in the same AWS Region where you are operating Macie.
    $ aws kms create-key
    {
        "KeyMetadata": {
            "Origin": "AWS_KMS",
            "KeyId": "<YOUR-KEY-ID>",
            "Description": "",
            "KeyManager": "CUSTOMER",
            "Enabled": true,
            "KeySpec": "SYMMETRIC_DEFAULT",
            "CustomerMasterKeySpec": "SYMMETRIC_DEFAULT",
            "KeyUsage": "ENCRYPT_DECRYPT",
            "KeyState": "Enabled",
            "CreationDate": 1502910355.475,
            "Arn": "arn:aws:kms:<YOUR-AWS-REGION>:<AWS-ACCOUNT-A>:key/<YOUR-KEY-ID>",
            "AWSAccountId": "<AWS-ACCOUNT-A>",
            "MultiRegion": false,
            "EncryptionAlgorithms": [
                "SYMMETRIC_DEFAULT"
            ]
        }
    }
  4. Give this key the alias REVEAL-KMS-KEY. In the AWS CLI, the alias name must begin with the alias/ prefix.
    $ aws kms create-alias \
        --alias-name alias/REVEAL-KMS-KEY \
        --target-key-id <YOUR-KEY-ID>
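If you want to script the checks in steps 1 and 2, the following is a minimal Python sketch. It assumes you have already parsed the JSON output of `aws macie2 get-macie-session` (without the `--query` filter) and `aws macie2 get-classification-export-configuration` into dictionaries. The function name `macie_ready_for_reveal` and the wording of the messages are our own illustration, not part of Macie.

```python
def macie_ready_for_reveal(session, export_config):
    """Return a list of problems; an empty list means both checks passed.

    session:       parsed output of `aws macie2 get-macie-session`
    export_config: parsed output of
                   `aws macie2 get-classification-export-configuration`
    """
    problems = []

    # Step 1: Macie must be enabled.
    if session.get("status") != "ENABLED":
        problems.append("Macie is not enabled")

    # Step 2: a results bucket and its KMS key must be configured.
    destination = export_config.get("configuration", {}).get("s3Destination", {})
    if not destination.get("bucketName", "").strip():
        problems.append("no bucket is configured for detailed classification results")
    if not destination.get("kmsKeyArn", "").strip():
        problems.append("no KMS key is configured for the results bucket")

    return problems


# Example with placeholder values shaped like the responses shown above.
example_problems = macie_ready_for_reveal(
    {"status": "ENABLED"},
    {
        "configuration": {
            "s3Destination": {
                "bucketName": "DOC-EXAMPLE-BUCKET1",
                "kmsKeyArn": "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE",
            }
        }
    },
)
print(example_problems or "Ready to configure reveal")
```

Returning a list of problems rather than a single Boolean lets you report every missing prerequisite in one pass.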

Configure access to read sensitive data samples with Amazon Macie

This Macie capability uses cross-account access or role chaining to assume a role in the target or member account. When a request is made through the Macie console or Macie API, the service principal, reveal-samples.macie.amazonaws.com, assumes the role in the member account that you specified when you set up the Macie reveal function. You must create this target role in the member account and use AWS Identity and Access Management (IAM) policies to give it appropriate permissions to access the buckets. You can do this with the AWS CloudFormation template that the console provides during setup; the template grants the s3:GetObject permission for all buckets in the member account. If you don’t want to allow access to all buckets in a member account, you can customize the policy to specify which S3 buckets Macie can access.

If you use AWS KMS keys for server-side encryption (SSE-KMS) of objects, you must also grant the role the kms:Decrypt permission on the keys used for S3 object encryption. Following is an example of the statement that you must add to the key policy. Remember to substitute your own values for the placeholders in the policy.

{
    "Sid": "Allow customer role to access the key",
    "Effect": "Allow",
    "Principal": {
        "AWS": "arn:aws:iam::{<ACCOUNT-B-ID>}:role/{<ROLE-NAME-SET-UP-IN-MACIE-REVEAL>}"
    },
    "Action": [ "kms:Decrypt" ],
    "Resource": "*"
}
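If you manage several member accounts, you might generate this statement instead of editing it by hand. The following Python sketch is one way to do that. It assumes the principal names the role in a single member account, and the helper name `reveal_key_policy_statement` and the sample account ID are hypothetical.

```python
import json


def reveal_key_policy_statement(member_account_id, role_name):
    """Build the kms:Decrypt statement for the member-account role."""
    # AWS account IDs are 12-digit numbers; reject anything else early.
    if not (member_account_id.isdigit() and len(member_account_id) == 12):
        raise ValueError("expected a 12-digit AWS account ID")
    return {
        "Sid": "Allow customer role to access the key",
        "Effect": "Allow",
        "Principal": {
            "AWS": f"arn:aws:iam::{member_account_id}:role/{role_name}"
        },
        "Action": ["kms:Decrypt"],
        "Resource": "*",
    }


# Placeholder account ID; MacieReveal is the role name used in this post.
statement = reveal_key_policy_statement("111122223333", "MacieReveal")
print(json.dumps(statement, indent=4))
```

You would append the printed statement to the Statement array of the existing key policy rather than replace the policy with it.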

We’ll now walk through the steps for how to grant and control access to the Macie ability to retrieve and reveal sensitive data examples.

Before configuring the Macie reveal function, you must create a role in your delegated administrator account. For this example, we chose the name MacieReveal. Assign the following permissions and trusts to the role.

  1. Create an IAM policy in the delegated administrator account. This policy allows the role in the delegated administrator account to assume a role in the member accounts, and it also grants the role access to S3 buckets in the member accounts. For the purposes of this post, we allow access to all S3 buckets in all accounts. You can customize the policy to meet your organization’s security requirements by altering the Resource elements in the policy. We named the following example policy MacieRevealPolicy; be sure to substitute your chosen role name in the policy.
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "S3Access",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": [
                    "*"
                ]
            },
            {
                "Sid": "CrossAccountRoleAccess",
                "Effect": "Allow",
                "Action": "sts:AssumeRole",
                "Resource": "arn:aws:iam::*:role/{<ROLE-NAME-SET-UP-IN-MACIE-REVEAL>}"
            }
        ]
    }
  2. Create a role with the following custom trust policy, and then attach the policy that you created in the previous step to the new role. For this post, we named the role MacieReveal. This trust policy allows the Macie service principal, reveal-samples.macie.amazonaws.com, to assume the role (sts:AssumeRole) only on behalf of the delegated administrator account (aws:SourceAccount) specified in the policy. We will create the role in the member accounts after configuring Macie, by using an auto-generated CloudFormation template.
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "Statement1",
                "Effect": "Allow",
                "Principal": {
                    "Service": "reveal-samples.macie.amazonaws.com"
                },
                "Action": [
                    "sts:AssumeRole"
                ],
                "Condition": {
                    "StringEquals": {
                        "aws:SourceAccount": ["{<ADMIN-ACCOUNT_ID>}"]
                    }
                }
            }
        ]
    }

    Take note of the role name you chose, because this will be needed during the setup of the Macie reveal functionality.
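The two policies above can also be produced from a few parameters, which makes it simpler to narrow the Resource elements later. The following Python sketch is an illustration under our own assumptions: the function names, the optional `member_account_ids` and `bucket_names` parameters, and the sample IDs are hypothetical, and narrowing `s3:GetObject` to `arn:aws:s3:::<bucket>/*` is one possible way to customize the policy.

```python
import json


def reveal_permissions_policy(role_name, member_account_ids=None, bucket_names=None):
    """Build the permissions policy for the administrator-account role.

    With no account IDs or bucket names, the result is equivalent to the
    broad example policy above; pass lists to narrow either statement.
    """
    if bucket_names:
        s3_resources = [f"arn:aws:s3:::{name}/*" for name in bucket_names]
    else:
        s3_resources = ["*"]
    if member_account_ids:
        role_resources = [
            f"arn:aws:iam::{account}:role/{role_name}"
            for account in member_account_ids
        ]
    else:
        role_resources = [f"arn:aws:iam::*:role/{role_name}"]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "S3Access",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": s3_resources,
            },
            {
                "Sid": "CrossAccountRoleAccess",
                "Effect": "Allow",
                "Action": "sts:AssumeRole",
                "Resource": role_resources,
            },
        ],
    }


def reveal_trust_policy(admin_account_id):
    """Build the trust policy that lets the Macie service principal assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "Statement1",
                "Effect": "Allow",
                "Principal": {"Service": "reveal-samples.macie.amazonaws.com"},
                "Action": ["sts:AssumeRole"],
                "Condition": {
                    "StringEquals": {"aws:SourceAccount": [admin_account_id]}
                },
            }
        ],
    }


# Placeholder account IDs; the bucket name is the one used in this post.
print(json.dumps(
    reveal_permissions_policy("MacieReveal", ["111122223333"], ["DOC-EXAMPLE-BUCKET2"]),
    indent=4,
))
print(json.dumps(reveal_trust_policy("444455556666"), indent=4))
```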

After you have created the prerequisite policy and role, you’re ready to begin configuring the reveal functionality in Macie.

To configure reveal functionality in Macie

  1. In the Macie console, in the left navigation pane, select the Reveal samples option.
  2. Choose the Edit button. Under Settings, select Enable and save your selection.
    Figure 1: Enable Macie Reveal functionality


  3. Complete the Settings wizard, specifying the role name and KMS key you created earlier.
    Figure 2: Macie Reveal wizard


  4. Save your configuration. After you save it, Macie generates an external ID to identify the delegated administrator account. The external ID is included in the automatically generated AWS CloudFormation template that you can download from the console. If you choose not to use the pre-created CloudFormation template, you must include the external ID in the trust policy of any role that you create in the member accounts. The primary function of the external ID is to address and prevent the confused deputy problem.

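The CloudFormation template creates a role in the member account and assigns IAM permissions to that role. As the trust policy in the template shows, this role is assumed through the MacieReveal role in the delegated administrator account, which the reveal-samples.macie.amazonaws.com service principal assumes first.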

Following is an example of the CloudFormation template created in the Macie console for you to use to create the cross-account role. Take note of the inclusion of the generated external ID in the CloudFormation template.

{
    "AWSTemplateFormatVersion": "2010-09-09",
    "Resources": {
        "revealRetrievalMode": {
            "Type": "AWS::IAM::Role",
            "Properties": {
                "RoleName": "MacieReveal",
                "AssumeRolePolicyDocument": {
                    "Version": "2012-10-17",
                    "Statement": [
                        {
                            "Sid": "AllowMacieAdminRevealRoleForCrossAccountAccess",
                            "Effect": "Allow",
                            "Principal": {
                                "AWS": "arn:aws:iam::123456789012:role/MacieReveal"
                            },
                            "Action": "sts:AssumeRole",
                            "Condition": {
                                "StringEquals": {
                                    "sts:ExternalId": "1234567890EXAMPLE"
                                }
                            }
                        }
                    ]
                },
                "Policies": [
                    {
                        "PolicyName": "S3BucketAccess",
                        "PolicyDocument": {
                            "Version": "2012-10-17",
                            "Statement": [
                                {
                                    "Sid": "RetrieveS3Objects",
                                    "Effect": "Allow",
                                    "Action": [
                                        "s3:GetObject"
                                    ],
                                    "Resource": [
                                        "*"
                                    ]
                                }
                            ]
                        }
                    }
                ]
            }
        }
    }
}
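If you don’t want the member-account role to read every bucket, you can narrow the template before you deploy it. The following Python sketch shows one possible approach: it copies a parsed template and replaces the Resource of any statement that allows s3:GetObject with object ARNs for the buckets you name. The function name `scope_template_to_buckets` is hypothetical, and the trimmed template in the example stands in for the file you download from the console.

```python
import copy


def scope_template_to_buckets(template, bucket_names):
    """Return a copy of the template with s3:GetObject limited to the given buckets."""
    scoped = copy.deepcopy(template)  # leave the downloaded template untouched
    object_arns = [f"arn:aws:s3:::{name}/*" for name in bucket_names]
    for resource in scoped.get("Resources", {}).values():
        if resource.get("Type") != "AWS::IAM::Role":
            continue
        for policy in resource.get("Properties", {}).get("Policies", []):
            for statement in policy["PolicyDocument"]["Statement"]:
                actions = statement.get("Action", [])
                if isinstance(actions, str):
                    actions = [actions]
                if "s3:GetObject" in actions:
                    statement["Resource"] = list(object_arns)
    return scoped


# Trimmed stand-in for the downloaded template (trust policy omitted here).
downloaded = {
    "Resources": {
        "revealRetrievalMode": {
            "Type": "AWS::IAM::Role",
            "Properties": {
                "RoleName": "MacieReveal",
                "Policies": [
                    {
                        "PolicyName": "S3BucketAccess",
                        "PolicyDocument": {
                            "Version": "2012-10-17",
                            "Statement": [
                                {
                                    "Sid": "RetrieveS3Objects",
                                    "Effect": "Allow",
                                    "Action": ["s3:GetObject"],
                                    "Resource": ["*"],
                                }
                            ],
                        },
                    }
                ],
            },
        }
    }
}

scoped_template = scope_template_to_buckets(downloaded, ["DOC-EXAMPLE-BUCKET2"])
```

Because the function only touches the S3 statement, the trust policy and its external ID condition in the real template are carried over unchanged.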

The console also provides a link for downloading the CloudFormation template. This download will automatically include the provided External ID.

Figure 3: Download the CloudFormation template


You can deploy this CloudFormation template as a stack set to create the correct role across your organization in AWS Organizations. Consider deploying the stack set only to the accounts that have Macie enabled. After you’ve downloaded the template and deployed it to your member accounts, you’re ready to use the Macie reveal samples functionality.
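To pick the accounts for the stack set, you could start from the Macie member list. The following Python sketch assumes you have parsed the output of `aws macie2 list-members` into a dictionary, and that a member whose `relationshipStatus` is `Enabled` is an account where Macie is active. The function name `stack_set_targets` and the sample account IDs are hypothetical.

```python
def stack_set_targets(list_members_output):
    """Return the sorted account IDs of members whose Macie relationship is Enabled."""
    return sorted(
        member["accountId"]
        for member in list_members_output.get("members", [])
        if member.get("relationshipStatus") == "Enabled"
    )


# Placeholder data in the shape of the list-members response.
example_members = {
    "members": [
        {"accountId": "222233334444", "relationshipStatus": "Enabled"},
        {"accountId": "111122223333", "relationshipStatus": "Enabled"},
        {"accountId": "555566667777", "relationshipStatus": "Paused"},
    ]
}
targets = stack_set_targets(example_members)
print(targets)
```

The resulting list is what you would supply as the deployment target accounts when you create the stack instances.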

Retrieve and reveal sensitive data samples

Now that the necessary permissions are in place, users who are authorized to use this capability in Account A can retrieve and reveal sensitive data samples.

To retrieve and reveal sensitive data samples

  1. In the Macie console in Account A, in the left navigation pane, choose Findings, and select a specific finding. Under Sensitive Data, choose Review.
    Figure 4: The finding details panel


  2. On the Reveal sensitive data page, choose Reveal samples.
    Figure 5: The Reveal sensitive data page


  3. Under Sensitive data, you will be able to view up to 10 examples of the sensitive data found by Amazon Macie, including the sample data from DOC-EXAMPLE-BUCKET2 in Account B, if you put the data there.

    Figure 6: Examples of sensitive data revealed in the Amazon Macie console
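The console steps above also have an API counterpart. The following Python sketch polls the Macie GetSensitiveDataOccurrences operation until the samples are ready. It assumes a client object that exposes `get_sensitive_data_occurrences(findingId=...)`, as the boto3 `macie2` client does, and that the response carries a `status` of `PROCESSING`, `SUCCESS`, or `ERROR`. The function name `reveal_samples` and the retry parameters are our own choices for illustration.

```python
import time


def reveal_samples(client, finding_id, attempts=10, delay_seconds=2.0):
    """Poll Macie until the samples for a finding are available.

    client: an object with get_sensitive_data_occurrences(findingId=...),
            for example boto3.client("macie2") in Account A.
    Returns a dict that maps each sensitive data type to its occurrences.
    """
    for _ in range(attempts):
        response = client.get_sensitive_data_occurrences(findingId=finding_id)
        status = response.get("status")
        if status == "SUCCESS":
            return response.get("sensitiveDataOccurrences", {})
        if status == "ERROR":
            raise RuntimeError(response.get("error", "Macie could not retrieve the samples"))
        # Otherwise the request is still being processed; wait and retry.
        time.sleep(delay_seconds)
    raise TimeoutError(f"samples for finding {finding_id} were not ready in time")


# Usage sketch (requires boto3 and credentials for Account A):
#   import boto3
#   samples = reveal_samples(boto3.client("macie2"), "<YOUR-FINDING-ID>")
#   for data_type, occurrences in samples.items():
#       print(data_type, [item.get("value") for item in occurrences])
```

Passing the client in as a parameter keeps the function free of credentials handling and lets you exercise it with a stub before you point it at a real account.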

You can find additional information on setting up the Macie Reveal function in the Amazon Macie User Guide.

Conclusion

In this post, we showed how you can retrieve and review examples of sensitive data that Amazon Macie found in Amazon S3. This capability makes it simpler for your data protection teams to review the sensitive contents found in S3 buckets across the accounts in your AWS environment. With this information, security teams can quickly take remediation actions, such as updating the configuration of sensitive buckets, quarantining files that contain sensitive information, or sending a notification to the owner of the account where the sensitive data resides. In certain cases, you can add the examples to an allow list in Macie if you don’t want Macie to report them as sensitive data (for example, corporate addresses or sample data that is used for testing).


If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on Amazon Macie re:Post.


Koulick Ghosh


Koulick is a Senior Product Manager in AWS Security based in Seattle, WA. He loves speaking with customers about how AWS Security services can help make them more secure. In his free time, he enjoys playing the guitar, reading, and exploring the Pacific Northwest.


Michael Ingoldby

Michael is a Senior Security Solutions Architect at AWS based in Frisco, Texas. He provides guidance and helps customers to implement AWS native security services. Michael has been working in the security domain since 2006. When he is not working, he enjoys spending time outdoors.

Robert Wu


Robert is the Software Development Engineer for AWS Macie, working on enabling customers with more sensitive data discovery capabilities. In his free time, he enjoys exploring and contributing to various open-source projects to widen his domain knowledge.