AWS Security Blog

How to Query Personally Identifiable Information with Amazon Macie

Amazon Macie logo

In August 2017 at the AWS Summit New York, AWS launched a new security and compliance service called Amazon Macie. Macie uses machine learning to automatically discover, classify, and protect sensitive data in AWS. In this blog post, I demonstrate how you can use Macie to help enable compliance with applicable regulations, starting with data retention.

How to query retained PII with Macie

Data retention and mandatory data deletion are common topics across compliance frameworks, so knowing what is stored and how long it has been or needs to be stored is of critical importance. For example, you can use Macie for Payment Card Industry Data Security Standard (PCI DSS) 3.2, requirement 3, “Protect stored cardholder data,” which mandates a “quarterly process for identifying and securely deleting stored cardholder data that exceeds defined retention.” You also can use Macie for ISO 27017 requirement 12.3.1, which calls for “retention periods for backup data.” In each of these cases, you can use Macie’s built-in queries to identify the age of data in your Amazon S3 buckets and to help meet your compliance needs.

To get started with Macie and run your first queries of personally identifiable information (PII) and sensitive data, follow the initial setup as described in the launch post on the AWS Blog. After you have set up Macie, walk through the following steps to start running queries. Start by focusing on the S3 buckets that you want to inventory and capture important compliance related activity and data.

To start running Macie queries:

  1. In the AWS Management Console, launch the Macie console (you can type macie to find the console).
  2. Click Dashboard in the navigation pane. This shows you an overview of the risk level and data classification type of all inventoried S3 buckets, categorized by date and type.
    Screenshot of "Dashboard" in the navigation pane
  3. Choose S3 objects by PII priority. This dashboard lets you sort by PII priority and PII types.
    Screenshot of "S3 objects by PII priority"
  4. In this case, I want to find information about credit card numbers. I choose the magnifying glass for the type cc_number (note that PII types can be used for custom queries). This view shows the events where PII classified data has been uploaded to S3. When I scroll down, I see the individual files that have been identified.
    Screenshot showing the events where PII classified data has been uploaded to S3
  5. Before looking at the files, I want to continue to build the query by only showing items with high priority. To do so, I choose the row called Object PII Priority and then the magnifying glass icon next to High.
    Screenshot of refining the query for high priority events
  6. To view the results matching these queries, I scroll down and choose any file listed. This shows vital information such as creation date, location, and object access control list (ACL).
  7. The piece I am most interested in this case is the Object PII details line to understand more about what was found in the file. In this case, I see name and credit card information, which is what caused the high priority. Scrolling up again, I also see that the query fields have updated as I interacted with the UI.
    Screenshot showing "Object PII details"

Let’s say that I want to get an alert every time Macie finds new data matching this query. This alert can be used to automate response actions by using AWS Lambda and Amazon CloudWatch Events.

  1. I choose the left green icon called Save query as alert.
    Screenshot of "Save query as alert" button
  2. I can customize the alert and change things like category or severity to fit my needs based on the alert data.
  3. Another way to find the information I am looking for is to run custom queries. To start using custom queries, I choose Research in the navigation pane.
    1. To learn more about custom Macie queries and what you can do on the Research tab, see Using the Macie Research Tab.
  4. I change the type of query I want to run from CloudTrail data to S3 objects in the drop-down list menu.
    Screenshot of choosing "S3 objects" from the drop-down list menu
  5. Because I want PII data, I start typing in the query box, which has an autocomplete feature. I choose the pii_types: query. I can now type the data I want to look for. In this case, I want to see all files matching the credit card filter so I type cc_number and press Enter. The query box now says, pii_types:cc_number. I press Enter again to enable autocomplete, and then I type AND pii_types:email to require both a credit card number and email address in a single object.
    The query looks for all files matching the credit card filter ("cc_number")
  6. I choose the magnifying glass to search and Macie shows me all S3 objects that are tagged as PII of type Credit Cards. I can further specify that I only want to see PII of type Credit Card that are classified as High priority by adding AND and pii_impact:high to the query.
    Screenshot showing narrowing the query results furtherAs before, I can save this new query as an alert by clicking Save query as alert, which will be triggered by data matching the query going forward.

Advanced tip

Try the following advanced queries using Lucene query syntax and save the queries as alerts in Macie.

  • Use a regular-expression based query to search for a minimum of 10 credit card numbers and 10 email addresses in a single object:
    • pii_explain.cc_number:/([1-9][0-9]|[0-9]{3,}) distinct Credit Card Numbers.*/ AND[1-9][0-9]|[0-9]{3,}) distinct Email Addresses.*/
  • Search for objects containing at least one credit card, name, and email address that have an object policy enabling global access (searching for S3 AllUsers or AuthenticatedUsers permissions):
    • (object_acl.Grants.Grantee.URI:”http\://” OR  object_acl.Grants.Grantee.URI:”http\://”) AND (pii_types.cc_number AND AND

These are two ways to identify and be alerted about PII by using Macie. In a similar way, you can create custom alerts for various AWS CloudTrail events by choosing a different data set on which to run the queries again. In the examples in this post, I identified credit cards stored in plain text (all data in this post is example data only), determined how long they had been stored in S3 by viewing the result details, and set up alerts to notify or trigger actions on new sensitive data being stored. With queries like these, you can build a reliable data validation program.

If you have comments about this post, submit them in the “Comments” section below. If you have questions about how to use Macie, start a new thread on the Macie forum or contact AWS Support.