AWS News Blog

Launch – Hello Amazon Macie: Automatically Discover, Classify, and Secure Content at Scale


This blog is out of date. Please refer here for the updated info.

When Jeff and I heard about this service, we were both curious about the meaning of the name Macie. Of course, Jeff, being a great researcher, looked up the name and found that it has two meanings. Macie has both French and English (UK) origins and is typically a girl’s name. The first meaning we found was “weapon.” The second described a person who is bold, sporty, and sweet. In a way, both definitions are appropriate, because today I am happy to announce that we are launching Amazon Macie, a new security service that uses machine learning to help identify and protect sensitive data stored in AWS from breaches, data leaks, and unauthorized access, with Amazon Simple Storage Service (Amazon S3) as the initial data store. I can imagine Amazon Macie being described as a bold weapon for AWS customers: a sweet service with a sporty user interface that helps protect your data at rest against malicious access. Whew, that was a mouthful, but I unbelievably got all the Macie descriptions into a single sentence! Nevertheless, I am thrilled to share with you the power of the new Amazon Macie service.

Amazon Macie is a service powered by machine learning that can automatically discover and classify your data stored in Amazon S3. But Macie doesn’t stop there: once your data has been classified, Macie assigns each data item a business value and then continuously monitors the data to detect suspicious activity based on access patterns. Key features of the Macie service include:

  • Data Security Automation: analyzes, classifies, and processes data to understand historical patterns, user authentications to data, data access locations, and times of access.
  • Data Security & Monitoring: actively monitors usage log data for anomalies and supports automatic resolution of reported issues through CloudWatch Events and Lambda.
  • Data Visibility for Proactive Loss Prevention: provides management visibility into the details of stored data while providing immediate protection without the need for manual customer input.
  • Data Research and Reporting: allows administrative configuration of reporting and alert management requirements.
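To make the CloudWatch Events and Lambda remediation idea concrete, here is a minimal sketch of a Lambda handler that receives an alert event and decides whether to escalate it. The event layout is an assumption: the fields `detail.risk` and `detail.summary` and the threshold `URGENT_THRESHOLD` are hypothetical names chosen for illustration, not Macie’s documented alert schema.

```python
import json

# Hypothetical threshold: alerts at or above this risk value are escalated.
URGENT_THRESHOLD = 8


def handler(event, context=None):
    """Sketch of a Lambda handler for an alert delivered via CloudWatch Events.

    Assumption: the alert's numeric risk lives at event["detail"]["risk"] and a
    human-readable summary at event["detail"]["summary"] (hypothetical fields).
    """
    detail = event.get("detail") or {}
    risk = int(detail.get("risk", 0))
    summary = detail.get("summary", "unknown alert")
    action = "escalate" if risk >= URGENT_THRESHOLD else "log"
    # A real function might notify an on-call channel or tighten a bucket policy here.
    print(json.dumps({"action": action, "risk": risk, "summary": summary}))
    return {"action": action, "risk": risk}
```

Keeping the decision logic in a plain function like this makes it easy to test locally before wiring it to a CloudWatch Events rule.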

How does Amazon Macie accomplish this, you ask?

Using machine learning algorithms for natural language processing (NLP), Macie can automate the classification of data in your S3 buckets. In addition, Amazon Macie uses predictive analytics algorithms to analyze data access patterns dynamically, and what it learns is used to alert you to possibly suspicious behavior. Macie also runs an engine specifically designed to detect common sources of personally identifiable information (PII) or sensitive personal information (SP). Macie takes advantage of AWS CloudTrail: it continuously checks CloudTrail events for PUT requests to S3 buckets and automatically classifies new objects in near real time.
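The “watch CloudTrail for PUT requests” step can be illustrated with a small filter. CloudTrail delivers log files as JSON with a top-level `Records` array, and S3 object-level data events carry the bucket and key under `requestParameters`. This is a sketch of the general idea under those assumptions, not Macie’s internal pipeline; the function names are hypothetical.

```python
import json


def records_from_log(log_text):
    """Parse a CloudTrail log file body into its list of event records."""
    return json.loads(log_text).get("Records", [])


def new_s3_objects(records):
    """Yield (bucket, key) for each successful S3 PutObject data event."""
    for record in records:
        if record.get("eventSource") != "s3.amazonaws.com":
            continue
        # Skip other API calls and requests that failed (they carry an errorCode).
        if record.get("eventName") != "PutObject" or "errorCode" in record:
            continue
        params = record.get("requestParameters") or {}
        bucket, key = params.get("bucketName"), params.get("key")
        if bucket and key:
            yield bucket, key
```

Each `(bucket, key)` pair is a newly written object that a classifier could then pick up.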

While Macie is a powerful tool for security and data protection in the AWS Cloud, it can also help you with governance, compliance requirements, and audit standards. Many of you may already be aware of the EU’s most stringent privacy regulation to date, the General Data Protection Regulation (GDPR), which becomes enforceable on May 25, 2018. Because Amazon Macie recognizes personally identifiable information (PII) and provides customers with dashboards and alerts, it can help customers comply with GDPR requirements around encryption and pseudonymization of data. When combined with AWS Lambda functions, Macie becomes a powerful tool to help remediate GDPR concerns.
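To show what pseudonymization of detected PII can look like in a remediation function, here is a small sketch. It swaps in a deliberately simple technique (two illustrative regular expressions plus keyed HMAC-SHA256 tokens) rather than Macie’s NLP-based detection; the pattern set and the `pseudonymize` name are assumptions for illustration.

```python
import hashlib
import hmac
import re

# Illustrative patterns only; real PII detection is far more thorough.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def pseudonymize(text, key):
    """Replace each PII match with a stable keyed token.

    The same value always maps to the same token under the same key, so records
    can still be joined, but the original value cannot be read back without the key.
    """
    def token(match):
        digest = hmac.new(key, match.group(0).encode(), hashlib.sha256).hexdigest()
        return f"<pii:{digest[:12]}>"

    for pattern in PII_PATTERNS.values():
        text = pattern.sub(token, text)
    return text
```

The key (a `bytes` value) would be kept separately from the data, for example in a secrets store, since anyone holding it can recompute the tokens.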

Tour of the Amazon Macie Service

Let’s take a tour of the service and look at Amazon Macie up close and personal.

First, I will log on to the Macie console and click the Get Started button to begin setting up Macie for data classification and protection.


As you can see, to enable the Amazon Macie service, I must have the appropriate IAM roles created for the service, and additionally I will need to have AWS CloudTrail enabled in my account.

I will create these roles and turn on AWS CloudTrail in my account. To make setting up Macie easier, you can use the sample CloudFormation template provided in the Macie User Guide, which sets up the required IAM roles and policies for you; you then only need to set up a trail as described in the CloudTrail documentation.
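If you prefer to script the trail setup instead of using the console, a sketch with the AWS SDK for Python might look like the following. It assumes `cloudtrail` is a boto3 CloudTrail client (`boto3.client("cloudtrail")`) and that the log bucket already exists with a bucket policy allowing CloudTrail to write to it; the trail and bucket names are hypothetical.

```python
def enable_trail(cloudtrail, trail_name, log_bucket):
    """Create a multi-Region trail and start logging; return the trail ARN.

    `cloudtrail` is expected to be a boto3 CloudTrail client. The S3 bucket
    `log_bucket` must already exist and grant CloudTrail write access.
    """
    response = cloudtrail.create_trail(
        Name=trail_name,
        S3BucketName=log_bucket,
        IsMultiRegionTrail=True,
    )
    # A newly created trail does not record events until logging is started.
    cloudtrail.start_logging(Name=trail_name)
    return response.get("TrailARN")
```

Passing the client in as a parameter keeps the function testable with a stand-in object and lets you choose the Region and credentials when you create the client.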

If you have multiple AWS accounts, note that the account you use to enable Macie is designated the master account. You can integrate other accounts with Macie, but they receive the member account designation. Users from member accounts must use an IAM role to federate access to the master account in order to access the Macie console.
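In the console, member-account users switch into that IAM role; the programmatic equivalent is an AWS STS `AssumeRole` call. The sketch below assumes `sts` is a boto3 STS client, and the role name passed in (for example `MacieMemberAccess`) is hypothetical; use whatever role your master account actually trusts.

```python
def master_account_session_kwargs(sts, master_account_id, role_name,
                                  session_name="macie-console"):
    """Assume a role in the master account and return boto3.Session keyword args.

    `sts` is expected to be a boto3 STS client; `role_name` is whatever role the
    master account has set up to trust the member account (hypothetical here).
    """
    role_arn = f"arn:aws:iam::{master_account_id}:role/{role_name}"
    creds = sts.assume_role(RoleArn=role_arn, RoleSessionName=session_name)["Credentials"]
    # Temporary credentials expire; callers should re-assume the role as needed.
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }
```

The returned dictionary can be passed as `boto3.Session(**kwargs)` to act with the master account’s permissions for the lifetime of the temporary credentials.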

Now that my IAM roles are created and CloudTrail is enabled, I will click the Enable Macie button to start Macie’s data monitoring and protection.


Once Macie finishes starting up in your account, you are taken to the service’s main screen, where any existing alerts in your account are presented. Since I have just enabled the service, I have no alerts yet.


Since we are touring the Macie service, I will now integrate some of my S3 buckets with Macie. However, you do not have to specify any S3 buckets for Macie to start monitoring, because the service already uses AWS CloudTrail management API events to analyze and process information. For this tour, I have decided to also monitor object-level API events from certain buckets in CloudTrail.

To integrate with S3, I will go to the Integrations tab of the Macie console. The Integrations tab offers two options: Accounts and Services. The Accounts option is used to integrate member accounts with Macie and to set your data retention policy. Since I want to integrate specific S3 buckets with Macie, I’ll click the Services option to go to the Services tab.


When I integrate Macie with the S3 service, a trail and an S3 bucket are created to store logs about S3 data events. To get started, I will use the Select an account drop-down to choose an account. Once my account is selected, the services available for integration are presented. I’ll select the Amazon S3 service by clicking the Add button.

Now I can select the buckets that I want Macie to analyze. Choosing the Review and Save button takes me to a screen where I confirm that I want object-level logging by clicking the Save button.
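The console handles object-level logging for you, but the underlying CloudTrail setting is an event selector on the trail. The sketch below assumes an existing trail and a boto3 CloudTrail client; a data resource value of `arn:aws:s3:::bucket-name/` (with the trailing slash) selects every object in that bucket. The bucket and trail names are hypothetical, and this is not necessarily how Macie configures its own trail.

```python
def object_level_selectors(bucket_names):
    """Build CloudTrail event selectors that log S3 object-level events for the buckets."""
    return [{
        "ReadWriteType": "All",
        "IncludeManagementEvents": True,
        "DataResources": [{
            "Type": "AWS::S3::Object",
            # A trailing slash after the bucket name selects every object in it.
            "Values": [f"arn:aws:s3:::{name}/" for name in bucket_names],
        }],
    }]


def enable_object_logging(cloudtrail, trail_name, bucket_names):
    """Apply the selectors to an existing trail via a boto3 CloudTrail client."""
    return cloudtrail.put_event_selectors(
        TrailName=trail_name,
        EventSelectors=object_level_selectors(bucket_names),
    )
```

Note that `put_event_selectors` replaces the trail’s existing selectors, so include any selectors you already rely on.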

Next, on our Macie tour, let’s look at how we can customize data classification with Macie.

As we discussed, Macie automatically monitors and classifies your data. Once Macie identifies your data, it classifies your data objects by file and content type. Macie also uses a support vector machine (SVM) classifier to classify the content within S3 objects in addition to the file’s metadata. In machine learning, support vector machines are supervised learning models used for classification and regression analysis of data. Macie trained its SVM classifier on data of varying content types, optimized to support accurate detection of data content, even including the source code you may write.
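To illustrate what a linear SVM classifier does, here is a toy sketch that separates source code from prose. Macie’s actual features, training data, and solver are not described here, so this swaps in a generic approach: bag-of-words features and the Pegasos stochastic subgradient method for the hinge-loss, L2-regularized SVM objective. All names and the training examples are hypothetical.

```python
import random
from collections import Counter


def features(text):
    """Bag-of-words feature vector as a sparse Counter of lowercase tokens."""
    return Counter(text.lower().split())


def train_linear_svm(samples, epochs=50, lam=0.01, seed=0):
    """Pegasos-style training of a linear SVM (hinge loss + L2), no bias term.

    samples: list of (text, label) pairs with label in {+1, -1}.
    """
    rng = random.Random(seed)
    w = Counter()  # sparse weight vector
    data = list(samples)
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for text, y in data:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            x = features(text)
            margin = y * sum(w[f] * v for f, v in x.items())
            # Regularization step: shrink every weight toward zero.
            for f in list(w):
                w[f] *= 1 - eta * lam
            # Hinge-loss step: only samples inside the margin move the weights.
            if margin < 1:
                for f, v in x.items():
                    w[f] += eta * y * v
    return w


def predict(w, text):
    score = sum(w[f] * v for f, v in features(text).items())
    return 1 if score >= 0 else -1
```

Usage: train on labeled snippets, e.g. `w = train_linear_svm([("def main return import", 1), ("the meeting is on monday", -1)])`, then call `predict(w, "def bar return y")`. A production classifier would use richer features and far more data.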

Macie assigns only one content type per data object or file; however, you can enable or disable content types and file extensions to include or exclude them from Macie’s classification. Once Macie classifies the data, it assigns the object a risk level between 1 and 10, with 10 being the highest risk and 1 the lowest.
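The two rules in that paragraph, one content type per object and a risk level from 1 to 10, can be captured in a small data model. This is a hypothetical sketch for reasoning about classification results in your own tooling, not Macie’s actual schema; the class and field names are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Classification:
    """One classification result per object (hypothetical model, not Macie's schema)."""
    key: str
    content_type: str  # exactly one content type per object
    risk: int          # 1 (lowest) through 10 (highest)

    def __post_init__(self):
        # Reject values outside the documented 1-10 risk scale.
        if not 1 <= self.risk <= 10:
            raise ValueError(f"risk must be between 1 and 10, got {self.risk}")
```

Because the single `content_type` field is a plain string, the model cannot hold more than one content type per object by construction.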

To customize the classification of our data with Macie, I’ll go to the Settings tab, which presents the choices available for enabling or disabling Macie’s classification settings.


For this example, I will choose File extension, which presents the list of file extensions that Macie tracks and uses for classification.

As a test, I’ll edit the apk file extension, used for Android application package files, and disable monitoring of these files by selecting No – disabled from the drop-down and clicking the Save button. Of course, I will turn this back on later, since I want to keep my entire collection of data files safe, including my Android development binaries.
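The effect of that setting can be sketched as a lookup from file extension to an enabled flag, consulted before an object is classified. The settings table, its contents, and the default for unlisted extensions are hypothetical; they simply mirror the console’s per-extension Yes/No choice.

```python
import posixpath

# Hypothetical settings table mirroring the console's per-extension Yes/No choice.
EXTENSION_SETTINGS = {".apk": False, ".csv": True, ".py": True}


def should_classify(key, settings=None, default=True):
    """Return True if the object's file extension is enabled for classification.

    S3 keys use "/" separators, so posixpath is used regardless of the local OS.
    Extensions missing from the settings table fall back to `default`.
    """
    settings = EXTENSION_SETTINGS if settings is None else settings
    ext = posixpath.splitext(key)[1].lower()
    return settings.get(ext, default)
```

For example, `should_classify("builds/app-release.apk")` returns `False` while `.apk` is disabled, and flipping the table entry back to `True` re-enables those files.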


One last thing to note about data classification with Macie: the service provides visibility into how your data objects are being classified and highlights the data assets you have stored according to how critical or important the information is for compliance, for your personal data, and for your business.

Now that we have explored the data that Macie classifies and monitors, the last stop on our service tour is the Macie dashboard.

 

The Macie Dashboard provides a complete picture of all the data and activity gathered as Macie monitors and classifies our data. The dashboard displays Metrics and Views grouped by category to provide different visual perspectives of your data. From these dashboard screens, you can also go from a metric directly to the Research tab to build and run queries based on that metric. These queries can be used to set up customized alerts that notify you of possible security issues. We won’t have an opportunity to tour the Research or Alerts tabs, but you can find more information about these features in the Macie user guide.

Turning back to the Dashboard, there are so many great resources in the Macie Dashboard that we can’t stop at every view, metric, and feature during our tour, so here is an overview of the dashboard features you can take advantage of.

Dashboard Metrics – monitored data grouped by the following categories:

  • High-risk S3 objects – data objects with risk levels of 8 through 10
  • Total event occurrences – total count of all event occurrences since Macie was enabled
  • Total user sessions – 5-minute snapshot of CloudTrail data
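To make the three metrics concrete, here is a sketch that computes them from raw inputs. The high-risk range (8 through 10) comes from the list above; counting a “user session” as a distinct (user, 5-minute window) pair is an assumption made for illustration, as are the input shapes and the function name.

```python
from datetime import datetime

WINDOW_SECONDS = 5 * 60  # 5-minute snapshot window


def dashboard_metrics(risk_levels, events):
    """Compute dashboard-style metrics.

    risk_levels: iterable of per-object risk levels (1-10).
    events: iterable of (user, ISO-8601 timestamp) pairs, e.g. CloudTrail eventTime.
    Assumption: a "user session" is one user active within one 5-minute window.
    """
    high_risk = sum(1 for r in risk_levels if 8 <= r <= 10)
    events = list(events)
    sessions = set()
    for user, ts in events:
        # CloudTrail timestamps end in "Z"; fromisoformat needs an explicit offset.
        moment = datetime.fromisoformat(ts.replace("Z", "+00:00"))
        sessions.add((user, int(moment.timestamp()) // WINDOW_SECONDS))
    return {
        "high_risk_objects": high_risk,
        "total_event_occurrences": len(events),
        "total_user_sessions": len(sessions),
    }
```

Bucketing by integer division on epoch seconds gives fixed, aligned 5-minute windows, which keeps the count deterministic regardless of event order.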

Dashboard Views – views to display various points of the monitored data and activity:

  • S3 objects for a selected time range
  • S3 objects
  • S3 objects by personally identifiable information (PII)
  • S3 objects by ACL
  • CloudTrail events and associated users
  • CloudTrail errors and associated users
  • Activity location
  • AWS CloudTrail events
  • Activity ISPs
  • AWS CloudTrail user identity types

Summary

Well, that concludes our tour of the new and exciting Amazon Macie service. Amazon Macie is a sensational new service that uses the power of machine learning and deep learning to help you identify, secure, and protect your data stored in Amazon S3. Using natural language processing (NLP) to automate data classification, Amazon Macie lets you get started with high-accuracy classification and immediate protection of your data simply by enabling the service. The interactive dashboards give visibility into the where, what, who, and when of your information, allowing you to proactively analyze massive streams of data, data accesses, and API calls in your environment. Learn more about Amazon Macie by visiting the product page or the documentation in the Amazon Macie user guide.

Tara

Tara Walker


Tara was a Technical Evangelist for Amazon Web Services, dedicating her time to helping developers build apps, games, and technical solutions in the AWS Cloud. Tara evangelized AWS cloud computing architectures and development for technologies such as mobile, gaming, IoT, AI, and serverless, to name a few. Tara’s background is as a software engineer and developer who has worked on wide-ranging development platforms and systems, using a myriad of development languages across her technical and engineering roles. Over her 20+ year career, she has been employed by Microsoft, Turner Broadcasting/Time Warner, Georgia Pacific, and various other Fortune 500 companies. She holds a Bachelor’s degree from Georgia State University and is currently working on her Master’s degree in Computer Science (MSCS) at the Georgia Institute of Technology. Tara’s passion is to continue spreading the “good news” to diverse audiences about a plethora of technologies, development languages, and frameworks, with a focus and proficiency in:

  • Cloud computing and Serverless architectures
  • IoT (Internet of Things) development
  • Mobile, Game, and Web development
  • Artificial Intelligence services and frameworks
  • NUI (Natural User Interfaces) & Biometric Interface service frameworks
  • Cross-Platform development frameworks

You can find Tara on Twitter at @taraw.