This Guidance demonstrates how customer applications can use Amazon Macie to scan artifacts for personally identifiable information (PII), financial information, credentials, and other sensitive information.
The application initiates an Amazon Macie scan request by providing the Amazon Simple Storage Service (Amazon S3) object criteria, conditional identifiers, allow lists, and other job settings. The application also subscribes to Macie job completion events by providing the Amazon Resource Name (ARN) of a pre-created Amazon EventBridge event bus.
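The request the application submits is ultimately translated into a Macie classification job. As a minimal sketch, assuming a one-time job scoped to whole buckets, the job parameters could be assembled as shown here. The helper name `build_scan_request` and its arguments are hypothetical; the dictionary keys follow the Macie `CreateClassificationJob` request shape, and the Guidance's own request schema is not shown in this text.

```python
import uuid
from typing import Any, Dict, List, Optional


def build_scan_request(
    account_id: str,
    bucket_names: List[str],
    job_name: str,
    custom_data_identifier_ids: Optional[List[str]] = None,
    allow_list_ids: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for a one-time Macie classification job.

    Hypothetical helper for illustration; it only builds the request and
    does not call AWS.
    """
    request: Dict[str, Any] = {
        # Idempotency token so a retried request does not create a second job.
        "clientToken": str(uuid.uuid4()),
        "jobType": "ONE_TIME",
        "name": job_name,
        "s3JobDefinition": {
            "bucketDefinitions": [
                {"accountId": account_id, "buckets": list(bucket_names)}
            ]
        },
    }
    # Optional settings are included only when the caller supplies them.
    if custom_data_identifier_ids:
        request["customDataIdentifierIds"] = list(custom_data_identifier_ids)
    if allow_list_ids:
        request["allowListIds"] = list(allow_list_ids)
    return request
```

With boto3 installed and credentials configured, the result could be passed as `boto3.client("macie2").create_classification_job(**request)`.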
A Lambda function saves the unique identifier of the Macie sensitive data discovery job and the EventBridge event bus ARN in Amazon DynamoDB.
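A minimal sketch of the record this step stores, assuming a table keyed on the job identifier. The attribute names `jobId` and `eventBusArn` and the helper name are hypothetical; the Guidance does not state its table schema in this text.

```python
from typing import Dict


def build_job_item(job_id: str, event_bus_arn: str) -> Dict[str, Dict[str, str]]:
    """Return a DynamoDB item (low-level attribute-value format) that maps
    a Macie job ID to the event bus that should receive its status events.

    Attribute names are assumptions made for this sketch.
    """
    if not job_id:
        raise ValueError("job_id must not be empty")
    if not event_bus_arn.startswith("arn:"):
        raise ValueError("event_bus_arn must be an ARN")
    return {
        "jobId": {"S": job_id},
        "eventBusArn": {"S": event_bus_arn},
    }
```

The item could then be written with `boto3.client("dynamodb").put_item(TableName=..., Item=item)`, where the table name is whatever the deployment defines.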
Macie scans the various Amazon S3 objects to look for sensitive information. It needs access to the customer managed AWS Key Management Service (AWS KMS) key(s) that are used to encrypt the Amazon S3 objects.
Macie stores the scan results and findings in another Amazon S3 bucket and encrypts them using a customer managed AWS KMS key.
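Macie is told where to store sensitive data discovery results through its classification export configuration. The sketch below builds that configuration; the helper name and the example values are hypothetical, and whether this Guidance sets the configuration this way is an assumption.

```python
from typing import Any, Dict


def build_export_configuration(
    bucket_name: str, kms_key_arn: str, key_prefix: str = ""
) -> Dict[str, Any]:
    """Build the configuration object for Macie's
    PutClassificationExportConfiguration operation.

    Hypothetical helper; it only builds the structure and does not call AWS.
    """
    destination: Dict[str, str] = {
        "bucketName": bucket_name,
        # Customer managed key used to encrypt the stored results.
        "kmsKeyArn": kms_key_arn,
    }
    if key_prefix:
        destination["keyPrefix"] = key_prefix
    return {"s3Destination": destination}
```

With boto3, the result could be applied through `boto3.client("macie2").put_classification_export_configuration(configuration=config)`. The key policy must also allow Macie to use the key.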
An Amazon CloudWatch subscription filter invokes a Lambda function whenever Macie logs a job status event.
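A subscription filter delivers log data to Lambda as a base64-encoded, gzip-compressed JSON document under `awslogs.data`. The following sketch decodes that envelope and parses each log message as JSON. The function name is hypothetical, and the fields inside each message (for example `eventType` and `jobId`) are assumptions about the Macie job log format that should be checked against real log events.

```python
import base64
import gzip
import json
from typing import Any, Dict, List


def decode_subscription_event(event: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Decode a CloudWatch Logs subscription payload into parsed messages.

    Messages that are not JSON objects are skipped rather than raising,
    so one malformed line does not fail the whole batch.
    """
    compressed = base64.b64decode(event["awslogs"]["data"])
    payload = json.loads(gzip.decompress(compressed))
    records: List[Dict[str, Any]] = []
    for log_event in payload.get("logEvents", []):
        try:
            message = json.loads(log_event["message"])
        except json.JSONDecodeError:
            continue
        if isinstance(message, dict):
            records.append(message)
    return records
```

A Lambda handler could call this function first and then look up each job identifier to find the event bus registered for it.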
A Lambda function sends an event to the EventBridge event bus that the application provided.
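A minimal sketch of the entry such a function might publish. The `Source` and `DetailType` values and the shape of the detail payload are hypothetical placeholders; the Guidance does not specify them in this text.

```python
import json
from typing import Dict


def build_put_events_entry(event_bus_arn: str, job_id: str, status: str) -> Dict[str, str]:
    """Build one entry for the EventBridge PutEvents operation.

    The source, detail type, and detail fields are assumptions for this
    sketch, not values defined by the Guidance.
    """
    return {
        # EventBusName accepts either the bus name or its ARN.
        "EventBusName": event_bus_arn,
        "Source": "custom.macie-scan",
        "DetailType": "Macie Job Status",
        # Detail must be a JSON-encoded string, not a nested dict.
        "Detail": json.dumps({"jobId": job_id, "status": status}),
    }
```

With boto3, the entry could be sent using `boto3.client("events").put_events(Entries=[entry])`. The response includes `FailedEntryCount`, which is worth checking because the call can succeed overall while individual entries fail.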
If the event matches an EventBridge rule on that bus, the rule notifies the application.
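To illustrate the matching step, the sketch below defines an example event pattern and a deliberately simplified matcher. The source name, detail type, and `status` field are hypothetical. The matcher handles only exact-value matching on scalar fields and nested objects; the real EventBridge matching rules are richer (prefix, numeric, and array matching, among others).

```python
from typing import Any, Dict

# Example pattern: match completed-job events from a hypothetical source.
COMPLETED_JOB_PATTERN: Dict[str, Any] = {
    "source": ["custom.macie-scan"],
    "detail-type": ["Macie Job Status"],
    "detail": {"status": ["JOB_COMPLETED"]},
}


def matches(pattern: Dict[str, Any], event: Dict[str, Any]) -> bool:
    """Simplified illustration of event pattern matching.

    Every key in the pattern must exist in the event. A list in the pattern
    means "the event value must equal one of these"; a dict means "recurse
    into the nested object". This is not the full EventBridge algorithm.
    """
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True
```

The pattern dictionary, serialized with `json.dumps`, is the kind of value a rule's event pattern would hold.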
The AWS Well-Architected Framework helps you understand the pros and cons of the decisions you make when building systems in the cloud. The six pillars of the Framework allow you to learn architectural best practices for designing and operating reliable, secure, efficient, cost-effective, and sustainable systems. Using the AWS Well-Architected Tool, available at no charge in the AWS Management Console, you can review your workloads against these best practices by answering a set of questions for each pillar.
The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.
This Guidance uses Infrastructure as Code with technologies such as the AWS Cloud Development Kit (AWS CDK) and/or AWS CloudFormation templates. All Lambda function operations are logged in CloudWatch. These solutions detect the presence of sensitive information that is tracked by Macie.
For secure authentication and authorization, this Guidance requires identities to acquire temporary credentials.
Customer managed policies must be created for the Lambda function implementation roles following the principle of least privilege.
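As an illustration of least privilege, the sketch below builds a policy document for a function that only starts Macie jobs and records them in one table. The statement IDs and helper name are hypothetical, and the exact set of actions each function in this Guidance needs is an assumption.

```python
from typing import Any, Dict


def build_lambda_policy(table_arn: str) -> Dict[str, Any]:
    """Build an IAM policy document granting only the actions this
    hypothetical function needs: create a Macie job, write one item.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "StartMacieJob",
                "Effect": "Allow",
                "Action": ["macie2:CreateClassificationJob"],
                # Left as "*" in this sketch; narrow it where the action
                # supports resource-level permissions.
                "Resource": "*",
            },
            {
                "Sid": "RecordJob",
                "Effect": "Allow",
                "Action": ["dynamodb:PutItem"],
                # Scoped to a single table rather than all tables.
                "Resource": table_arn,
            },
        ],
    }
```

Serialized with `json.dumps`, the document could be supplied as the body of a customer managed policy and attached to the function's execution role.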
Macie uses VPC endpoints to access Amazon S3, whereas access to API Gateway is protected using AWS WAF. The Amazon S3 buckets that store the artifacts and the results or findings must be encrypted using AWS KMS keys.
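The bucket encryption requirement can be expressed as a default encryption configuration. This is a minimal sketch, assuming SSE-KMS with a single key per bucket; the helper name is hypothetical.

```python
from typing import Any, Dict


def build_bucket_encryption(kms_key_arn: str) -> Dict[str, Any]:
    """Build a default-encryption configuration that applies SSE-KMS
    with the given key to new objects in a bucket.
    """
    return {
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": kms_key_arn,
                },
                # S3 Bucket Keys reduce the number of requests made to AWS KMS.
                "BucketKeyEnabled": True,
            }
        ]
    }
```

With boto3, this could be applied using `boto3.client("s3").put_bucket_encryption(Bucket=..., ServerSideEncryptionConfiguration=config)`.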
This Guidance implements a reliable application architecture by leveraging serverless technology, including Macie, which logs job status events to CloudWatch. To support data backup and recovery, DynamoDB tables must be backed up periodically and all of the Amazon S3 buckets must be replicated into a different AWS Region.
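For the cross-Region replication requirement, a replication configuration for KMS-encrypted objects might look like the following sketch. The rule ID and helper name are hypothetical, and replicating every object (an empty filter) is an assumption.

```python
from typing import Any, Dict


def build_replication_configuration(
    role_arn: str, destination_bucket_arn: str, replica_kms_key_arn: str
) -> Dict[str, Any]:
    """Build an S3 replication configuration that copies all objects,
    including SSE-KMS encrypted ones, to a bucket in another Region.
    """
    return {
        # Role that S3 assumes to replicate objects.
        "Role": role_arn,
        "Rules": [
            {
                "ID": "replicate-all",  # hypothetical rule name
                "Priority": 1,
                "Status": "Enabled",
                "Filter": {},  # empty filter: applies to every object
                "DeleteMarkerReplication": {"Status": "Disabled"},
                # Opt in to replicating objects encrypted with AWS KMS.
                "SourceSelectionCriteria": {
                    "SseKmsEncryptedObjects": {"Status": "Enabled"}
                },
                "Destination": {
                    "Bucket": destination_bucket_arn,
                    # Key in the destination Region used for the replicas.
                    "EncryptionConfiguration": {
                        "ReplicaKmsKeyID": replica_kms_key_arn
                    },
                },
            }
        ],
    }
```

With boto3, this could be applied using `boto3.client("s3").put_bucket_replication(Bucket=..., ReplicationConfiguration=config)`. Versioning must be enabled on both the source and destination buckets for replication to work.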
Macie is purpose-built for sensitive data discovery and classification. To meet workload requirements, this Guidance uses highly scalable managed services such as Lambda, API Gateway, and DynamoDB.
For this Guidance, cost optimization is achieved by leveraging serverless technology (for example, Lambda). Scale and cost are dictated by Macie usage, so that only the minimum resources are required.
To scale and continually match the load so that only the minimum resources are required, this Guidance uses serverless (Lambda) and event-driven (EventBridge) technologies.
The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.