Guidance for Accelerating Analytics on AWS
Overview
How it works
This Guidance helps you quickly deploy a data analytics stack using AWS services. In this diagram, an administrator creates a storage bucket and uploads files that can be processed for data analysis and visualization.
Well-Architected Pillars
The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.
Operational Excellence
The AWS services used in this Guidance log to Amazon CloudWatch or AWS CloudTrail, where you can track and review activity and customize monitoring as needed. This gives you a fast way to review errors and respond to incidents appropriately.
Security
The AWS Glue database and table created in this Guidance are secured using AWS Lake Formation, and access is granted only to the IAM role used by Amazon QuickSight. This provides secure authentication and authorization for both human and machine access.
The resources provisioned in this Guidance are private by default, and can only be modified with IAM identity-based policies.
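As an illustration of the Lake Formation grant described above, here is a minimal sketch using the boto3 `grant_permissions` call. The role ARN, database name, table name, and the choice of `SELECT` and `DESCRIBE` permissions are assumptions for the example; the permissions that this Guidance actually grants may differ.

```python
def build_table_grant(role_arn: str, database: str, table: str) -> dict:
    """Build keyword arguments for Lake Formation's GrantPermissions call.

    The permission list is an assumption: read-only access for a BI role.
    """
    return {
        "Principal": {"DataLakePrincipalIdentifier": role_arn},
        "Resource": {"Table": {"DatabaseName": database, "Name": table}},
        "Permissions": ["SELECT", "DESCRIBE"],
    }


def grant_table_access(role_arn: str, database: str, table: str) -> dict:
    """Apply the grant. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder above works without the SDK

    client = boto3.client("lakeformation")
    return client.grant_permissions(**build_table_grant(role_arn, database, table))
```

Separating the request builder from the API call keeps the grant easy to review or unit test before anything touches an AWS account.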
Reliability
Because the AWS services used in this Guidance are serverless and use AWS managed endpoints and DNS, the implementation can depend on the high availability and resiliency to failures that are inherent in AWS services.
AWS CloudFormation automates deployment and provisioning of resources. If one resource fails to provision, CloudFormation rolls back the other resources in the stack, helping you maintain a reliable application-level architecture.
Additionally, CloudFormation logs resource provisioning and errors that can be accessed using CloudTrail and CloudWatch. QuickSight sends an email to notify account administrators when significant events occur.
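The rollback behavior described in this section can be sketched with the boto3 `create_stack` call, which accepts an `OnFailure` parameter (`ROLLBACK` is also the default). The stack name, template URL, and the `CAPABILITY_NAMED_IAM` capability are assumptions for illustration; check the template for this Guidance to see which capabilities it requires.

```python
def build_stack_request(stack_name: str, template_url: str) -> dict:
    """Build keyword arguments for CloudFormation's CreateStack call."""
    return {
        "StackName": stack_name,
        "TemplateURL": template_url,
        # Roll back everything provisioned so far if any resource fails.
        "OnFailure": "ROLLBACK",
        # Assumption: the template creates named IAM resources.
        "Capabilities": ["CAPABILITY_NAMED_IAM"],
    }


def deploy_stack(stack_name: str, template_url: str) -> None:
    """Create the stack and block until creation completes or fails.

    Requires boto3 and AWS credentials.
    """
    import boto3  # imported lazily so the builder above works without the SDK

    cfn = boto3.client("cloudformation")
    cfn.create_stack(**build_stack_request(stack_name, template_url))
    # The waiter raises an error if the stack fails and rolls back.
    cfn.get_waiter("stack_create_complete").wait(StackName=stack_name)
```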
Performance Efficiency
The services selected for this Guidance are purpose-built to handle advanced analytics. For example, QuickSight is an AWS managed serverless business intelligence (BI) service that integrates with Amazon Athena to query data in Amazon S3.
Deploying this Guidance creates a QuickSight analysis, where you analyze and visualize your data to gain insights. Building on the Athena tables that this Guidance also creates, you can experiment and create additional tables to query data in Amazon S3 or other supported data sources.
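To experiment with the Athena tables outside of QuickSight, a query can be submitted through the boto3 `start_query_execution` call. The sketch below is illustrative only: the SQL text, database name, and S3 output location are hypothetical placeholders, not values defined by this Guidance.

```python
def build_query_request(sql: str, database: str, output_location: str) -> dict:
    """Build keyword arguments for Athena's StartQueryExecution call."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        # Athena writes query results to this S3 location.
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_query(sql: str, database: str, output_location: str) -> str:
    """Submit the query and return its execution ID.

    Requires boto3 and AWS credentials.
    """
    import boto3  # imported lazily so the builder above works without the SDK

    athena = boto3.client("athena")
    response = athena.start_query_execution(
        **build_query_request(sql, database, output_location)
    )
    return response["QueryExecutionId"]
```

The query runs asynchronously, so a caller would poll `get_query_execution` with the returned ID before reading results.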
Cost Optimization
This Guidance uses managed services with pay-as-you-go pricing, which removes the overhead of maintaining infrastructure and reduces cost. The AWS services used in this Guidance are provisioned in the same AWS Region to reduce data transfer charges, and QuickSight does not accrue any data transfer charges.
All services in this Guidance are serverless and do not need to run for extended periods of time. To match demand with the minimum resources needed, this Guidance deploys an Amazon S3 bucket that contains only customer-provided data. The AWS Glue crawler runs once per day to check for new data, and Athena is invoked only when you use QuickSight.
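A once-per-day crawler schedule like the one described above can be expressed with the `Schedule` parameter of the boto3 Glue `create_crawler` call, which takes a cron expression evaluated in UTC. This is a minimal sketch: the crawler name, role ARN, database, S3 path, and the 02:00 UTC run time are all assumptions, since the source does not state when the crawler runs.

```python
def daily_cron(hour_utc: int, minute: int = 0) -> str:
    """Return an AWS cron expression that fires once a day at the given UTC time."""
    if not (0 <= hour_utc <= 23 and 0 <= minute <= 59):
        raise ValueError("hour_utc must be 0-23 and minute must be 0-59")
    # Field order: minutes hours day-of-month month day-of-week year
    return f"cron({minute} {hour_utc} * * ? *)"


def build_crawler_request(name: str, role_arn: str, database: str,
                          s3_path: str, hour_utc: int = 2) -> dict:
    """Build keyword arguments for Glue's CreateCrawler call."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
        "Schedule": daily_cron(hour_utc),
    }


def create_daily_crawler(**kwargs) -> None:
    """Create the crawler. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builders above work without the SDK

    boto3.client("glue").create_crawler(**build_crawler_request(**kwargs))
```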
Sustainability
This Guidance invokes Athena only when users interact with the corresponding data sets in QuickSight, ensuring limited provisioning of resources. Additionally, Amazon S3 and Athena automatically scale to accommodate data that you provide.
This Guidance implements architecture patterns that maintain consistently high utilization of deployed resources. For example, the AWS Glue crawler is invoked only once per day to crawl customer data in Amazon S3 buckets and to update the AWS Glue Data Catalog. Finally, this Guidance uses serverless AWS services that do not require continuous hardware provisioning, making it a more sustainable architecture.