
Guidance for Automated Querying of Amazon S3 Logs with Amazon Athena

Overview

This Guidance demonstrates an automated workflow that helps users query Amazon Simple Storage Service (Amazon S3) logs and identify specific requests in them. By deploying the provided AWS CloudFormation stack, users can set up the infrastructure needed to automatically copy and process their Amazon S3 logs. The workflow uses Amazon Athena, a serverless query service, so that users can run SQL queries against the log data and identify potential security or operational issues.
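
As a rough illustration of the kind of SQL query described above, the following Python sketch builds a query that counts rejected requests per requester and source IP address. The database name, table name, and column names (`requester`, `remoteip`, `httpstatus`, with `httpstatus` stored as a string) are assumptions made for this example and may not match the table that the deployed stack creates.

```python
# Hypothetical names; replace them with the database and table in your account.
DATABASE = "s3_logs_db"
TABLE = "s3_access_logs"


def build_status_query(http_status: int = 403, limit: int = 25) -> str:
    """Build SQL that counts requests with one HTTP status per requester and IP."""
    if not 100 <= http_status <= 599:
        raise ValueError("http_status must be a valid HTTP status code")
    if limit < 1:
        raise ValueError("limit must be at least 1")
    # Only validated integers are interpolated, so no free text reaches the SQL.
    # Assumption: the httpstatus column is a string, hence the quoted literal.
    return (
        "SELECT requester, remoteip, COUNT(*) AS request_count "
        f'FROM "{DATABASE}"."{TABLE}" '
        f"WHERE httpstatus = '{http_status}' "
        "GROUP BY requester, remoteip "
        "ORDER BY request_count DESC "
        f"LIMIT {limit}"
    )


def build_start_query_params(query: str, output_location: str) -> dict:
    """Assemble the request parameters for an Athena StartQueryExecution call."""
    return {
        "QueryString": query,
        "QueryExecutionContext": {"Database": DATABASE},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


if __name__ == "__main__":
    print(build_status_query())
```

The dictionary returned by `build_start_query_params` can be passed as keyword arguments to the `start_query_execution` method of a boto3 Athena client. The sketch stops short of calling AWS so that it runs without credentials.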

How it works

These technical details include an architecture diagram that illustrates how to use this solution effectively. The diagram shows the key components and their interactions, and walks through the architecture's structure and functionality step by step.

Deploy with confidence

Ready to deploy? Review the sample code on GitHub for detailed deployment instructions, then deploy it as-is or customize it to fit your needs.


Well-Architected Pillars

The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.

This Guidance uses CloudFormation for comprehensive resource deployment tracking and visibility. In addition, Lambda records operational events in Amazon CloudWatch Logs, while Amazon SNS delivers near real-time workflow status notifications. Moreover, Amazon S3 Batch Operations runs copy operations at scale and provides detailed completion reports at both the task and aggregate levels. This combination of logging and monitoring enables rapid issue identification and root cause analysis.

Read the Operational Excellence whitepaper

Security best practices are implemented with AWS Identity and Access Management (IAM), following the principle of least privilege. Each component, particularly each Lambda function, operates with permissions scoped to only those required for its specific task. This granular access control minimizes the potential attack surface.
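
To make the idea of scoped permissions concrete, here is a minimal sketch of a least-privilege IAM policy document for a component that only copies log objects. The bucket names, prefix, and statement IDs are hypothetical, and the policies in the actual sample code may be structured differently.

```python
import json


def build_copy_policy(source_bucket: str, dest_bucket: str, dest_prefix: str) -> dict:
    """Return an IAM policy that allows reading from one bucket and
    writing under a single prefix in another, and nothing else."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadSourceLogs",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{source_bucket}/*",
            },
            {
                "Sid": "WriteCopiedLogs",
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                # Writes are limited to one prefix rather than the whole bucket.
                "Resource": f"arn:aws:s3:::{dest_bucket}/{dest_prefix}*",
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical bucket names, used only for illustration.
    policy = build_copy_policy("example-log-source", "example-log-copy", "copied-logs/")
    print(json.dumps(policy, indent=2))
```

Keeping each statement to a single action and a narrow resource makes it easier to review what a function can do, at the cost of more statements to maintain.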

Read the Security whitepaper

This architecture supports highly available workloads through multiple AWS services. First, Lambda is serverless and is designed to scale automatically as the number of concurrent requests increases; it also runs each function across multiple Availability Zones (AZs) to help ensure high availability. Second, Amazon SNS stores each message it receives across multiple AZs and delivers each message at least once. Third, Amazon S3 provides reliable, durable storage across multiple AZs for the Amazon S3 logs that need to be queried. Fourth, Amazon S3 Batch Operations provides a purpose-built feature for reliable large-scale object copy with error retries and scaling. Lastly, the AWS Software Development Kit (AWS SDK) is optimized to perform retries and backoff to handle transient errors.
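
The retry-and-backoff behavior mentioned above can be sketched in a few lines. This is a generic illustration of exponential backoff with jitter, not the AWS SDK's actual implementation; `TransientError`, `call_with_backoff`, and all parameter names and default values are hypothetical.

```python
import random
import time


class TransientError(Exception):
    """Stands in for a retryable failure such as throttling or a timeout."""


def call_with_backoff(operation, max_attempts=5, base_delay=0.1, max_delay=5.0,
                      sleep=time.sleep, rng=random.random):
    """Call operation(), retrying on TransientError with exponential backoff.

    Before each retry the function waits a random fraction of a ceiling
    that doubles on every attempt, up to max_delay seconds.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the last error to the caller.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(rng() * ceiling)


if __name__ == "__main__":
    outcomes = iter([TransientError(), TransientError(), "ok"])

    def flaky():
        item = next(outcomes)
        if isinstance(item, Exception):
            raise item
        return item

    print(call_with_backoff(flaky))
```

Randomizing the delay spreads retries from many concurrent callers over time, so they are less likely to hit the service again in the same instant.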

Read the Reliability whitepaper

Lambda enables parallel API request processing, while Athena provides SQL-based querying capabilities that scale automatically with data volume. In addition, Amazon S3 Batch Operations handles billions of objects efficiently, and Amazon S3 delivers consistent low-latency access to log data. This serverless approach eliminates infrastructure management overhead while maintaining high performance.

Read the Performance Efficiency whitepaper

This Guidance minimizes costs through serverless computing and efficient storage management. Specifically, Amazon S3 lifecycle rules automatically expire unnecessary data, while Lambda charges only for the compute time actually used. In addition, the pay-per-query model of Athena eliminates the need for persistent query infrastructure. Lastly, Amazon S3 Batch Operations provides cost-effective bulk operations without requiring dedicated compute resources.
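
As an example of the lifecycle rules mentioned above, the following sketch builds a lifecycle configuration that expires objects under a prefix after a fixed number of days. The prefix, retention period, and rule ID are assumptions chosen for illustration; the rules that the deployed stack creates may differ.

```python
def build_expiration_config(prefix: str, days: int) -> dict:
    """Return an S3 lifecycle configuration with one expiration rule."""
    if days < 1:
        raise ValueError("days must be at least 1")
    return {
        "Rules": [
            {
                "ID": f"expire-after-{days}-days",  # Hypothetical rule name.
                "Filter": {"Prefix": prefix},  # Applies only under this prefix.
                "Status": "Enabled",
                "Expiration": {"Days": days},
            }
        ]
    }


if __name__ == "__main__":
    # Hypothetical prefix and retention period.
    print(build_expiration_config("copied-logs/", 30))
```

The returned dictionary has the shape expected by the `LifecycleConfiguration` argument of the `put_bucket_lifecycle_configuration` method on a boto3 S3 client.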

Read the Cost Optimization whitepaper

This Guidance promotes sustainability through efficient resource utilization. Event-driven Lambda functions consume resources only when needed, while Amazon S3 lifecycle policies automatically remove unnecessary data. The serverless architecture eliminates idle resource waste, and the solution's automated scaling helps ensure that resources match actual demand. Finally, integration with Amazon SNS enables efficient message delivery without users having to maintain dedicated infrastructure.

Read the Sustainability whitepaper

Disclaimer

The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.