AWS Startups Blog
Searching CloudTrail Logs Easily with Amazon CloudSearch
I. Overall Architecture
You can leverage several AWS offerings to achieve a simple, scalable, and robust architecture to index CloudTrail logs in CloudSearch. You start by configuring CloudTrail to deliver an Amazon SNS notification as soon as a new log file becomes available. Each notification is posted into an Amazon SQS queue and handled by a simple AWS Elastic Beanstalk application using the worker role. This application retrieves each file from the S3 bucket, extracts logs, and adds each log to a CloudSearch domain.
II. Set Up AWS CloudTrail
If you haven’t already, you can easily activate CloudTrail from the AWS Management Console. Go to the CloudTrail console, click Get Started (or Edit for an existing configuration), select Yes for Create a new S3 bucket, and enter the desired bucket name. Click Save. CloudTrail creates the bucket, automatically setting up the required policies. Note that you need to activate CloudTrail for each AWS region. It is a common practice to use the same S3 bucket for all AWS regions.
In the CloudTrail console, click Advanced and create an SNS notification. For SNS topic (new), type something like “CloudTrail-notification.” Make sure the SNS notification for every log file delivery? is set to Yes.
Note: if you are using CloudTrail in multiple AWS regions, you should create one SNS topic per region.
III. Create the AWS CloudSearch Domain
Amazon CloudSearch makes it simple and cost effective to set up, manage, and scale a custom search solution for your website or application. It’s easy to create and configure a CloudSearch domain from the AWS management console. You can also use the AWS CLI to script the creation and configuration of your domain.
Creating a script simplifies performing future deployments and changes. You can download and install the AWS CLI by following the steps in the AWS Command Line Interface User Guide.
The CloudSearch domain creation will take several minutes to complete. Download the domain creation script here.
By default, the script creates a domain named “cloudtrail-1” in the “us-east-1” region. You can easily customize the script to change the domain name or AWS region. You can use a single CloudSearch domain to index CloudTrail logs from multiple regions.
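The downloadable script is not reproduced in this post. As a rough idea of what such a script might do, here is a minimal Python sketch using boto3 (an assumption; the original script is presumably built on the AWS CLI). The field names event_source and event_time are the ones used in the sample searches in section VI; their types and the helper names are illustrative assumptions.

```python
# Hypothetical sketch: field types and helper names are assumptions,
# not the contents of the downloadable script.


def index_field_definitions():
    """Return IndexField definitions for two fields used in this post's sample searches."""
    return [
        {"IndexFieldName": "event_source", "IndexFieldType": "literal"},
        {"IndexFieldName": "event_time", "IndexFieldType": "date"},
    ]


def create_domain(domain_name="cloudtrail-1", region="us-east-1"):
    """Create the CloudSearch domain and its index fields (requires AWS credentials)."""
    import boto3  # imported lazily so the definitions above can be inspected offline

    client = boto3.client("cloudsearch", region_name=region)
    client.create_domain(DomainName=domain_name)
    for field in index_field_definitions():
        client.define_index_field(DomainName=domain_name, IndexField=field)
    # New or changed index fields only take effect once the domain is indexed.
    client.index_documents(DomainName=domain_name)
```

Calling create_domain() with no arguments would use the same default domain name and region as the script described above.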
IV. Create an SQS Queue
Amazon Simple Queue Service (SQS) is a fast, reliable, scalable, fully managed message queuing service. By integrating Amazon SNS with Amazon SQS, every delivered notification persists in an Amazon SQS queue until it is processed by an Elastic Beanstalk application that indexes the corresponding logs in CloudSearch.
Creating a new SQS queue is easy with the AWS Management Console. In the SQS console, click Create New Queue and specify the following parameters:
• Queue Name: CloudTrail-sqs
• Default Visibility Timeout: 1 minute
• Message Retention Period: 14 days (maximum)
• Receive Message Wait Time: 20 seconds
Click Create Queue.
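If you would rather script this step, the console settings above map onto SQS queue attributes expressed in seconds. The following is a minimal sketch using boto3; the helper names and the default region are assumptions for illustration.

```python
def queue_attributes():
    """SQS attributes matching the console settings above (string values, in seconds)."""
    return {
        "VisibilityTimeout": str(60),                      # 1 minute
        "MessageRetentionPeriod": str(14 * 24 * 60 * 60),  # 14 days, the SQS maximum
        "ReceiveMessageWaitTimeSeconds": str(20),          # long polling
    }


def create_queue(name="CloudTrail-sqs", region="us-east-1"):
    """Create the queue with boto3 (requires AWS credentials); returns the queue URL."""
    import boto3  # imported lazily so queue_attributes() works without boto3 installed

    sqs = boto3.client("sqs", region_name=region)
    response = sqs.create_queue(QueueName=name, Attributes=queue_attributes())
    return response["QueueUrl"]
```

Note that a scripted setup would also have to subscribe the queue to the SNS topic and attach the queue policy itself, which the console handles for you in the next step.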
Next, click Queue Actions and click Subscribe Queue to an SNS Topic. For Choose a Topic, select the SNS topic that you created in the CloudTrail console. Click Subscribe. The AWS Console will automatically set up the required security policies.
Note: If you are using CloudTrail in multiple AWS regions, you should subscribe this SQS queue to each SNS topic in each region.
After several minutes you should see messages starting to arrive into your SQS queue.
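Each message in the queue is an SNS envelope whose Message field carries the CloudTrail notification as a JSON string. The sketch below shows one way to pull the S3 bucket and object keys out of such a message. It assumes the default (non-raw) SNS delivery format and the s3Bucket and s3ObjectKey fields of the CloudTrail notification; the function name and the sample values are made up.

```python
import json


def log_files_from_message(body):
    """Return (bucket, key) pairs named in one SQS message body.

    Assumes the body is an SNS envelope whose "Message" field is a JSON
    string containing "s3Bucket" and "s3ObjectKey" (a list of keys).
    """
    envelope = json.loads(body)
    notification = json.loads(envelope["Message"])
    bucket = notification["s3Bucket"]
    return [(bucket, key) for key in notification.get("s3ObjectKey", [])]


# Made-up example message; the bucket name and key are placeholders.
sample = json.dumps({
    "Type": "Notification",
    "Message": json.dumps({
        "s3Bucket": "my-cloudtrail-bucket",
        "s3ObjectKey": [
            "AWSLogs/123456789012/CloudTrail/us-east-1/2014/09/01/example.json.gz"
        ],
    }),
})
print(log_files_from_message(sample))
```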
V. Launch the Elastic Beanstalk Application
Now it’s time to install a single-file application written in Python using the Flask framework. You will leverage the Elastic Beanstalk worker tier. A worker is simply an HTTP request handler that Elastic Beanstalk invokes with messages buffered in SQS. Messages put in the queue are forwarded via HTTP POST to a configurable URL on the application hosted by Elastic Beanstalk.
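To make the worker contract concrete, here is a minimal sketch of such a handler. It is not the downloadable application: it deliberately uses the standard library's plain WSGI interface instead of Flask so that it runs without dependencies, and the names worker_app, received, and post are hypothetical. The key point is the response code: a 200 OK tells the worker daemon that the message was processed and can be deleted from the queue, while any other status leaves the message in the queue to be retried.

```python
import io
import json
from wsgiref.util import setup_testing_defaults

received = []  # stand-in for "fetch the log file and index it"


def worker_app(environ, start_response):
    """Minimal WSGI handler for messages POSTed by the worker daemon."""
    path = environ.get("PATH_INFO", "")
    if path == "/":
        # Health check URL.
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"healthy"]
    if environ["REQUEST_METHOD"] != "POST" or path != "/sns/":
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]
    length = int(environ.get("CONTENT_LENGTH") or 0)
    body = environ["wsgi.input"].read(length)
    try:
        received.append(json.loads(body))
    except ValueError:
        # Any non-200 response means the message was not processed.
        start_response("400 Bad Request", [("Content-Type", "text/plain")])
        return [b"invalid JSON"]
    # 200 OK acknowledges the message so it can be deleted from the queue.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]


def post(path, payload):
    """Call worker_app in-process with a fake POST and return the status line."""
    environ = {}
    setup_testing_defaults(environ)
    environ.update({
        "REQUEST_METHOD": "POST",
        "PATH_INFO": path,
        "CONTENT_LENGTH": str(len(payload)),
        "wsgi.input": io.BytesIO(payload),
    })
    status = []
    worker_app(environ, lambda s, headers: status.append(s))
    return status[0]


print(post("/sns/", b'{"Type": "Notification"}'))
```

One design point to keep in mind: because a failed message is retried, a message that can never be parsed would keep coming back until it expires, so a real application might prefer to log and acknowledge it instead.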
Because the application needs to issue AWS API calls to Amazon S3 and to Amazon CloudSearch, we will also use an IAM role for EC2. The role allows the application to make secure API requests from your instances without requiring you to manage the security credentials that the application uses. Let’s first create the required role in the console.
Navigate to the IAM console, click Roles in the navigation pane and then click Create New Role. Enter the role name, such as “cloudsearch-index,” and click Next Step. Then click Select for Amazon EC2 under AWS Services Roles. On the Set Permissions page, scroll down and click Custom Policy and Select. Copy and paste the policy below and give it a name. Click Next Step and then Create Role. This results in a policy like the following:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "cloudtrailworkerrole",
      "Effect": "Allow",
      "Action": [
        "cloudsearch:DescribeDomains",
        "cloudsearch:ListDomainNames",
        "cloudsearch:document",
        "s3:GetObject",
        "s3:ListBucket",
        "sqs:ChangeMessageVisibility",
        "sqs:DeleteMessage",
        "sqs:ReceiveMessage",
        "cloudwatch:PutMetricData"
      ],
      "Resource": [
        "*"
      ]
    }
  ]
}
Next you need to use the console to launch the Elastic Beanstalk application. You can download the .zip file here. The file .ebextensions/cloudtrail.config contains the CloudSearch domain name and region. You can change this file or later change PARAM1 and PARAM2 directly within Elastic Beanstalk.
option_settings:
  "aws:elasticbeanstalk:application:environment":
    PARAM1: cloudtrail-1
    PARAM2: us-east-1
Deploy your Elastic Beanstalk application in a VPC so that it can run on t2.micro instances, which can only be launched in a VPC and are the lowest-cost general-purpose instance type with burstable CPU.
When you’re ready, go to the Elastic Beanstalk console and click Create a New Application. Deploying the Elastic Beanstalk application is straightforward. Here are the required nondefault parameters:
• Environment tier: Worker
• Predefined configuration: Python
• Environment type: Load balancing, autoscaling (actually no load balancer is created with worker role; this option only creates the autoscaling group)
Click Next. Click Browse and upload the application .zip file you downloaded previously.
Click Next. Give a name to the Environment. The default is “cloudtrail1-env.”
Click Next. Select Create this environment inside a VPC.
Click Next. In the Configuration Details page, use these settings:
• Instance type: t2.micro
• Application health check URL: /
• Instance profile: cloudsearch-index
Click Next twice. For Worker Details, use these settings:
• Worker queue: CloudTrail-sqs
• HTTP path: /sns/
• MIME Type: keep “application/json”
Click Next. For VPC security group, you can use the default. (Actually you will not need any ingress network traffic.)
After a couple of minutes, the application status turns green, and your CloudSearch domain starts to be populated.
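Behind the scenes, populating the domain comes down to two transformations: unpacking each gzip-compressed log file into its list of records, and turning those records into a CloudSearch document batch. The following is a minimal sketch of both steps, not the application's actual code. The mapping from CloudTrail keys to index fields (eventID to the document id, eventSource to event_source, eventTime to event_time) is an assumption based on the field names used in the sample searches in this post, and the sample record is made up.

```python
import gzip
import json


def records_from_log(gz_bytes):
    """Decompress one CloudTrail log file and return its list of event records."""
    return json.loads(gzip.decompress(gz_bytes)).get("Records", [])


def to_batch(records):
    """Build a CloudSearch document batch of "add" operations.

    The key-to-field mapping here is an illustrative assumption.
    """
    return [
        {
            "type": "add",
            "id": record["eventID"],
            "fields": {
                "event_source": record["eventSource"],
                "event_time": record["eventTime"],
            },
        }
        for record in records
    ]


# Made-up log file containing a single record.
log = gzip.compress(json.dumps({"Records": [{
    "eventID": "11111111-2222-3333-4444-555555555555",
    "eventSource": "sqs.amazonaws.com",
    "eventTime": "2014-09-01T12:34:56Z",
}]}).encode("utf-8"))

batch = to_batch(records_from_log(log))
print(json.dumps(batch, indent=2))
```

A batch like this is what gets posted to the domain's document endpoint; in a real application you would also want to split large log files into several batches to stay within the CloudSearch batch size limit.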
VI. Using CloudSearch
After several minutes, CloudTrail logs become directly searchable in CloudSearch. You can use the AWS console to issue simple requests. Using the CloudSearch console, select your CloudSearch domain and click on Run a Test Search in the navigation pane. Click the drop-down menu next to Search and select the Structured query parser.
Here are some sample search requests to try:
• matchall (this would view all documents, useful to explore facets)
• event_source:'sqs.amazonaws.com' (this would show all Amazon SQS events)
• event_time:['2014-09-01T00:00:00Z','2014-09-02T00:00:00Z'] (this would show all events for Sept 1st 2014)
Note that CloudSearch displays facets on the right column, which helps you easily explore the data.
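You can issue the same requests outside the console by calling the domain's search endpoint. The sketch below only builds the request URLs, using the 2013-01-01 search API with the structured query parser. The endpoint hostname is a placeholder and the helper names are hypothetical; substitute the search endpoint shown for your own domain.

```python
from urllib.parse import urlencode

# Placeholder endpoint; use the search endpoint listed for your own domain.
SEARCH_ENDPOINT = "https://search-cloudtrail-1-example.us-east-1.cloudsearch.amazonaws.com"


def search_url(query, size=10, endpoint=SEARCH_ENDPOINT):
    """Build a search URL for a structured query."""
    params = {"q": query, "q.parser": "structured", "size": size}
    return endpoint + "/2013-01-01/search?" + urlencode(params)


def events_between(start, end):
    """Structured range query on event_time; square brackets make both bounds inclusive."""
    return "event_time:['{}','{}']".format(start, end)


print(search_url("matchall"))
print(search_url("event_source:'sqs.amazonaws.com'"))
print(search_url(events_between("2014-09-01T00:00:00Z", "2014-09-02T00:00:00Z")))
```

Building the query string with urlencode takes care of escaping the quotes, brackets, and colons that structured queries contain.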
VII. Conclusion
This example shows how you can create a simple architecture that handles AWS CloudTrail logs as soon as they are produced and ingests them into real-time indexing tools like CloudSearch. You can apply similar solutions to handle any type of data and ingest it into databases like Amazon Redshift or Amazon DynamoDB, or into real-time processing tools such as Amazon Kinesis.