AWS Developer Tools Blog

Using AWS CloudTrail in PHP – Part 2

This is part 2 of Using AWS CloudTrail in PHP. Part 1 demonstrated the basics of how to work with the CloudTrail service, including how to create a trail and turn logging on and off. Today, I want to show you how to read your log files and iterate over individual log records using the AWS SDK for PHP.

AWS CloudTrail log files

CloudTrail creates JSON-formatted log files containing your AWS API call history and stores them in the Amazon S3 bucket you choose. CloudTrail does not provide an API for reading these log files; since they live in Amazon S3, you can use the Amazon S3 client provided by the SDK to download and read them.

Your log files are stored in a predictable path within your bucket based on the account ID, region, and timestamp of the API calls. Each log file contains JSON-formatted data about the API call events, including the service, operation, region, time, user agent, and request and response data. You can see a full specification of the log record data on the CloudTrail Event Reference page of the CloudTrail documentation.
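For illustration, a log file's object key typically looks something like the following (the account ID, region, date, and file name are placeholders), and each file contains a top-level Records array of JSON event objects, abbreviated here to a few of the fields used later in this post (the values are illustrative):

AWSLogs/123456789012/CloudTrail/us-east-1/2013/12/10/123456789012_CloudTrail_us-east-1_20131210T1300Z_EXAMPLE.json.gz

{
    "Records": [
        {
            "eventTime": "2013-12-10T12:34:56Z",
            "eventSource": "ec2.amazonaws.com",
            "eventName": "RunInstances",
            "awsRegion": "us-east-1",
            "userAgent": "aws-sdk-php2/2.4.12"
        }
    ]
}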

Log reading tools in the SDK

Even though it is a straightforward process to get your log files from Amazon S3, the SDK provides an easier way to do it from your PHP code. As of version 2.4.12 of the SDK, you can use the LogFileIterator, LogFileReader, and LogRecordIterator classes in the Aws\CloudTrail namespace to read the log files generated by your trail.

  • LogFileIterator class – Allows you to iterate over the log files generated by a trail, and can be limited by a date range. Each item yielded by the iterator contains the bucket name and object key of the log file.
  • LogFileReader class – Allows you to read the log records of a log file identified by its bucket and key.
  • LogRecordIterator class – Allows you to iterate over log records from one or more log files, and uses the other two classes under the hood (a short sketch of using them directly appears after the next list).

These classes add some extra conveniences over performing the Amazon S3 operations yourself, including:

  1. Automatically determining the paths to the log files based on your criteria.
  2. The ability to fetch log files or records from a specific date range.
  3. Automatically uncompressing the log files.
  4. Extracting the log records into useful data structures.
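
To make the division of labor between these classes concrete, here is a rough sketch that uses LogFileIterator and LogFileReader directly. The forBucket() factory, the 'Bucket' and 'Key' item keys, and the read() signature shown here are assumptions based on the class descriptions above, so treat this as an outline rather than a drop-in recipe.

use Aws\CloudTrail\LogFileIterator;
use Aws\CloudTrail\LogFileReader;

// Assumed factory method, analogous to the LogRecordIterator factories shown later
$files = LogFileIterator::forBucket($s3Client, 'YOUR_BUCKET_NAME', array(
    'start_date' => '-1 day',
));

// The reader downloads, uncompresses, and parses a single log file
$reader = new LogFileReader($s3Client);

foreach ($files as $file) {
    // Each yielded item identifies a log file by its bucket and object key;
    // the 'Bucket' and 'Key' names here are assumptions
    $records = $reader->read($file['Bucket'], $file['Key']);
    foreach ($records as $record) {
        echo $record['eventName'] . "\n";
    }
}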

Instantiating the LogRecordIterator

You can instantiate the LogRecordIterator using one of the three provided factory methods. Which one you choose is determined by what data is available to your application.

  • LogRecordIterator::forTrail() – Use this if the name of the bucket containing your logs is not known.
  • LogRecordIterator::forBucket() – Use this if the bucket name is known.
  • LogRecordIterator::forFile() – Use this if retrieving records from a single file. The bucket name and object key are required.
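
Of the three, forFile() is the most direct, since it retrieves records from exactly one log file. A call might look roughly like the following; the bucket name and object key are placeholders, and the argument order is an assumption based on the description above.

$records = LogRecordIterator::forFile(
    $s3Client,
    'YOUR_BUCKET_NAME',
    'AWSLogs/123456789012/CloudTrail/us-east-1/2013/12/10/your-log-file.json.gz'
);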

If you already know what bucket contains your log files, then you can use the forBucket() method, which requires an instance of the Amazon S3 client, the bucket name, and an optional array of options.

use Aws\CloudTrail\LogRecordIterator;

$records = LogRecordIterator::forBucket($s3Client, 'YOUR_BUCKET_NAME', array(
    'start_date' => '-1 day',
    'log_region' => 'us-east-1',
));

Iterating over the LogRecordIterator instance allows you to get each log record one by one.

foreach ($records as $record) {
    // Print the operation, service name, and timestamp of the API call
    printf(
        "Called the %s operation on %s at %s.n",
        $record['eventName'],
        $record['eventSource'],
        $record['eventTime']
    );
}

NOTE: Each record is yielded as a Guzzle Collection object, which means it behaves like an array, but returns null for non-existent keys instead of triggering an error. It also has methods like getPath() and getAll() that can be useful when working with the log record data.
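
For example, getPath() accepts a slash-delimited path into nested record data, such as the userIdentity element, and getAll() returns the record as a plain array. The field values here are illustrative.

foreach ($records as $record) {
    // Missing keys return null rather than triggering a notice
    $errorCode = $record['errorCode'];

    // getPath() reaches into nested structures like userIdentity
    $identityType = $record->getPath('userIdentity/type');

    // getAll() returns the entire record as a plain array
    $data = $record->getAll();
}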

A complete example

Let’s say that you want to look at all of your log records generated by the Amazon EC2 service during a specific week, and count how many times each Amazon EC2 operation was used. We’ll assume that the bucket name is not known, and that the trail was created via the AWS Management Console.

If you don’t know the name of the bucket, but you do know the name of the trail, then you can use the forTrail() factory method to instantiate the iterator. This method will use the CloudTrail client and the trail name to discover what bucket the trail uses for publishing log files. Trails created via the AWS Management Console are named “Default”, so if you omit trail_name from the options array, “Default” will be used as the trail_name automatically.

$records = LogRecordIterator::forTrail($s3Client, $cloudTrailClient, array(
    'start_date' => '2013-12-08T00:00Z',
    'end_date'   => '2013-12-14T23:59Z',
));
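
If your trail has a different name, you could pass it explicitly via the trail_name option (the trail name below is just a placeholder):

$records = LogRecordIterator::forTrail($s3Client, $cloudTrailClient, array(
    'trail_name' => 'my-custom-trail',
    'start_date' => '2013-12-08T00:00Z',
    'end_date'   => '2013-12-14T23:59Z',
));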

The preceding code will give you an iterator that will yield all the log records for the week of December 8, 2013. To filter by the service, we can decorate the LogRecordIterator with an instance of PHP’s very own CallbackFilterIterator class.

$records = new CallbackFilterIterator($records, function ($record) {
    return (strpos($record['eventSource'], 'ec2') !== false);
});

NOTE: CallbackFilterIterator is available only in PHP 5.4+. However, Guzzle provides a similar class (Guzzle\Iterator\FilterIterator) for applications running on PHP 5.3.
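
For reference, the PHP 5.3 equivalent using Guzzle's class would look roughly like this (assuming the record iterator is accepted directly by its constructor):

use Guzzle\Iterator\FilterIterator;

$records = new FilterIterator($records, function ($record) {
    return (strpos($record['eventSource'], 'ec2') !== false);
});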

At this point, it is trivial to count up the operations.

$opCounts = array();
foreach ($records as $record) {
    if (isset($opCounts[$record['eventName']])) {
        $opCounts[$record['eventName']]++;
    } else {
        $opCounts[$record['eventName']] = 1;
    }
}

print_r($opCounts);
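
If you would like to see the most frequently called operations first, you can sort the counts before printing:

// Sort by count in descending order while preserving the operation names as keys
arsort($opCounts);
print_r($opCounts);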

There’s a Part 3, too

In the final part of Using AWS CloudTrail in PHP, I’ll show you how to set up CloudTrail to notify you of new log files via Amazon SNS. Then I’ll use the log reading tools from today’s post, combined with the SNS Message Validator class from the SDK, to show you how to read log files as soon as they are published.