How do I query Amazon Simple Storage Service (Amazon S3) server access logs in Amazon Athena?

Amazon S3 stores server access logs as objects in an S3 bucket. Logs can't be easily analyzed in S3. Athena supports analysis of S3 objects and can be used to query S3 access logs.

1.   If access logging is not enabled for your S3 bucket, follow the steps in How Do I Enable Server Access Logging for an S3 Bucket? Note the Target bucket name and Target prefix—you'll need both to specify the S3 location in an Athena query. Example:

s3://target-bucket-name/prefix/

2.   Open the Athena console.

3.   In the Query Editor, run a command similar to the following to create a database. Note: It's a best practice to create the database in the same region as your S3 bucket.

create database s3_access_logs_db

4.   In the Query Editor, run a command similar to the following to create a table schema in the database that you created in step #3. The STRING and BIGINT data type values are the access log properties. You can query these properties in Athena. For LOCATION, enter the S3 bucket and prefix path from step #1.

CREATE EXTERNAL TABLE IF NOT EXISTS s3_access_logs_db.mybucket_logs(
  BucketOwner STRING,
  Bucket STRING,
  RequestDateTime STRING,
  RemoteIP STRING,
  Requester STRING,
  RequestID STRING,
  Operation STRING,
  Key STRING,
  RequestURI_operation STRING,
  RequestURI_key STRING,
  RequestURI_httpProtoversion STRING,
  HTTPstatus STRING,
  ErrorCode STRING,
  BytesSent BIGINT,
  ObjectSize BIGINT,
  TotalTime STRING,
  TurnAroundTime STRING,
  Referrer STRING,
  UserAgent STRING,
  VersionId STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
'serialization.format' = '1',
  'input.regex' = '([^ ]*) ([^ ]*) \\[(.*?)\\] ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) \\\"([^ ]*) ([^ ]*) (- |[^ ]*)\\\" (-|[0-9]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) (\"[^\"]*\") ([^ ]*)$'
)LOCATION 's3://target-bucket-name/prefix/'

5.   Under Tables in the left pane, choose Preview table from the menu button that is next to the table name. In the Results window, you should see data from the server access logs, such as bucketowner, bucket, requestdatetime, and so on. This means that you successfully created the Athena table and can now query the S3 server access logs.

Example queries

To find the log for a deleted object:

SELECT * FROM s3_access_logs_db.mybucket_logs WHERE 
key = 'images/picture.jpg' AND operation like '%DELETE%';

To show who deleted an object and when (time stamp, IP address, and AWS Identity and Access Management (IAM) user):

SELECT RequestDateTime, RemoteIP, Requester, Key FROM s3_access_logs_db.mybucket_logs WHERE 
key = 'images/picture.jpg' AND operation like '%DELETE%';

To show all operations executed by an IAM user:

SELECT * FROM s3_access_logs_db.mybucket_logs WHERE 
requester='arn:aws:iam::123456789123:user/user_name';

To show all operations that were performed on an object in a specific time period:

SELECT *
FROM s3_access_logs_db.mybucket_logs
WHERE Key='prefix/images/picture.jpg' AND
parse_datetime(RequestDateTime,'dd/MMM/yyyy:HH:mm:ss Z') 
BETWEEN parse_datetime('2017-02-18:07:00:00','yyyy-MM-dd:HH:mm:ss')
AND
parse_datetime('2017-02-18:08:00:00','yyyy-MM-dd:HH:mm:ss');

To show how much data was transferred by a specific IP address in a specific time period: 

SELECT SUM(bytessent) as uploadTotal, 
SUM(objectsize) as downloadTotal, 
SUM(bytessent + objectsize) AS Total FROM s3_access_logs_db.mybucket_logs WHERE RemoteIP='1.2.3.4' and parse_datetime(RequestDateTime,'dd/MMM/yyyy:HH:mm:ss Z') BETWEEN parse_datetime('2017-06-01','yyyy-MM-dd') and parse_datetime('2017-07-01','yyyy-MM-dd');

It's a best practice to create an S3 lifecycle policy for your server access logs bucket. Configure the lifecycle policy to periodically remove log files. Doing so reduces the amount of data that Athena analyzes for each query.


Did this page help you? Yes | No

Back to the AWS Support Knowledge Center

Need help? Visit the AWS Support Center

Published: 2018-09-07