How do I analyze my audit logs using Amazon Redshift Spectrum?

Last updated: 2020-08-19

I want to analyze my audit logs using Amazon Redshift Spectrum. How do I query the audit logs?

Short description

Before you begin to use Redshift Spectrum, be sure to complete the following tasks:

1.    Enable your audit logs.

Note: It might take some time for your audit logs to appear in your Amazon Simple Storage Service (Amazon S3) bucket.

2.    Create an AWS Identity and Access Management (IAM) role.

3.    Associate the IAM role to your Amazon Redshift cluster.

To query your audit logs in Redshift Spectrum, create external tables and configure them to point to a common folder (used by your files). Then, use the hidden $path column and regular expression functions in views to generate the rows for your analysis.

Resolution

To query your audit logs in Redshift Spectrum, perform the following steps:

1.    Create an external schema:

create external schema s_audit_logs 
from data catalog 
database 'audit_logs' 
iam_role 'arn:aws:iam::your_account_number:role/role_name' 
create external database if not exists;

Replace your_account_number with your AWS account number. For role_name, specify the IAM role that's attached to your Amazon Redshift cluster.
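As an optional check, you can confirm that the schema was registered by querying the SVV_EXTERNAL_SCHEMAS system view on your cluster:

-- Confirm that the external schema and its Data Catalog database exist
SELECT schemaname, databasename
FROM svv_external_schemas
WHERE schemaname = 's_audit_logs';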

2.    Create the external tables:

create external table s_audit_logs.user_activity_log(
    logrecord varchar(max)
)
 ROW FORMAT DELIMITED 
  FIELDS TERMINATED BY '|' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  's3://bucket_name/logs/AWSLogs/your_account_id/redshift/region';

In this example, you create a user activity log table. Replace bucket_name, your_account_id, and region with your actual bucket name, account ID, and AWS Region.

3.    Create a connection log table:

CREATE EXTERNAL TABLE s_audit_logs.connections_log(
  event varchar(60),
  recordtime varchar(60),
  remotehost varchar(60),
  remoteport varchar(60),
  pid int,
  dbname varchar(60),
  username varchar(60),
  authmethod varchar(60),
  duration int,
  sslversion varchar(60),
  sslcipher varchar(150),
  mtu int,
  sslcompression varchar(70),
  sslexpansion varchar(70),
  iamauthguid varchar(50),
  application_name varchar(300))
ROW FORMAT DELIMITED 
  FIELDS TERMINATED BY '|' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 's3://bucket_name/logs/AWSLogs/your_account_id/redshift/region';

Replace bucket_name, your_account_id, and region with your actual bucket name, account ID, and AWS Region.
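To verify that both external tables are registered and point to the expected Amazon S3 location, one approach is to query the SVV_EXTERNAL_TABLES system view:

-- List the external tables registered under the audit log schema
SELECT schemaname, tablename, location
FROM svv_external_tables
WHERE schemaname = 's_audit_logs';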

4.    Create a local schema to view the audit logs:

create schema audit_logs_views;

5.    Create views in a database (using the WITH NO SCHEMA BINDING option) to access the external tables:

CREATE VIEW audit_logs_views.v_connections_log AS
select *
FROM s_audit_logs.connections_log
WHERE "$path" like '%connectionlog%'
with no schema binding;

The hidden $path column restricts the files that are returned to those matching the connectionlog entries.
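For example, after the view is created you can summarize connection activity with an aggregate query such as the following (a sketch that assumes your audit logs have already populated the Amazon S3 location):

-- Count connection log events by type and remote host
SELECT event, remotehost, count(*) AS occurrences
FROM audit_logs_views.v_connections_log
GROUP BY event, remotehost
ORDER BY occurrences DESC;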

In the following example, the hidden $path column and regular expression functions are used to restrict the files that are returned for v_useractivitylog:

CREATE or REPLACE VIEW audit_logs_views.v_useractivitylog AS
SELECT    logrecord,
          substring(logrecord,2,24) as recordtime,
          replace(regexp_substr(logrecord,'db=[^" "]*'),'db=','') as db,
          replace(regexp_substr(logrecord,'user=[^" "]*'),'user=','') AS user,
          replace(regexp_substr(logrecord, 'pid=[^" "]*'),'pid=','') AS pid,
          replace(regexp_substr(logrecord, 'userid=[^" "]*'),'userid=','') AS userid,
          replace(regexp_substr(logrecord, 'xid=[^" "]*'),'xid=','') AS xid,
          replace(regexp_substr(logrecord, '][^*]*'),']','') AS query
   FROM s_audit_logs.user_activity_log
   WHERE "$path" like '%useractivitylog%'
   with no schema binding;

The files returned match the useractivitylog entries.

Note: There is a limitation related to multi-row queries in user activity logs. Therefore, it's a best practice to query the logrecord column directly.
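For example, the following sketch searches the raw logrecord column for statements of interest ('create table' is a placeholder search term, not part of the original example):

-- Because multi-row SQL statements can span log entries,
-- search the raw logrecord column directly
SELECT logrecord
FROM audit_logs_views.v_useractivitylog
WHERE logrecord LIKE '%create table%'
ORDER BY recordtime DESC
LIMIT 100;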