AWS Big Data Blog

Audit AWS service events with Amazon EventBridge and Amazon Kinesis Data Firehose

February 9, 2024: Amazon Kinesis Data Firehose has been renamed to Amazon Data Firehose. Read the AWS What’s New post to learn more.

Amazon EventBridge is a serverless event bus that makes it easy to build event-driven applications at scale using events generated from your applications, integrated software as a service (SaaS) applications, and AWS services. Many AWS services generate EventBridge events. When an AWS service in your account emits an event, it goes to your account’s default event bus.

The following are a few event examples:

By default, these AWS service-generated events are transient and therefore not retained. This post shows how you can forward AWS service-generated events or custom events to Amazon Simple Storage Service (Amazon S3) for long-term storage, analysis, and auditing purposes using EventBridge rules and Amazon Kinesis Data Firehose.

Solution overview

In this post, we provide a working example of AWS service-generated events ingested to Amazon S3. To make sure we have some service events available in default event bus, we use Parameter Store, a capability of AWS Systems Manager to store new parameters manually. This action generates a new event, which is ingested by the following pipeline.

Architecture Diagram

The pipeline includes the following steps:

  1. AWS service-generated events (for example, a new parameter created in Parameter Store) goes to the default event bus at EventBridge.
  2. The EventBridge rule matches all events and forwards those to Kinesis Data Firehose.
  3. Kinesis Data Firehose delivers events to the S3 bucket partitioned by detail-type and receipt time using its dynamic partitioning capability.
  4. The S3 bucket stores the delivered events, and their respective event schema is registered to the AWS Glue Data Catalog using an AWS Glue crawler.
  5. You query events using Amazon Athena.

Deploy resources using AWS CloudFormation

We use AWS CloudFormation templates to create all the necessary resources for the ingestion pipeline. This removes opportunities for manual error, increases efficiency, and provides consistent configurations over time. The template is also available on GitHub.

Complete the following steps:

  1. Click here to
    Launch Stack
  2. Acknowledge that the template may create AWS Identity and Access Management (IAM) resources.
  3. Choose Create stack.

The template takes about 10 minutes to complete and creates the following resources in your AWS account:

  • An S3 bucket to store event data.
  • A Firehose delivery stream with dynamic partitioning configuration. Dynamic partitioning enables you to continuously partition streaming data in Kinesis Data Firehose by using keys within the data (for example, customer_id or transaction_id) and then deliver the data grouped by these keys into corresponding S3 prefixes.
  • An EventBridge rule that forwards all events from the default event bus to Kinesis Data Firehose.
  • An AWS Glue crawler that references the path to the event data in the S3 bucket. The crawler inspects data landed to Amazon S3 and registers tables as per the schema with the AWS Glue Data Catalog.
  • Athena named queries for you to query the data processed by this example.

Trigger a service event

After you create the CloudFormation stack, you trigger a service event.

  1. On the AWS CloudFormation console, navigate to the Outputs tab for the stack.
  2. Choose the link for the key CreateParameter.

Create Parameter

You’re redirected to the Systems Manager console to create a new parameter.

  1. For Name, enter a name (for example, my-test-parameter).
  2. For Value, enter the test value of your choice (for example, test-value).

My Test parameter

  1. Leave everything else as default and choose Create parameter.

This step saves the new Systems Manager parameter and pushes the parameter-created event to the default EventBridge event bus, as shown in the following code:

{
  "version": "0",
  "id": "6a7e4feb-b491-4cf7-a9f1-bf3703497718",
  "detail-type": "Parameter Store Change",
  "source": "aws.ssm",
  "account": "123456789012",
  "time": "2017-05-22T16:43:48Z",
  "region": "us-east-1",
  "resources": [
    "arn:aws:ssm:us-east-1:123456789012:parameter/foo"
  ],
  "detail": {
    "operation": "Create",
    "name": "my-test-parameter",
    "type": "String",
    "description": ""
  }
}

Discover the event schema

After the event is triggered by saving the parameter, wait at least 2 minutes for the event to be ingested via Kinesis Data Firehose to the S3 bucket. Now complete the following steps to run an AWS Glue crawler to discover and register the event schema in the Data Catalog:

  1. On the AWS Glue console, choose Crawlers in the navigation pane.
  2. Select the crawler with the name starting with S3EventDataCrawler.
  3. Choose Run crawler.

Run Crawler

This step runs the crawler, which takes about 2 minutes to complete. The crawler discovers the schema from all events and registers it as tables in the Data Catalog.

Query the event data

When the crawler is complete, you can start querying event data. To query the event, complete the following steps:

  1. On the AWS CloudFormation console, navigate to the Outputs tab for your stack.
  2. Choose the link for the key AthenaQueries.

Athena Queries

You’re redirected to the Saved queries tab on the Athena console. If you’re running Athena queries for the first time, set up your S3 output bucket. For instructions, see Working with Query Results, Recent Queries, and Output Files.

  1. Search for Blog to find the queries created by this post.
  2. Choose the query Blog – Query Parameter Store Events.

Find Athena Saved Queries

The query opens on the Athena console.

  1. Choose Run query.

You can update the query to search the event you created earlier.

  1. Apply a WHERE clause with the parameter name you selected earlier:
SELECT * FROM "AwsDataCatalog"."eventsdb-randomId"."parameter_store_change"
WHERE detail.name = 'Your event name'

You can also choose the link next to the key CuratedBucket from the CloudFormation stack outputs to see paths and the objects loaded to the S3 bucket from other event sources. Similarly, you can query them via Athena.

Clean up

Complete the following steps to delete your resources and stop incurring costs:

  1. On the AWS CloudFormation console, select the stack you created and choose Delete.
  2. On the Amazon S3 console, find the bucket with the name starting with eventbridge-firehose-blog-curatedbucket.
  3. Select the bucket and choose Empty.
  4. Enter permanently delete to confirm the choice.
  5. Select the bucket again and choose Delete.
  6. Confirm the action by entering the bucket name when prompted.
  7. On the Systems Manager console, go to the parameter store and delete the parameter you created earlier.

Summary

This post demonstrates how to use an EventBridge rule to redirect AWS service-generated events or custom events to Amazon S3 using Kinesis Data Firehose to use for long-term storage, analysis, querying, and audit purposes.

For more information, see the Amazon EventBridge User Guide. To learn more about AWS service events supported by EventBridge, see Events from AWS services.


About the Author

Anand ShahAnand Shah is a Big Data Prototyping Solution Architect at AWS. He works with AWS customers and their engineering teams to build prototypes using AWS analytics services and purpose-built databases. Anand helps customers solve the most challenging problems using the art of the possible technology. He enjoys beaches in his leisure time.