Serverless applications don’t require you to provision, scale, or manage any servers. You can build them for nearly any type of application or backend service, and everything required to run and scale your application with high availability is handled for you.

Serverless architectures can be used for many types of applications. For example, you can process transaction orders, analyze click streams, clean data, generate metrics, filter logs, analyze social media, or perform IoT device data telemetry and metering.

In this project, you’ll learn how to build a serverless app to process real-time data streams. You’ll build infrastructure for Wild Rydes, a fictional ride-sharing company, enabling operations personnel at its headquarters to monitor the health and status of their unicorn fleet. Each unicorn is equipped with a sensor that reports its location and vital signs.

You’ll use AWS to build applications to process and visualize this data in real-time. You’ll use AWS Lambda to process real-time streams, Amazon DynamoDB to persist records in a NoSQL database, Amazon Kinesis Data Analytics to aggregate data, Amazon Kinesis Data Firehose to archive the raw data to Amazon S3, and Amazon Athena to run ad-hoc queries against the raw data.

This workshop is broken up into four modules. You must complete each module before proceeding to the next.

1. Build a data stream
    Create a stream in Kinesis and write to and read from the stream to track
    Wild Rydes unicorns on the live map. In this module you'll also create an
    Amazon Cognito identity pool to grant live map access to your stream.

2. Aggregate data
    Build a Kinesis Data Analytics application to read from the stream and
    aggregate metrics like unicorn health and distance traveled each minute.

3. Process streaming data
    Persist aggregate data from the application to a backend database stored
    in DynamoDB and run queries against that data.

4. Store & query data
    Use Kinesis Data Firehose to flush the raw sensor data to an S3 bucket
    for archival purposes. Using Athena, you'll run SQL queries against the
    raw data for ad-hoc analyses.
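The per-minute aggregation described in module 2 can be pictured with a small local sketch. This is plain Python with hypothetical field names, not the Kinesis Data Analytics application you'll actually build; it only illustrates the idea of a one-minute tumbling window per unicorn:

```python
from collections import defaultdict

def aggregate_per_minute(readings):
    """Group readings into one-minute tumbling windows per unicorn.

    Each reading is a dict with illustrative fields: 'Name',
    'StatusTime' (epoch seconds), 'Distance' (meters), 'HealthPoints'.
    Returns {(name, minute): {'distance': total, 'min_health': min}}.
    """
    windows = defaultdict(lambda: {"distance": 0, "min_health": None})
    for r in readings:
        minute = int(r["StatusTime"] // 60)  # truncate timestamp to the minute
        w = windows[(r["Name"], minute)]
        w["distance"] += r["Distance"]
        hp = r["HealthPoints"]
        w["min_health"] = hp if w["min_health"] is None else min(w["min_health"], hp)
    return dict(windows)

readings = [
    {"Name": "Shadowfax", "StatusTime": 0,  "Distance": 9,  "HealthPoints": 100},
    {"Name": "Shadowfax", "StatusTime": 30, "Distance": 10, "HealthPoints": 98},
    {"Name": "Shadowfax", "StatusTime": 65, "Distance": 8,  "HealthPoints": 97},
]
result = aggregate_per_minute(readings)
print(result[("Shadowfax", 0)])  # → {'distance': 19, 'min_health': 98}
```

In the workshop itself, this windowing is expressed in SQL inside Kinesis Data Analytics rather than in application code.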

AWS Experience: Beginner to Intermediate

Time to complete: 110 minutes

Cost to complete: Each service used in this architecture is eligible for the AWS Free Tier. If you are outside the usage limits of the Free Tier, completing this project will cost you less than $0.50 (assuming all services are running for 2 hours)*

To complete this tutorial you will use:

• Active AWS Account**
• Browser (Chrome recommended)
• AWS Lambda
• Amazon Kinesis
• Amazon S3
• Amazon DynamoDB
• Amazon Cognito
• Amazon Athena
• AWS IAM

*This estimate assumes you follow the recommended configurations throughout the tutorial and terminate all resources within 2 hours.

**Accounts that have been created within the last 24 hours might not yet have access to the resources required for this project.

(Architecture diagram: serverless-real-time-data-processing-arch)

In order to complete this workshop, you’ll need an AWS account and access to create AWS Identity and Access Management (IAM), Amazon Cognito, Amazon Kinesis, Amazon S3, Amazon Athena, Amazon DynamoDB, and AWS Cloud9 resources within that account. The step-by-step guide below explains how to set up all prerequisites.

  • Step 1. Create an AWS Account

    The code and instructions in this workshop assume only one participant is using a given AWS account at a time. If you attempt to share an account with another participant, you will encounter naming conflicts for certain resources. You can work around this by either using a suffix in your resource names or using distinct Regions, but the instructions do not provide details on the changes required to make this work.

    Use a personal account or create a new AWS account for this workshop rather than using an organization’s account to ensure you have full access to the necessary services and to ensure you do not leave behind any resources from the workshop.

    Find information on how to set up your AWS Account here >>

  • Step 2. Select your Region

    Use US East (N. Virginia), US West (Oregon), or EU (Ireland) for this workshop. Each supports the complete set of services covered in the material. Consult the Region Table to determine which services are available in a Region.
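If you prefer to work from the command line, you can record your chosen Region as the AWS CLI default. This is a configuration sketch using `us-east-1` (US East, N. Virginia) as an example; Cloud9 typically configures the Region for you:

```shell
# Set the default Region for AWS CLI commands (example: US East, N. Virginia)
aws configure set region us-east-1

# Confirm the setting
aws configure get region
```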

  • Step 3. Set up your AWS Cloud9 IDE

    AWS Cloud9 is a cloud-based integrated development environment (IDE) that lets you write, run, and debug your code with just a browser. It includes a code editor, debugger, and terminal. Cloud9 comes pre-packaged with essential tools for popular programming languages and the AWS Command Line Interface (CLI), so you don’t need to install software or configure your laptop for this workshop. Your Cloud9 environment will have access to the same AWS resources as the user with which you logged into the AWS Management Console.

    Take a moment now to set up your Cloud9 development environment.


    a. Go to the AWS Management Console, select Services then select Cloud9 under Developer Tools.

    b. Select Create environment.

    c. Enter Development into Name and optionally provide a Description.

    d. Select Next Step.

    e. You may leave Environment settings at their defaults: a new t2.micro EC2 instance, which will be paused after 30 minutes of inactivity.

    f. Select Next step.

    g. Review the environment settings and select Create environment. It will take several minutes for your environment to be provisioned and prepared.

    h. Once ready, your IDE will open to a welcome screen.

    i. You can run AWS CLI commands in here just like you would on your local computer. Verify that you are logged in as the expected user by running aws sts get-caller-identity.

    j. You'll see the output indicating your account and user information.

    k. Keep your AWS Cloud9 IDE open in a tab throughout this workshop, as you'll use it for activities like building and running a sample app in a Docker container and using the AWS CLI.

    Admin:~/environment $ aws sts get-caller-identity
    
    {
        "Account": "123456789012",
        "UserId": "AKIAI44QH8DHBEXAMPLE",
        "Arn": "arn:aws:iam::123456789012:user/Alice"
    }
  • Step 4. Set up the Command Line Clients

    The modules utilize two command-line clients to simulate and display sensor data from the unicorns in the fleet. These are small programs written in the Go programming language. The instructions in the Installation section below walk through downloading pre-built binaries, but you can also download the source and build them manually:

    •   producer.go
    •   consumer.go

    The producer generates sensor data from a unicorn taking a passenger on a Wild Ryde. Each second, it emits the location of the unicorn as a latitude and longitude point, the distance traveled in meters in the previous second, and the unicorn’s current level of magic and health points.

    The consumer reads formatted JSON messages from an Amazon Kinesis stream and displays them, which allows you to monitor in real time what’s being sent to the stream. Using the consumer, you can monitor the data that the producer and your applications are sending.
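Conceptually, each reading the producer emits can be pictured as a small JSON object. The field names below are illustrative only; the actual wire format used by producer.go and consumer.go may differ:

```python
import json

# Illustrative shape of a single sensor reading, matching the values the
# producer is described as emitting each second. Field names are assumed.
reading = {
    "Name": "Shadowfax",                      # which unicorn in the fleet
    "StatusTime": "2024-01-01T00:00:00Z",     # time of the reading
    "Latitude": 47.6062,                      # location as a lat/long point
    "Longitude": -122.3321,
    "Distance": 9,                            # meters traveled in the previous second
    "MagicPoints": 79,                        # current level of magic
    "HealthPoints": 100,                      # current health
}

# The producer would serialize a reading like this to JSON before sending
# it to the stream; the consumer does the reverse.
message = json.dumps(reading)
decoded = json.loads(message)
print(decoded["Name"], decoded["Distance"])  # → Shadowfax 9
```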

    1. Switch to the tab where you have your Cloud9 environment opened
    2. Download and unpack the command line clients by running the following command in the Cloud9 terminal:
    curl -s https://dataprocessing.wildrydes.com/client/client.tar | tar -xv
    
    This will unpack the consumer and producer files to your Cloud9 environment.
  • Tips and Recap

    💡 Keep an open scratch pad in Cloud9 or a text editor on your local computer for notes. When the step-by-step directions tell you to note something such as an ID or Amazon Resource Name (ARN), copy and paste that into the scratch pad.

    🔑 Use a unique personal or development AWS account

    🔑 Use one of the US East (N. Virginia), US West (Oregon), or EU (Ireland) Regions

    🔑 Keep your AWS Cloud9 IDE open in a tab

You have set everything up to get started with serverless real-time data processing. In the next module you will set up a data stream to collect and process real-time data.