Front-End Web & Mobile

Real-time data distribution with Amazon MSK and AWS AppSync

by Justin Leto, Deepthi Paruchuri, and Kawshik Sarkar

This post was co-written by Hammad Rasheed, Qasim Naeem, and Umair Ajmal from NETSOL Technologies.

The competition to attract and retain an audience in a cluttered and fragmented digital space is accelerating. Customers expect digital products to provide an engaging user experience. Often this means enriching applications with streaming, real-time data; however, the technical challenges are difficult for many customers to overcome. Technologies like WebSockets make it possible to deliver live scores, chat applications, dashboards, and collaborative tools powered by real-time data. As an application’s user base grows, manually managing a real-time backend that scales to meet that growth becomes cumbersome.

In this post, we’ll demonstrate how to use AWS services to ingest and serve streaming data to client applications with minimal operational complexity. We’ll discuss how to configure and use Amazon MSK and AWS AppSync to distribute real-time stock ticker data to end users at scale. In our example, users can subscribe to multiple stocks and receive data for the subscribed stocks in real time with minimal latency.

AWS AppSync is a fully managed GraphQL service. Taking advantage of GraphQL subscriptions to perform real-time operations, AWS AppSync can push data to clients that choose to listen to specific events from the backend. This means that you can easily make any supported AWS AppSync data source real-time, with connection management between the client and the service handled automatically. Real-time data, WebSocket connections, scalability, fan-out, and broadcasting are all handled by intelligent client libraries and AWS AppSync, allowing you to focus on your application’s business use cases and requirements instead of the complex infrastructure needed to manage WebSocket connections at scale.

Amazon Managed Streaming for Apache Kafka (Amazon MSK) provides open-source, highly secure Apache Kafka clusters distributed across multiple Availability Zones (AZs), giving you resilient, highly available streaming storage. Amazon MSK makes it easy to ingest and process streaming data in real time with fully managed Apache Kafka. Amazon MSK is highly configurable, observable, and scalable, allowing for the flexibility and control needed for various use cases.

Overview of solution

The solution diagram below depicts using AWS Lambda functions to fetch data from an external data source, publish it to an Amazon MSK topic, and then distribute the data in real time using AWS AppSync.

real-time data distribution solution architecture

The following are descriptions of the numbered architectural features from the diagram:

  1. The Amazon MSK producer Lambda function fetches data from the data source at specific intervals and pushes it to a specific Amazon MSK topic.
  2. Amazon MSK partitions maintain FIFO ordering, preserving the order of the data inside each topic. For each topic, an integrated consumer Lambda function is triggered when data arrives.
  3. The Amazon MSK consumer Lambda function performs the required transformations and executes the AWS AppSync mutation with the transformed data payload.
  4. The AWS AppSync subscription mapped to the executed mutation broadcasts the data to all connected clients.

Prerequisites

To deploy the solution described in this blog, you will need the following:

  • Create an AWS account if you do not already have one and log in. The IAM user that you use must have sufficient permissions to make necessary AWS service calls and manage AWS resources.
  • Git installed.
  • Node and NPM installed.
  • Amplify CLI installed (only required to generate code for AWS AppSync endpoints).

Configuration Steps

Choose Launch Stack to launch the solution in the us-east-1 Region.

Now that we have our solution configured, let’s explore the major integration points of our architecture.

Publishing stock ticker data into Amazon MSK

To publish stock ticker data into Amazon MSK, we take advantage of the producer Lambda function. The Amazon MSK producer Lambda function fetches stock ticker data from the third-party data source and transforms it into the appropriate format to send to Amazon MSK. A connection is made with Amazon MSK using Simple Authentication and Security Layer/Salted Challenge Response Authentication Mechanism (SASL/SCRAM) authentication via the Apache Kafka producer library APIs. After a successful connection, the data is sent to the appropriate Amazon MSK topic at the specified interval. The source code for the producer Lambda function can be found here.
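The producer’s core logic can be sketched as follows. This is a minimal, hypothetical illustration using the kafka-python library; the broker endpoints, SCRAM credentials, topic name, and the record-shaping helper are placeholder names, not the actual producer Lambda code linked above.

```python
import json


def to_ticker_record(symbol, price, timestamp):
    """Shape a raw quote into the JSON payload published to the topic.

    The field names here are illustrative; the real producer's schema may differ.
    """
    return {"symbol": symbol, "price": round(price, 2), "timestamp": timestamp}


def build_producer(bootstrap_servers, username, password):
    """Connect to Amazon MSK over SASL/SCRAM (requires the kafka-python package)."""
    from kafka import KafkaProducer  # deferred so the helper above stays stdlib-only

    return KafkaProducer(
        bootstrap_servers=bootstrap_servers,
        security_protocol="SASL_SSL",
        sasl_mechanism="SCRAM-SHA-512",
        sasl_plain_username=username,
        sasl_plain_password=password,
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )


# Inside the Lambda handler (sketch): fetch quotes, then publish each one.
# producer = build_producer(["b-1.example.kafka.us-east-1.amazonaws.com:9096"], user, pw)
# producer.send("stock-ticker", to_ticker_record("AMZN", 131.25, 1700000000))
# producer.flush()
```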

Publishing the data from Amazon MSK to AWS AppSync

To push the data to AWS AppSync, we use the Amazon MSK consumer Lambda function. Event triggers ensure the Amazon MSK consumer Lambda function runs as soon as data arrives in the respective topic. An individual consumer Lambda function must be set up for each Amazon MSK topic, since event triggers are linked to a specific topic and cannot be shared across multiple topics. The Amazon MSK consumer Lambda function extracts the data from the relevant Amazon MSK topic and forwards it to the AWS AppSync endpoint by executing a mutation for the respective topic to publish the data.
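Before executing the mutation, the consumer has to unpack the records Lambda hands it: the MSK event source delivers records grouped under a topic-partition key, with base64-encoded values. A minimal decoding helper, sketched here as an illustration rather than the post’s actual consumer code, might look like this:

```python
import base64


def extract_records(event):
    """Flatten an MSK Lambda event into (topic, decoded-value) pairs.

    Lambda delivers MSK records base64-encoded under event['records'],
    keyed by 'topic-partition' (e.g. 'stocks-0').
    """
    pairs = []
    for batch in event.get("records", {}).values():
        for record in batch:
            value = base64.b64decode(record["value"]).decode("utf-8")
            pairs.append((record["topic"], value))
    return pairs
```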

The snippet below highlights the request header, which includes the AWS AppSync API key for authentication. A helper function retrieves the API key from AWS Secrets Manager and stores it in the appsync_api_key variable.

appsync_header = {
    "Content-Type": "application/json",
    "x-api-key": appsync_api_key
}

Next, we’ll look at the payload, which contains the Amazon MSK topic data to be sent to the AWS AppSync mutation.

kafka_payload = {
    "topic": record_data[0]["topic"],
    "value": base64.b64decode(record_data[0]["value"]).decode("utf-8")
}
variables = {"channelName": record_data[0]["topic"], "testData": json.dumps(kafka_payload)}
payload = {"query": query, "variables": variables}

The snippet below shows how the mutation is executed using the header and payload:

response = requests.post(
    appsync_endpoint,
    json=payload,
    headers=appsync_header
).json()
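One caveat worth noting: GraphQL services such as AWS AppSync report resolver and validation errors in an "errors" array in the JSON body, typically alongside an HTTP 200 status, so checking the status code alone is not enough. A small guard on the parsed response (the function name here is illustrative, not part of the post’s consumer code) might look like:

```python
def check_appsync_response(body):
    """Raise if the parsed GraphQL response carries errors; otherwise return its data.

    GraphQL servers, AppSync included, report errors in an 'errors' array in
    the JSON body, usually with an HTTP 200 status.
    """
    if body.get("errors"):
        raise RuntimeError(f"AppSync mutation failed: {body['errors']}")
    return body.get("data")
```

In the snippet above, this would wrap the parsed result, e.g. data = check_appsync_response(response).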

The full code for the Amazon MSK consumer Lambda function can be found here.

Broadcasting the data via AWS AppSync to all users

To distribute the data back to users, we use an AWS AppSync subscription in the React client. AWS AppSync is configured to broadcast the data to all users subscribed to the corresponding mutation.

type Mutation {
    publish(name: String!, data: AWSJSON!): Channel
}
type Subscription {
    subscribe(filter: String): Channel @aws_subscribe(mutations: ["publish"])
}

The full code for the schema.graphql file can be found here.

As soon as the data is published to the mutation, AWS AppSync broadcasts the data to the connected subscribers via WebSockets. In the React application, users can subscribe to the relevant topics and start receiving published data.

const channels = [channelData.TESLA.name, channelData.AMAZON.name];
this.subscribe(channels);

Testing the solution

Next, we’ll set up a React web client to test our real-time solution. Follow the setup guidelines of the React application provided in the readme file.

Now that we have our test application set up, we can generate real-time data and watch it flow through to the application. To generate data, we execute the Amazon MSK producer Lambda function to ingest and send data to the Amazon MSK topic, which triggers the Amazon MSK consumer Lambda function.

The Amazon MSK consumer Lambda function then passes the data to the AWS AppSync API for distribution. The React application subscribes to the mutation via the AWS AppSync API on page load and receives real-time updates as soon as AWS AppSync begins broadcasting. Users should now get real-time stock updates automatically in the app.

Cleaning up

If you provisioned resources using the CloudFormation stack, simply delete the stack. If you used the console to provision resources, delete each of the following resources:

  1. Amazon MSK cluster
  2. NAT gateway
  3. Bastion host
  4. AWS Lambda functions
  5. AWS AppSync API
  6. AWS Secrets Manager secrets
  7. Amazon VPC resources

Conclusion

In this post, we demonstrated how to distribute real-time data to many users at scale with Amazon Managed Streaming for Apache Kafka (Amazon MSK) and AWS AppSync over WebSockets. To start building serverless GraphQL APIs with AWS AppSync, try it on the AWS Free Tier.

If you are looking to build a real-time data solution and need additional help, contact the team at NETSOL Technologies to discuss your specific use case.

About the Authors

Justin Leto

Justin Leto is a Sr. Solutions Architect at Amazon Web Services with specialization in databases, big data analytics, and machine learning. His passion is helping customers achieve better cloud adoption. In his spare time, he enjoys offshore sailing and playing jazz piano. He lives in New York City with his wife and daughter.

Deepthi Paruchuri

Deepthi Paruchuri is an AWS Solutions Architect based in NYC. She works closely with customers to build cloud adoption strategy and solve their business needs by designing secure, scalable, and cost-effective solutions in the AWS cloud.

Kawshik Sarkar

Kawshik Sarkar is an AWS Solutions Architect based out of Washington DC. Kawshik has over 15 years of experience in designing and implementing robust tech solutions for the commercial segment. His expertise lies in cloud computing and scalable systems, skillfully bridging the gap between complex business problems and technical solutions.