Real-time live sports updates with AWS AppSync

This article was written by Stefano Sandrini, Sr. Mobile Specialist Solutions Architect, AWS

AWS AppSync is a fully managed service that lets you deploy serverless GraphQL backends in the AWS Cloud. It provides features that developers can use to create modern, data-driven applications that query multiple databases, microservices, and APIs from a single GraphQL endpoint. Customers can leverage AppSync real-time capabilities, offline data synchronization, built-in server-side caching, fine-grained access control, security, support for business logic in the API layer using GraphQL resolvers, and more. In this article, we focus on the built-in AWS AppSync capabilities for implementing a real-time use case: live sports event updates and fan engagement.

This use case is particularly important in the Media and Entertainment industry, where companies offer applications that enable their customers to view sports scores as they happen, track live game/match information and statistics, receive fantasy sports updates, and interact with fellow subscribers. Delivering this sort of data in real time is critical in the media business. To help entertainment companies deliver sports information in real time and at high scale, AWS released the solution Real-Time Live Sports Updates Using AWS AppSync. This solution provides an easily deployable reference architecture, implemented with best practices in mind, that addresses challenges commonly found in the media and entertainment industry around real-time updates for live sports.

This reference architecture addresses common use cases for delivering real-time sports data to customers, including:

Flexibility to use multiple ingestion data feeds: A media and entertainment company may have multiple feed providers, depending on the sport or competition.
Live game updates: A viewer is watching a sports event in a mobile or web app and an event occurs (for example, a goal is scored, extra time is added, or a player is injured). The media and entertainment company must push the event details to the application as they happen in order to engage the fan base.
Fantasy score updates: In a fantasy sports league, the score for a fantasy team changes with each play in the game. Points, standings, and status in head-to-head competition against another player in the fantasy league change continuously. These updates are automatically delivered to live viewers of the fantasy game.
Additional stats and info on demand: Companies can offer additional real-time insights to users watching games via OTT (over-the-top) services.
Push notifications: When mobile app users do not have the app running in the foreground, media and entertainment companies can deliver updates via push notifications (or another channel).

Using AppSync GraphQL subscriptions, a media company can scale automatically to handle peak usage and reach millions of connected clients with real-time notifications. Whenever a client invokes a GraphQL subscription operation, AppSync automatically establishes and manages a secure WebSocket connection that stays open, allowing users to receive real-time data from any supported AppSync data source. For example, the satellite television service Sky Italia wanted to deliver a better sports fan experience by pushing real-time data updates during live broadcasts of sports events. By using AWS AppSync, they were able to optimize data transfers during peak traffic times and provide sports updates to viewers in milliseconds.

By taking advantage of GraphQL subscriptions to perform real-time operations, AppSync can push data to clients that choose to listen to specific events from the backend. This means that you can make any supported data source in AppSync real-time, with connection management between the client and the service handled automatically. A backend service can broadcast data to connected clients, or clients can send data to other clients, depending on the use case. Real-time data, WebSocket connections, scalability, fan-out, and broadcasting are all handled by intelligent client libraries and AppSync, allowing you to focus on your application’s business use cases and requirements instead of the complex infrastructure needed to manage WebSocket connections at scale.

The published AWS solution uses AWS AppSync with other services such as AWS Lambda, Amazon DynamoDB, Amazon Kinesis Data Streams, and Amazon Pinpoint to deliver a highly scalable serverless architecture for this use case. You can access the implementation guide in the solution’s site and deploy it in your AWS account. In this article, we explain the architecture in detail, discuss design considerations and walk through the GraphQL schema.

Architecture overview

This real-time solution is built from several AWS CloudFormation stacks, each with a specific purpose. The main stack handles data pre-processing and real-time delivery of data. Two ingestion stacks can optionally be provisioned to handle data ingestion from a data provider. An optional notification stack sends data to offline end users via push notifications, and a simulation stack demonstrates how the solution works and lets you test customizations against a set of simulated games.

Main stack

Data from a data feed provider is ingested into the Amazon Kinesis data stream configured in the main stack. Data can be ingested either through a dedicated producer app or by leveraging the optional Ingestion Stacks (more about ingestion in the next section). Once the data is ingested into Kinesis Data Streams, a Lambda function transforms and enriches it using configuration information stored in Amazon DynamoDB. The Lambda function adapts data records received from a third-party data provider to the expected GraphQL schema types. This approach enables media customers, such as broadcasters, to use the same GraphQL schema regardless of the sport or the feed provider whose data they want to distribute. Providers can distribute data to business customers via an API that is standard across several sports. In addition, a media company may use the function to enhance the feed with additional processed data.

The Lambda function calls AppSync to invoke a GraphQL mutation to save game events data to DynamoDB tables. Once the mutation is completed, AppSync notifies subscribers in real-time about a new event. The real-time message is delivered via secure WebSockets, as described in the documentation.
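
As a rough illustration of the first step in that flow, the sketch below shows how a Kinesis-triggered Lambda function could recover feed records from the event it receives. Lambda delivers each Kinesis record's data as a base64-encoded string; the function name `decode_kinesis_records` and the sample record fields (`providerId`, `feedId`, `action`) are assumptions made for this example, not names taken from the solution.

```python
import base64
import json


def decode_kinesis_records(event):
    """Return the JSON payloads carried by a Kinesis-triggered Lambda event."""
    payloads = []
    for record in event.get("Records", []):
        # Lambda passes each Kinesis record's data as a base64-encoded string.
        raw = base64.b64decode(record["kinesis"]["data"])
        payloads.append(json.loads(raw))
    return payloads


# Hypothetical feed record, encoded the way the function would receive it.
sample = {"providerId": "provider-a", "feedId": "soccer-1", "action": "goal"}
event = {
    "Records": [
        {
            "kinesis": {
                "partitionKey": "provider-a#soccer-1",
                "data": base64.b64encode(json.dumps(sample).encode("utf-8")).decode("ascii"),
            }
        }
    ]
}

decoded = decode_kinesis_records(event)
print(decoded)
```

A real handler would then transform each decoded record and send it to AppSync as a mutation; both steps are left out here.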

Ingestion stacks

The architecture has two optional Ingestion Stacks to address different data ingestion use cases. While the Main Stack provides a Kinesis data stream to manage data ingestion, the optional Ingestion Stacks add a REST API provided by Amazon API Gateway and an AWS Step Functions workflow. A data provider can use the REST API to push data into the media company’s system, while a media company can use the Step Functions workflow to pull data from the data provider’s API.

In both cases, data is ingested into the Amazon Kinesis data stream deployed in the Main Stack.

Notification stack

Optionally, this solution can be connected to an Amazon Pinpoint project using another Lambda function to send notifications. Customers can use this approach to send push notifications to all end users that are not connected to AppSync, which is the case when a user doesn’t have the mobile app running in the foreground. This enables Media and Entertainment customers to notify offline users about important events such as a game start, a game finish, or a score event.

Simulation stack

When the CloudFormation template is deployed, a simulated data ingestion process is automatically executed by default to demonstrate the capabilities of the solution. Also, the simulated data ingestion demo can be leveraged as a continuous testing tool for frontend development, providing a recurring and continuous flow of data that can be used to test changes to the frontend client. The stack provides a static web application hosted on an Amazon S3 bucket and served by an Amazon CloudFront distribution. The web application is based on React and uses AWS Amplify libraries to interact with AppSync to query current games and subscribe to game events for real-time notifications.

AWS Amplify is a set of tools and services that enables mobile and frontend web developers to build secure and scalable fullstack applications. The Amplify Framework consists of libraries, UI components, and a CLI toolchain. Organized by use case, Amplify libraries and UI components are powered by AWS services. Amplify is open source and supports different frontend frameworks (React, React Native, Flutter, Angular, Vue, Ionic) as well as native iOS/Android applications.

Design walkthrough

The following diagram shows the data flow for the different use cases we want to address with this solution. The idea behind the design is that these scenarios are independent of each other and components needed for these different use cases can be deployed at the same time, providing the ability to mix and match different approaches based on the characteristics of your data, the event, and your data provider.

1. Data stream: Kinesis handles the ingestion of data for live game events. Game event data is ingested into the Kinesis data stream from any data producer you may want to build. Based on the volume of real-time data, you should carefully select the right number of shards for the stream. If you select a multi-shard approach, we suggest using the feedID or the providerID (or a combination of the two) as the partition key. The feedID is an identifier for the specific feed that is ingested, while the providerID is the identifier for the data provider. This strategy lets you split data related to different sports or competitions, or coming from different providers, across different shards. In addition, the enhanced fan-out option is enabled by default.

Direct ingestion into Kinesis is an option you can use either if you are the provider of game data or if your data provider is allowed to perform PutRecord calls into the Kinesis data stream. If you are not able to interact with Kinesis directly by writing a producer app, you can use one of the following options:

Scenario 1a: an Amazon API Gateway REST API, invoked with an HTTP POST request and protected by IAM authorization. Data is ingested into the Kinesis data stream using the concatenation of providerID and feedID as the partition key, so both are required parameters. This scenario enables you, or your data provider, to control how frequently data is ingested and the latency between live event actions and data ingestion. This is an option for the following use cases:
- If you are the provider of game data and you are not authorized to ingest data directly into Kinesis.
- If your data provider is not authorized to perform PutRecord calls into Kinesis but can invoke an API to send data.
Scenario 1b: a Step Functions workflow that polls external third-party APIs exposed by your data provider. This workflow leverages Lambda to get the details of today’s games and pull data from the provider API at a frequency you can manage (for example, to avoid throttling).
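
To make the partition key strategy more concrete, here is a minimal sketch of how a producer might assemble the arguments for a Kinesis PutRecord call. The helper name `build_put_record_args`, the stream name, and the `#` separator between the two identifiers are assumptions for illustration; the solution only states that providerID and feedID are concatenated.

```python
import json


def build_put_record_args(stream_name, provider_id, feed_id, payload):
    """Build keyword arguments for a Kinesis PutRecord call (hypothetical helper)."""
    if not provider_id or not feed_id:
        # Both identifiers are needed to form the partition key.
        raise ValueError("providerID and feedID are both required")
    return {
        "StreamName": stream_name,
        "Data": json.dumps(payload).encode("utf-8"),
        # Assumption: a '#' separator keeps the two identifiers unambiguous.
        "PartitionKey": f"{provider_id}#{feed_id}",
    }


args = build_put_record_args(
    "game-events",  # hypothetical stream name
    "provider-a",
    "soccer-1",
    {"action": "goal", "clock": "54:10"},
)
print(args["PartitionKey"])
```

With boto3 available, a dictionary shaped like this could be passed to a Kinesis client as `client.put_record(**args)`. Records that share a partition key are routed to the same shard, which is what keeps the events of one feed together.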

2. Pre-processing function: Game events are pre-processed by a Lambda function that translates the feed into the expected input format for the GraphQL API. This Lambda function enriches, enhances, and transforms data so that you can customize the information presented to subscribers. For example, the ingested data feed may use the name San Francisco 49ers, but you can shorten that name to SF or 49ers to fit a smaller frontend user interface (such as a mobile device or tablet). The function also transforms records from the ingested data into the standard GraphQL types defined in your API. This transformation is based on configuration stored in a DynamoDB table. This approach allows you to keep the GraphQL schema as general as possible, reinforcing standardization across different use cases, sports, or competitions. You can also use this Lambda function to filter out game events received from a provider that don’t need to be sent to end users.
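
The sketch below illustrates the kind of configuration-driven transformation described in this step: renaming provider fields, shortening display names, and dropping events that should not reach end users. Everything in it is hypothetical: the `FEED_CONFIG` dictionary stands in for configuration that the solution stores in DynamoDB, and the provider field names and the `competitorName` output field are invented for the example.

```python
# Hypothetical configuration, standing in for items read from the DynamoDB table.
FEED_CONFIG = {
    "field_map": {
        "match_id": "gameId",
        "action": "type",
        "clock": "clock",
        "team": "competitorName",
    },
    "display_names": {"San Francisco 49ers": "49ers"},
    "ignored_types": {"heartbeat"},
}


def to_game_event_input(record, config=FEED_CONFIG):
    """Map a provider record to a GraphQL-style input, or None if it is filtered out."""
    if record.get("action") in config["ignored_types"]:
        return None
    event = {}
    for source_field, target_field in config["field_map"].items():
        if source_field in record:
            event[target_field] = record[source_field]
    # Replace long names with the shorter form configured for small screens.
    name = event.get("competitorName")
    if name in config["display_names"]:
        event["competitorName"] = config["display_names"][name]
    return event


provider_record = {
    "match_id": "game-1",
    "action": "touchdown",
    "clock": "12:30",
    "team": "San Francisco 49ers",
}
transformed = to_game_event_input(provider_record)
skipped = to_game_event_input({"match_id": "game-1", "action": "heartbeat"})
print(transformed, skipped)
```

Keeping the mapping in data rather than in code is what would let the same function serve different providers and sports without changes to the GraphQL schema.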

3. GraphQL API: Game events and status updates are sent by the Lambda function to AppSync by executing GraphQL mutations that save data in DynamoDB. These operations are performed by AppSync VTL resolvers and, once they complete, AppSync notifies clients subscribed to those mutations in real time. The GraphQL API is secured with two authorization modes: AWS Identity and Access Management (IAM) and API keys. IAM authorization protects GraphQL mutations, allowing only specific execution roles (assumed by the Lambda functions) to perform them. API keys allow queries and subscriptions on games and events data, so updates can be automatically pushed to frontend applications for all types of users, including guests. API keys are useful for use cases such as free sports mobile applications. All other queries are protected with IAM authorization.

4. Push notifications: A DynamoDB stream with all modified data is consumed by a Lambda function that interacts with Pinpoint to create campaigns and send push notifications to all users that are not connected to AppSync (for example, users with a mobile app in the background). This Lambda function can be configured to select the events that should be sent via push notifications instead of WebSockets.
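
As an illustration of how such a function might select events, the sketch below filters DynamoDB stream records by event type. It assumes the stream is configured to include new item images and that game events carry `gameId` and `type` string attributes; the set of type values in `PUSH_EVENT_TYPES` and the function name are hypothetical. The Pinpoint calls themselves are omitted.

```python
# Hypothetical event types that should reach offline users as push notifications.
PUSH_EVENT_TYPES = {"game_start", "game_end", "score"}


def select_push_events(stream_event, push_types=PUSH_EVENT_TYPES):
    """Pick newly inserted game events whose type is configured for push delivery."""
    selected = []
    for record in stream_event.get("Records", []):
        if record.get("eventName") != "INSERT":
            continue  # only newly created items are considered in this sketch
        # Stream images use DynamoDB's typed attribute format, e.g. {"S": "score"}.
        image = record.get("dynamodb", {}).get("NewImage", {})
        event_type = image.get("type", {}).get("S")
        if event_type in push_types:
            selected.append(
                {"gameId": image.get("gameId", {}).get("S"), "type": event_type}
            )
    return selected


stream_event = {
    "Records": [
        {"eventName": "INSERT", "dynamodb": {"NewImage": {
            "gameId": {"S": "game-1"}, "type": {"S": "score"}}}},
        {"eventName": "INSERT", "dynamodb": {"NewImage": {
            "gameId": {"S": "game-1"}, "type": {"S": "substitution"}}}},
        {"eventName": "MODIFY", "dynamodb": {"NewImage": {
            "gameId": {"S": "game-2"}, "type": {"S": "score"}}}},
    ]
}
to_push = select_push_events(stream_event)
print(to_push)
```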

5. Game simulation (optional): Customers can test the whole solution by deploying a simulation stack. Simulated game events are ingested using the Step Functions workflow. Each simulated game runs the simulation workflow, which is triggered by a Simulated Game Scheduler Lambda function that runs on a schedule (cron expression).

You can find more details in the solution implementation guide.

GraphQL schema design

The GraphQL schema is designed to provide a generic approach to delivering sports data. It’s based on games with home and away competitors, so it’s suitable for delivering real-time updates for sports in which two competitors (teams or single players) play each other, such as soccer, football, basketball, volleyball, hockey, baseball, and tennis.

The schema defines a set of GraphQL types that can be accessed by different authorization methods: IAM using the @aws_iam directive and API keys with @aws_api_key directive, with additional authorization rules implemented for specific fields or operations.

The Sport type defines the concept of a sport (soccer, football, etc.). This type lets you define a FeedConfig and a list of competitions through the ModelCompetitionConnection type.

type Sport @aws_iam @aws_api_key {
  id: ID!
  name: String!
  competitions(filter: ModelCompetitionFilterInput, sortDirection: ModelSortDirection, limit: Int, nextToken: String): ModelCompetitionConnection
  createdAt: AWSDateTime!
  updatedAt: AWSDateTime!
  feedConfig: FeedConfig @aws_iam
}

type ModelCompetitionConnection @aws_api_key @aws_iam {
  items: [Competition]
  nextToken: String
}

The FeedConfig type models the configuration of a feed from a third party, including the url field that contains the link used to pull data from the provider.

type FeedConfig @aws_iam @aws_api_key {
  providerId: String
  feedId: String
  url: String
  type: String
  operation: String
  providerkey: String
}

Using this type, you can set different levels of configuration for third-party data feeds and sports data providers, at one or more levels of the schema including Sport, Competition, Season, Stage, and Game. These options provide great flexibility when managing the feed configuration.

The feedConfig field in the Sport type is protected with @aws_iam, so it’s not accessible with an API key. The idea is that only internal services, such as Lambda, can access this field. Access is managed through permissions on an IAM execution role, and the same approach is used for the feedConfig field defined on Competition, Season, Stage, and Game.

Just as a Sport can have multiple competitions, a Competition can have multiple seasons and a Season can have multiple stages. With all these GraphQL types, it’s possible to model concepts and scenarios such as:

Sport: Soccer
Competition: Bundesliga
Season: 2020-2021
Stage: Week 1

The Game, GameStatus, and GameEvent types are used when you want to notify end users in real time. Real-time live updates for sports are usually related to score changes (such as goals in soccer or touchdowns in football) or to status changes (such as game start or half-time). Therefore, these are the types usually managed by the pre-processing Lambda function that consumes the ingested feed via Kinesis, as described in step 2 of the design data flow.

The Game type identifies the concept of a game. A game is usually created well in advance based on a competition’s schedule, but its status changes in real time based on the game’s events as they happen. Within the Game type, you can specify game parameters including venue, competitors, scoring drives, and stats. Stats is a field of type AWSJSON, which enables you to enter JSON-based statistical information for a game. For information about AWSJSON, AWSDateTime, and other AppSync-defined scalar types, refer to the AppSync documentation.

type Game @aws_iam @aws_api_key {
  id: ID!
  stage: Stage
  stageId: ID!
  plannedKickoffTime: AWSDateTime!
  venue: Venue
  home: Competitor
  away: Competitor
  gameStatus: GameStatus
  scoringDrives: [ScoringDrive]
  events(createdAt: ModelStringKeyConditionInput, filter: ModelGameEventFilterInput, sortDirection: ModelSortDirection, limit: Int, nextToken: String): ModelGameEventConnection
  stats: AWSJSON
  createdAt: AWSDateTime!
  updatedAt: AWSDateTime!
  feedConfig: FeedConfig @aws_iam
}

When creating a game that has a FeedConfig that specifies a URL, the plannedKickoffTime field sets up the start time for polling the provider API via the Step Functions workflow.
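
To show how a poller could make use of that field, the sketch below computes how long to wait before polling should begin, given a plannedKickoffTime value in AWSDateTime format (an ISO 8601 date-time string with a time zone offset). The function names are hypothetical and this is not the solution's Step Functions logic, only one way the calculation could be written.

```python
from datetime import datetime, timezone


def parse_aws_datetime(value):
    """Parse an AWSDateTime string such as 2021-05-01T18:30:00.000Z."""
    # Older Python versions do not accept a trailing "Z" in fromisoformat.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def seconds_until_polling(planned_kickoff, now):
    """Return the delay before polling starts, or 0.0 if kickoff has passed."""
    kickoff = parse_aws_datetime(planned_kickoff)
    return max(0.0, (kickoff - now).total_seconds())


now = datetime(2021, 5, 1, 18, 0, tzinfo=timezone.utc)
wait = seconds_until_polling("2021-05-01T18:30:00.000Z", now)
late = seconds_until_polling("2021-05-01T17:00:00.000Z", now)
print(wait, late)
```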

A game has a gameStatus field based on the type GameStatus. The GameStatus type contains information about the game’s current scores, the remaining or elapsed time, the winning team or player, and aggregate scores (for events containing multiple rounds or sporting events with a best-of series in a playoff round).

type GameStatus @aws_iam @aws_api_key {
  status: Status!
  clock: String
  clockStoppageAnnounced: String
  clockStoppagePlayer: String
  winner: Competitor
  aggregateAwayScore: Int
  aggregateHomeScore: Int
  aggregateWinner: Competitor
  awayNormaltimeScore: Int
  awayOvertimeScore: Int
  awayScore: Int
  homeNormaltimeScore: Int
  homeOvertimeScore: Int
  homeScore: Int
  possession: String
  location: String
  play: String
  sections: [GameSection]
}

A game also has a list of game events that you can access through the events field, which is of type ModelGameEventConnection.

type ModelGameEventConnection {
  items: [GameEvent]
  nextToken: String
}

The GameEvent type manages real-time live updates for every event occurring in a game, such as changes in the score, substitutions, play-by-play coverage, big plays, possession changes, and more.

type GameEvent @aws_iam @aws_api_key {
  id: ID!
  game: Game
  gameId: ID!
  type: String
  clock: String
  section: GameSection
  competitor: Competitor
  homeScore: Int
  awayScore: Int
  scorer: Player
  assist: Player
  playerIn: Player
  playerOut: Player
  commentary: String
  players: [Player]
  createdAt: AWSDateTime!
  updatedAt: AWSDateTime!
}

When a new event is detected in the ingested stream, the Lambda function that consumes the stream usually performs either an updateGame mutation or a createGameEvent mutation.

updateGame(input: UpdateGameInput!, condition: ModelGameConditionInput): Game @aws_iam

createGameEvent(input: CreateGameEventInput!, condition: ModelGameEventConditionInput): GameEvent @aws_iam

Both operations are protected with @aws_iam, and only the Lambda function has permission to perform them.

Leveraging GraphQL subscriptions, frontend clients can subscribe to specific mutations that trigger data change events defined in the GraphQL schema. For real-time event notifications, clients just need to subscribe to:
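
For illustration, the sketch below builds the JSON body of a GraphQL request for the createGameEvent mutation. The fields inside the input object are an assumption (the CreateGameEventInput definition is not shown in this article), and request signing with IAM credentials, which the API requires for this mutation, is deliberately left out.

```python
import json

# Selection set limited to scalar fields shown on the GameEvent type above.
CREATE_GAME_EVENT = """
mutation CreateGameEvent($input: CreateGameEventInput!) {
  createGameEvent(input: $input) {
    id
    gameId
    type
    clock
    homeScore
    awayScore
    createdAt
  }
}
"""


def build_graphql_request(query, variables):
    """Serialize a GraphQL operation into the JSON body of an HTTP POST request."""
    return json.dumps({"query": query, "variables": variables})


# Hypothetical input values; the real input type may require other fields.
body = build_graphql_request(
    CREATE_GAME_EVENT,
    {"input": {"gameId": "game-1", "type": "goal", "clock": "54:10",
               "homeScore": 1, "awayScore": 0}},
)
print(body)
```

One design point worth keeping in mind: AppSync delivers to subscribers the fields returned by the mutation's selection set, so the mutation should select every field that subscribed clients expect to receive.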

onUpdateGame: Game @aws_subscribe(mutations: ["updateGame"]) @aws_iam

onCreateGameEvent(gameId:ID): GameEvent @aws_subscribe(mutations: ["createGameEvent"])

The onCreateGameEvent subscription provides a filter based on the gameId, so clients are only notified about events related to a specific game. This is useful for use cases such as real-time play-by-play, as implemented in the demo web portal provided by the simulation stack.
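
A client-side subscription that uses this filter might look like the following sketch, which builds the subscription document and its variables for a single game. The selected fields and the helper name `build_subscription_payload` are assumptions; in practice a client library such as Amplify would send the operation and manage the WebSocket connection.

```python
import json

ON_CREATE_GAME_EVENT = """
subscription OnCreateGameEvent($gameId: ID) {
  onCreateGameEvent(gameId: $gameId) {
    id
    gameId
    type
    clock
    homeScore
    awayScore
    commentary
  }
}
"""


def build_subscription_payload(game_id):
    """Return the operation and variables for following one game's events."""
    return {"query": ON_CREATE_GAME_EVENT, "variables": {"gameId": game_id}}


payload = build_subscription_payload("game-1")
print(json.dumps(payload["variables"]))
```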

You can find the full schema in the solution’s public GitHub repository. It’s possible to customize the GraphQL schema file and redeploy the API stack to address different use cases such as very specific sports types or additional context for real-time updates (for example, real-time live coverage of events not related to sports).

Conclusion

In this post, we showcased how to leverage AWS AppSync real-time capabilities to address an important use case in the Media and Entertainment industry. Delivering real-time live updates is critical for companies that want to increase viewer engagement with sports by providing a great user experience. With the Real-Time Live Sports Updates solution, you can deploy a scalable serverless architecture in your own AWS account, based on a reference architecture designed according to best practices. Best of all, you don’t need to manage servers or infrastructure to implement a scalable and reliable real-time backend. You can customize the solution and the code, and you can start building now by accessing the solution page.

Stefano Sandrini is a Senior Specialist Solutions Architect at AWS. He likes to help AWS customers and partners create their solutions. When not working, he can be found coding prototypes for next-gen apps, talking about sports with anyone, or playing guitar.