AWS Architecture Blog
Field Notes: How Sportradar Accelerated Data Recovery Using AWS Services
This post was co-written by Mithil Prasad, AWS Senior Customer Solutions Manager; Patrick Gryczka, AWS Solutions Architect; Ben Burdsall, CTO at Sportradar; and Justin Shreve, Director of Engineering at Sportradar.
Ransomware is a type of malware that encrypts data, locking its victims out of their own data and demanding payment to decrypt it. The frequency of ransomware attacks has increased over the past year, with local governments, hospitals, and private companies among those affected.
For Sportradar, providing their customers with access to high-quality sports data and insights is central to their business. Ensuring that their systems are designed securely, in a way that minimizes the possibility of a ransomware attack, is a top priority. While ransomware attacks can occur both on premises and in the cloud, AWS services offer increased visibility along with native encryption and backup capabilities. These help reduce both the likelihood and the impact of a ransomware attack.
Maintaining backups and the ability to recover to a known good state is a best practice. To further expand their defenses and diminish the value of a ransom, the Sportradar architecture team set out to leverage their AWS Step Functions expertise to minimize recovery time. The team’s strategy centered on achieving a short deployment process. This process commoditized their production environment: they can spin up interchangeable environments in new, isolated AWS accounts and pull in data from external, isolated sources, which diminishes the value of a production environment as a ransom target. This also minimized the impact of a potential data destruction event.
By partnering with AWS, Sportradar was able to build a secure and resilient infrastructure to provide timely recovery of their service in the event of data destruction by an unauthorized third party. Sportradar automated the deployment of their application to a new AWS account and established a new isolation boundary from an account with compromised resources. In this blog post, we show how the Sportradar architecture team used a combination of AWS CodePipeline and AWS Step Functions to automate and reduce their deployment time to less than two hours.
Solution Overview
Sportradar’s solution uses AWS Step Functions to orchestrate the deployment of resources, the recovery of data, and the deployment of application code, and to navigate all necessary dependencies for order of deployment. While deployment can be orchestrated through CodePipeline, Sportradar used their familiarity with Step Functions to create a quick and repeatable deployment process for their environment.
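To make the idea of navigating dependencies for order of deployment concrete, here is a minimal sketch that groups deployment stages into waves using Python's standard `graphlib` module. The stage names loosely follow Step 4 of this post, but the exact dependency edges and the `deployment_waves` helper are assumptions for illustration, not Sportradar's actual configuration.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each key lists the stages it must wait for.
# Stage names follow Step 4 of this post; the exact edges are assumptions.
DEPENDENCIES = {
    "rds_restore": set(),
    "queues_and_buckets": {"rds_restore"},
    "ingestion_lambdas": {"queues_and_buckets"},
    "processing_step_functions": {"queues_and_buckets"},
    "api_layer": {"rds_restore", "ingestion_lambdas", "processing_step_functions"},
    "cloudwatch_alarms": {"api_layer"},
}


def deployment_waves(dependencies):
    """Group stages into waves; every stage in a wave can deploy in parallel."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(deployment_waves(DEPENDENCIES), start=1):
        print(f"wave {number}: {', '.join(wave)}")
```

An orchestrator such as a Step Functions state machine plays the same role at deployment time: it starts a stage only once everything that stage depends on has finished.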
Sportradar’s solution to a ransomware disaster recovery scenario has also provided them with a reliable and accelerated process for deploying development and testing environments. Developers are now able to scale testing and development environments up and down as needed. This has allowed their Development and QA teams to follow the pace of feature development, rather than being held to weekly or bi-weekly feature release and testing schedules tied to a single testing environment.
Prerequisites
The prerequisites for implementing this deployment strategy are:
- An implemented database backup policy
  - Ideally, data should be backed up to a data bunker AWS account outside the scope of the environment you are looking to protect. In the event of a ransomware attack, your backed-up data is then isolated from the affected environment and account.
- Application code within a Git repository
- Separation of duties
  - Access to and responsibility for the backups and the code repository should be split between different stakeholders to reduce the likelihood of both being impacted by a single security breach.
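As an illustration of the first prerequisite, the sketch below shows one way a manual Amazon RDS snapshot could be shared with, and then copied into, a separate data bunker account. The post does not describe Sportradar's backup mechanism, so this is an assumption. The function names and identifiers are hypothetical; the clients are expected to be boto3 RDS clients (`boto3.client("rds")`) created with credentials for the source and bunker accounts respectively. Sharing an encrypted snapshot additionally requires a customer managed KMS key that the bunker account is allowed to use.

```python
def share_snapshot_with_bunker(source_rds, snapshot_id, bunker_account_id):
    """Run with source-account credentials: let the bunker account copy the snapshot."""
    source_rds.modify_db_snapshot_attribute(
        DBSnapshotIdentifier=snapshot_id,
        AttributeName="restore",
        ValuesToAdd=[bunker_account_id],
    )


def copy_snapshot_into_bunker(bunker_rds, source_snapshot_arn, target_snapshot_id,
                              kms_key_id=None):
    """Run with bunker-account credentials: take an independent copy of the snapshot."""
    params = {
        "SourceDBSnapshotIdentifier": source_snapshot_arn,
        "TargetDBSnapshotIdentifier": target_snapshot_id,
    }
    if kms_key_id:
        # Re-encrypt the copy with a key owned by the bunker account.
        params["KmsKeyId"] = kms_key_id
    response = bunker_rds.copy_db_snapshot(**params)
    return response["DBSnapshot"]["DBSnapshotIdentifier"]
```

Copying, rather than only sharing, matters here: a shared snapshot still lives in the source account, whereas the copy remains available even if the source account is compromised.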
Step 1: New Account Setup
Once data destruction is identified, the first step in Sportradar’s process is to use a pre-created runbook to create a new AWS account. A new account is created in case the malicious actors who have encrypted the application’s data have access to not just the application, but also to the AWS account the application resides in.
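The post does not say how the runbook creates the account. One possible approach, assuming the recovery account is created as a member account through AWS Organizations, is sketched below. The function name, polling defaults, and example values are hypothetical; `org_client` is expected to be a boto3 Organizations client (`boto3.client("organizations")`) with credentials for the management account.

```python
import time


def create_recovery_account(org_client, email, account_name,
                            poll_seconds=10, max_polls=60):
    """Request a new member account and wait until creation succeeds or fails."""
    status = org_client.create_account(
        Email=email, AccountName=account_name
    )["CreateAccountStatus"]

    polls = 0
    while status["State"] == "IN_PROGRESS":
        if polls >= max_polls:
            raise TimeoutError("Account creation did not finish in time")
        time.sleep(poll_seconds)
        status = org_client.describe_create_account_status(
            CreateAccountRequestId=status["Id"]
        )["CreateAccountStatus"]
        polls += 1

    if status["State"] == "FAILED":
        raise RuntimeError(f"Account creation failed: {status.get('FailureReason')}")
    return status["AccountId"]
```

Account creation is asynchronous, which is why the sketch polls the request status instead of assuming the account exists as soon as the first call returns.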
The runbook sets up a VPC in a selected Region and spins up the following resources:
- Security Groups with network connectivity to their git repository (in this case GitLab)
- IAM Roles for their resources
- KMS Keys
- Amazon S3 buckets with CloudFormation deployment templates
- CodeBuild, CodeDeploy, and CodePipeline
Step 2: Deploying Secrets
It is a security best practice to ensure that no secrets are hard-coded into your application code. So, after account setup is complete, the new AWS account’s access keys and the selected AWS Region are passed into CodePipeline variables. The application secrets are then deployed to AWS Systems Manager Parameter Store.
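A minimal sketch of deploying secrets to Parameter Store follows. The `deploy_secrets` function, the `/app/prod` prefix, and the secret names are hypothetical; `ssm_client` is expected to be a boto3 Systems Manager client (`boto3.client("ssm", region_name=...)`) for the new account.

```python
def deploy_secrets(ssm_client, secrets, prefix="/app/prod", kms_key_id=None):
    """Write each secret as an encrypted SecureString parameter; return the names."""
    names = []
    for key, value in secrets.items():
        params = {
            "Name": f"{prefix}/{key}",
            "Value": value,
            "Type": "SecureString",  # encrypted at rest with AWS KMS
            "Overwrite": True,       # makes repeated deployments idempotent
        }
        if kms_key_id:
            # Otherwise the account's default key for Parameter Store is used.
            params["KeyId"] = kms_key_id
        ssm_client.put_parameter(**params)
        names.append(params["Name"])
    return names
```

At runtime, the application would read a value back with `get_parameter(Name=..., WithDecryption=True)`, so the secret itself never needs to appear in the application code or repository.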
Step 3: Deploying Orchestrator Step Function and In-Memory Databases
To optimize deployment time, Sportradar decided to leave the deployment of their in-memory databases running on Amazon EC2 outside of their orchestrator Step Function. They deployed the databases using a CloudFormation template from their CodePipeline. This ran in parallel with the deployment of the Step Function, which orchestrates the rest of their deployment.
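In CodePipeline, actions within a stage that share the same `runOrder` value run in parallel. The sketch below builds a stage declaration, in the shape used by the CodePipeline pipeline structure, with two CloudFormation deploy actions at `runOrder` 1. The stack names, artifact name, template file names, and role ARN are hypothetical placeholders, not Sportradar's actual values.

```python
# Hypothetical placeholder; replace with a real CloudFormation service role.
CFN_ROLE_ARN = "arn:aws:iam::111122223333:role/hypothetical-cfn-deploy-role"


def cloudformation_action(name, stack_name, template_file, run_order):
    """Build one CloudFormation deploy action for a CodePipeline stage."""
    return {
        "name": name,
        "actionTypeId": {
            "category": "Deploy",
            "owner": "AWS",
            "provider": "CloudFormation",
            "version": "1",
        },
        "runOrder": run_order,
        "inputArtifacts": [{"name": "Templates"}],
        "configuration": {
            "ActionMode": "CREATE_UPDATE",
            "StackName": stack_name,
            # Format is "<input artifact name>::<template file in the artifact>".
            "TemplatePath": f"Templates::{template_file}",
            "RoleArn": CFN_ROLE_ARN,
            "Capabilities": "CAPABILITY_NAMED_IAM",
        },
    }


# Both actions use runOrder 1, so CodePipeline starts them at the same time.
DEPLOY_STAGE = {
    "name": "Deploy",
    "actions": [
        cloudformation_action(
            "InMemoryDatabases", "in-memory-databases", "in-memory-db.yaml", 1),
        cloudformation_action(
            "OrchestratorStepFunction", "orchestrator", "orchestrator.yaml", 1),
    ],
}
```

Giving the second action a higher `runOrder` would instead make it wait for the first, which is the trade-off this step avoids.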
Step 4: Step Function Orchestrates the Deployment of Microservices and Alarms
The orchestrator Step Function deploys Sportradar’s microservices solutions, starting with 10+ Amazon RDS instances, each restored from a DB snapshot. Following that, 80+ producer Amazon SQS queues and the S3 buckets for data staging are deployed. After the SQS queues deploy successfully, the Lambda functions for data ingestion and 15+ data processing Step Functions are deployed to begin pulling data from various sources into the solution.
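The post does not show how the RDS restores are issued (they may well be declared in CloudFormation templates). As a rough sketch of the underlying operation, assuming direct API calls, the function below starts a restore for every instance and only then waits, so the restores overlap in time. The function name, instance class, and identifiers are hypothetical; `rds_client` is expected to be a boto3 RDS client.

```python
def restore_databases(rds_client, snapshot_by_instance, instance_class="db.t3.medium"):
    """Restore each DB instance from its snapshot, then wait for all of them."""
    # Kick off every restore first so they proceed concurrently.
    for instance_id, snapshot_id in snapshot_by_instance.items():
        rds_client.restore_db_instance_from_db_snapshot(
            DBInstanceIdentifier=instance_id,
            DBSnapshotIdentifier=snapshot_id,
            DBInstanceClass=instance_class,
        )

    # Block until each restored instance reports the "available" status.
    waiter = rds_client.get_waiter("db_instance_available")
    for instance_id in snapshot_by_instance:
        waiter.wait(DBInstanceIdentifier=instance_id)
    return list(snapshot_by_instance)
```

Later stages, such as the API layer, should start only after this function returns, because they depend on the restored databases being reachable.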
Then the API Gateways and Lambda functions that provide the API layer for each of the microservices are deployed in front of the restored RDS instances. Finally, 300+ Amazon CloudWatch alarms are created to monitor the environment and trigger necessary alerts. In total, Sportradar’s deployment process brings online: 15+ Step Functions for data processing, 30+ microservices, 10+ Amazon RDS instances with over 150 GB of data, 80+ SQS queues, 180+ Lambda functions, a CDN for the UI, Amazon ElastiCache, and 300+ CloudWatch alarms to monitor the applications. In all, that is over 600 resources deployed, with data restored consistently, in less than 2 hours total.
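The ordering described in this step can be expressed as an Amazon States Language definition. The sketch below generates a simple sequential state machine in which each state invokes a Lambda function that deploys one stage. The state names follow this step's description, but the Lambda function names, retry settings, and the choice to drive each stage through Lambda are assumptions; Sportradar's actual state machine is not shown in the post.

```python
import json

# Stage order follows Step 4; the Lambda function names are hypothetical.
STAGES = [
    ("RestoreDatabases", "hypothetical-deploy-rds-from-snapshots"),
    ("DeployQueuesAndBuckets", "hypothetical-deploy-sqs-and-s3"),
    ("DeployIngestionAndProcessing", "hypothetical-deploy-ingestion"),
    ("DeployApiLayer", "hypothetical-deploy-api-layer"),
    ("CreateAlarms", "hypothetical-deploy-cloudwatch-alarms"),
]


def build_orchestrator_definition(stages):
    """Chain one Task state per stage so each waits for the previous one."""
    states = {}
    for index, (state_name, function_name) in enumerate(stages):
        state = {
            "Type": "Task",
            "Resource": "arn:aws:states:::lambda:invoke",
            "Parameters": {"FunctionName": function_name, "Payload.$": "$"},
            "ResultPath": None,  # serialized as null: pass the input through unchanged
            "Retry": [{
                "ErrorEquals": ["States.TaskFailed"],
                "IntervalSeconds": 30,
                "MaxAttempts": 2,
                "BackoffRate": 2.0,
            }],
        }
        if index + 1 < len(stages):
            state["Next"] = stages[index + 1][0]
        else:
            state["End"] = True
        states[state_name] = state
    return {
        "Comment": "Sketch of a sequential deployment orchestrator",
        "StartAt": stages[0][0],
        "States": states,
    }


if __name__ == "__main__":
    print(json.dumps(build_orchestrator_definition(STAGES), indent=2))
```

The resulting JSON string could be passed as the `definition` argument of the Step Functions `create_state_machine` API. Stages that are independent of each other could be wrapped in a `Parallel` state to shorten the overall deployment time further.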
Conclusion
In this blog post, we showed how Sportradar’s team used Step Functions to accelerate their deployments, and walked through an example disaster recovery scenario. Step Functions can be used to orchestrate the deployment and configuration of a new environment, allowing complex environments to be deployed in stages, with each stage waiting on its dependencies as needed.
For examples of Step Functions being used in different orchestration scenarios, check out how Step Functions acts as an orchestrator for ETLs in “Orchestrate multiple ETL jobs using AWS Step Functions and AWS Lambda” and “Orchestrate Apache Spark applications using AWS Step Functions and Apache Livy”. For migrations of Amazon EC2 based workloads, read more about CloudEndure in “Migrating workloads across AWS Regions with CloudEndure Migration”.
Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.