FORMULA 1 transfers race car data into AWS using AWS DataSync
As the Cloud Team Lead at FORMULA 1, I am responsible for defining, designing, and implementing workloads across the cloud estate, including security, platform, project, networking, and governance. One day, the question arose internally around how we could get our data off the FORMULA 1 race cars and back to our remote data analysts’ systems more quickly and efficiently. Doing so would enable us to minimize downtime for analyst insights on performance, in turn helping us improve things like track safety and broadcast statistics.
Previously, we moved this data overnight in large transfers covering the previous day’s sessions, which was often problematic if a transfer failed. When a car arrives in its team garage after an F1 session, the race team immediately connects it to an umbilical, a cable loom that transfers car data over a low-latency local and wide area network to our Remote Technical Centre. By attaching the car’s umbilical and sending the data streams directly into FORMULA 1’s AWS data lake, we can get a head start on data analytics and post-processing. Combining AWS DataSync with other AWS services enabled us to do this and achieve near-instant car-to-cloud data transfers.
In this blog post, we cover how FORMULA 1 used AWS to implement a data transfer solution fit for a race car organization, bringing speed and efficiency to every step from the race cars to the post-processing of data.
Before we adopted the cloud, we ran most of our workloads from our Technical Centre, which physically traveled to every event around the globe and ran most of our media broadcast. The Technical Centre was a combination of what are now our track-based Event Technical Centre and our UK-based Remote Technical Centre. The data lived on these track-local servers, and because the files were so large, we transferred them overnight so that our analysts had the data when they came into work the following day. Fast-forward to when we chose AWS as our cloud platform: we started to look into ways to break down these processes for greater efficiency and speed. We had already used AWS DataSync for previous data migrations of upwards of 400 TB, where it acted as a conduit between our Media & Technology Centre and AWS. We chose AWS DataSync because of the high performance of its transfers directly into Amazon S3, which no other system could match.
We started designing the solution for the data transfers, which consisted of continuous, dedicated connectivity between our Event Technical Centre, Remote Technical Centre, Media & Technology Centre, and finally AWS. This dedicated connectivity, which we established using AWS Direct Connect, gave us the underlying bandwidth for the large data transfers. We configured an AWS DataSync agent within our Remote Technical Centre to handle the communication between DataSync and the file source. To ensure that the car data transfer task executed within AWS on a regular basis, we used an AWS Lambda function to start the DataSync task. We combined this with an Amazon CloudWatch Events rule, scheduled with a cron expression, that triggers the function at a one-minute interval. After each track session, the race teams connect each car via the umbilical, and the data is quickly and seamlessly transferred via DataSync straight into the FORMULA 1 data lake. Having this near-instant access allows our analysts to start interpreting the data right away. This is important because our analysts and engineers must be able to see whether there are any issues with the race car’s data systems. Our FORMULA 1 analysts are now able to triage any issues with significantly less lead time. Refer to Figure 1 for a visual diagram of this flow:
Figure 1: FORMULA 1 car-to-cloud flow using AWS DataSync
The utilization of our dedicated connectivity through AWS Direct Connect’s private virtual interfaces, combined with AWS DataSync, enabled the flow to work with high speed and efficiency.
The AWS Lambda function was a simple Python function that called the DataSync API to start the task. We defined the DataSync task’s Amazon Resource Name (ARN) as an environment variable passed into the function, so the function knows which AWS DataSync task to trigger. See the following function:
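```python
import os

import boto3
from botocore.exceptions import ClientError

# Create the client outside the handler so warm invocations can reuse it.
client = boto3.client('datasync')

# ARN of the DataSync task to start, read from the function's environment variables.
TASK_ARN = os.environ['TASK_ARN']


def lambda_handler(event, context):
    try:
        response = client.start_task_execution(TaskArn=TASK_ARN)
    except ClientError as err:
        # Log the failure, then re-raise so the invocation is recorded as failed.
        print(f'Could not start {TASK_ARN}: {err}')
        raise
    return response['TaskExecutionArn']
```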
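The function above still needs something to invoke it every minute. The following is a minimal sketch of how such a CloudWatch Events rule could be created with boto3, not the exact configuration used at FORMULA 1. The rule name, statement ID, and target ID are hypothetical placeholders, and the clients are passed in as arguments so the helper can be exercised without touching a real account.

```python
def schedule_every_minute(events_client, lambda_client, function_arn,
                          rule_name="start-datasync-task-every-minute"):
    """Create a one-minute schedule rule and point it at a Lambda function.

    rule_name and the IDs used below are hypothetical placeholders.
    """
    # cron(* * * * ? *) fires every minute; rate(1 minute) is an equivalent alternative.
    rule = events_client.put_rule(
        Name=rule_name,
        ScheduleExpression="cron(* * * * ? *)",
        State="ENABLED",
    )

    # Grant the rule permission to invoke the function.
    lambda_client.add_permission(
        FunctionName=function_arn,
        StatementId=f"{rule_name}-invoke",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule["RuleArn"],
    )

    # Register the function as the rule's only target.
    events_client.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "datasync-starter", "Arn": function_arn}],
    )
    return rule["RuleArn"]
```

With real clients, this would be called as `schedule_every_minute(boto3.client('events'), boto3.client('lambda'), function_arn)`. A production version would also need to handle the case where the permission statement already exists.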
- High transfer speeds into Amazon S3: we tested speeds of approximately 930 Mb/s from a 1-GbE physical network interface card (NIC). We needed this speed to ensure that we could transfer the data into AWS in time to meet our near-real-time goal.
- We were able to set up multiple tasks on multiple schedules, and we went a step further by adding a Lambda function to trigger task executions more frequently. This allowed us to control which tasks we wanted to execute and also gave us flexibility around our transfers.
- DataSync meets our cybersecurity standards by using Transport Layer Security (TLS) for all data transfers between the agent and AWS, while also supporting SSE-S3 encryption on our S3 bucket. Security is hugely important for us, especially given how much bespoke data we hold, so we were happy to see that we could make use of various encryption mechanisms.
- Flexibility around file requirements for each transfer, for example, which file types to include and whether DataSync retains all data in the destination or also propagates deletions. This was beneficial because multiple file types exist within the servers’ file directories, so we could narrow down the data to exactly what we wanted.
- Pay only for data transferred (per GB) and data stored within Amazon S3. We could easily generate cost predictions due to the simple price model of AWS DataSync.
- Huge amounts of time saved: the data is accessible to our Media & Technology Centre in less than an hour, whereas historically it was available only the following day. As mentioned earlier in this post, we can triage any potential issues and get the data back to the engineers and analysts with a very quick turnaround.
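The file-selection flexibility mentioned in the list above corresponds to the include filters and task options that DataSync accepts when a task execution starts. The following is a minimal sketch of how such a request could be assembled. The helper name and the folder names in the usage note are hypothetical; only `Includes`, `FilterType`, `OverrideOptions`, and `PreserveDeletedFiles` are DataSync API names.

```python
def build_execution_request(task_arn, include_paths, keep_deleted_files=True):
    """Build keyword arguments for the DataSync start_task_execution call.

    include_paths is a list of source paths to transfer (hypothetical values).
    """
    request = {
        "TaskArn": task_arn,
        "OverrideOptions": {
            # PRESERVE keeps destination files that no longer exist at the source;
            # REMOVE deletes them from the destination as well.
            "PreserveDeletedFiles": "PRESERVE" if keep_deleted_files else "REMOVE",
        },
    }
    if include_paths:
        # DataSync expects one pipe-delimited pattern string per filter rule.
        request["Includes"] = [
            {"FilterType": "SIMPLE_PATTERN", "Value": "|".join(include_paths)}
        ]
    return request
```

A caller could then pass the result straight to the client, for example `client.start_task_execution(**build_execution_request(task_arn, ["/telemetry", "/timing"]))`.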
From designing our system to the actual implementation, we learned quite a few lessons. One lesson is that a single AWS DataSync agent can run only one task at a time. This was an important lesson, as we would need multiple agents should the requirement for multiple concurrent data streams arise. AWS DataSync is a powerful service that is great for fast transfers, so we limited the transfer rate to ensure that we did not affect our other applications. AWS DataSync’s built-in scheduling has a maximum task frequency of one execution per hour, so we built a custom design using Amazon CloudWatch and AWS Lambda to run the task at a frequency of one execution per minute. Overall, we learned that moving large amounts of data into Amazon S3 was significantly easier than we originally thought, through the use of AWS DataSync and other AWS services.
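Two of these lessons, one task at a time and a capped transfer rate, can be illustrated with a small guard around the start call. This is a sketch under stated assumptions rather than the production function: the helper names and the idea of expressing the cap in Mb/s are hypothetical, while `describe_task`, its `Status` field, and the `BytesPerSecond` option are part of the DataSync API. The client is passed in so the logic can be tested with a stub.

```python
def mbps_to_bytes_per_second(megabits_per_second):
    """Convert a rate in megabits per second to bytes per second."""
    return int(megabits_per_second * 1_000_000 / 8)


def start_if_idle(client, task_arn, limit_mbps=None):
    """Start the task only when it is not already queued or running.

    Returns the new task execution ARN, or None when the start was skipped.
    """
    status = client.describe_task(TaskArn=task_arn)["Status"]
    if status != "AVAILABLE":
        # The task is busy or not ready; skip and let the next trigger try again.
        return None

    kwargs = {"TaskArn": task_arn}
    if limit_mbps is not None:
        # Cap bandwidth for this execution so other applications are not starved.
        kwargs["OverrideOptions"] = {
            "BytesPerSecond": mbps_to_bytes_per_second(limit_mbps)
        }
    return client.start_task_execution(**kwargs)["TaskExecutionArn"]
```

As a point of reference, `mbps_to_bytes_per_second(930)` gives 116,250,000 bytes per second, or roughly 116 MB/s. Note that the status check covers a single task; if several tasks share one agent, each task's status would need to be checked.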
Before we used AWS DataSync to transfer data in a highly performant way, we experienced issues with a traditional method involving overnight backups, which could prove problematic if an interruption disrupted the transfer overnight. We realized that a hybrid cloud setup using AWS DataSync would give us the speed to carry out these transfers in near-real time. Ultimately, our desired solution involved optimizing our data transfer tasks to take no more than a single hour, and we did so by using Amazon CloudWatch and AWS Lambda to help us distribute our AWS DataSync tasks to meet that time requirement.
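One way to distribute several tasks behind a single one-minute trigger is to give each task its own interval and let the function decide, on every invocation, which tasks are due. The following is a hypothetical sketch of that selection logic only; the schedule layout and the function name are assumptions and are not taken from the FORMULA 1 implementation.

```python
from datetime import datetime


def tasks_due(schedule, now: datetime):
    """Return the task ARNs whose interval divides the current minute of the day.

    schedule maps a task ARN to its interval in minutes (hypothetical layout).
    """
    minute_of_day = now.hour * 60 + now.minute
    return [arn for arn, interval in schedule.items()
            if minute_of_day % interval == 0]
```

A handler invoked every minute could call `tasks_due(schedule, datetime.now())` and start only the tasks that are returned.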
Having data in near-real time is paramount for our engineers and analysts, as operational efficiency is of the utmost importance for our business. The benefits of high-speed transfers into AWS bring this system in line with FORMULA 1’s expectations and business model. Should a scenario with similar requirements arise, AWS DataSync’s high-speed capabilities make it the clear solution.
Thanks for reading! I hope this blog provided some good insights on how you can use AWS DataSync to simplify, automate, and accelerate moving data to and from AWS Storage. Make sure to check out our other blog post on the AWS Storage blog by my colleague Martynas Juras, where he discusses how FORMULA 1 uses AWS DataSync and AWS Storage Gateway for backup and archiving. If you have any comments or questions, please don’t hesitate to leave them in the comments section.
The content and opinions in this post are those of the third-party author and AWS is not responsible for the content or accuracy of this post.