AWS Storage Blog

How FORMULA 1 uses AWS DataSync and AWS Storage Gateway for backup and archiving

Hi there. My name is Martynas Juras and I am a Cloud Engineer for FORMULA 1. FORMULA 1 is one of the world’s most recognizable sports series, reaching every corner of the globe. We showcase FORMULA 1 racing around the world via our own in-house broadcast. As we produce our own broadcast, we have to securely store a large amount of data, which is forever increasing. The data we store covers raw video footage, car telemetry, race timing data, SQL databases, corporate user emails, and corporate user and team files. With vast amounts of data of different types, we must back up our data and be prepared for any failures, either out at the race track or back at the office.

In this blog post, we focus on how we used AWS Storage Gateway to replace our physical tape backup infrastructure. I also cover how we used AWS DataSync to implement a disaster recovery (DR) solution for our raw video archive. We highlight the challenges we faced with our previous setup, and focus on the automation processes we implemented to make the end-to-end workflow more efficient.

FORMULA 1’s Tape Gateway backup solution

Before AWS, we stored our long-term data on a traditional on-premises tape backup solution. It consisted of an HP tape storage library, HP LTO-6 tapes, and a physical backup server running Microsoft Data Protection Manager (DPM) on Windows Server 2016. Completed tapes were shipped off premises to a third-party storage facility.

We faced the following challenges with our tape library before using AWS:

  • Recovery times: When our IT Service Desk had to perform an audit recoverability test or recover a file that was no longer stored on disk, they would need to request the physical tape. The request process required an email to a third party asking for the tapes, after which our onsite driver would pick them up from the storage location. The complete process of requesting a tape, retrieving it, performing the recovery, and then sending the tape back off site would take 5–7 days.
  • Tape storage: We needed to have secure office space to store the completed weekly and monthly tapes. We also needed to ensure we had extra space for any newly created tapes. For FORMULA 1, this meant that the tapes needed to be stored in an actual safe or vault, and then were moved to a secure Pelican case for transport.
  • Tape management: We are required to maintain a minimum number of tapes for our library. We consistently had to purchase new tapes as the amount of data that we back up is always increasing. A member of the IT Service Desk would also regularly need to maintain the number of free tapes in the library for that week’s backup job. Moreover, a member of the IT Service Desk would have to load the recovered tape into the library for a restore.

Our AWS solution for our tape library

Choosing AWS allowed us to implement a solution to tackle the preceding challenges. AWS already provides a proven, ready-made service in the form of AWS Storage Gateway’s Tape Gateway, which supports Microsoft DPM. As we back up more than 50 TB of data each week, we leverage our AWS Direct Connect setup to transfer that volume of data to AWS.

Using Tape Gateway, we were able to set up new virtual tape libraries (VTLs) on our Microsoft Hyper-V platform using iSCSI for the connection from DPM to the VTLs. Then, it was a simple update to the protection groups in DPM to point to the required VTL instead of to the physical tape library.

How Formula 1 used AWS Storage Gateway to set up virtual tape libraries on S3 rather than having them on-prem

Once we moved the current protection groups to use the VTLs, we created a process that automates tape management. As mentioned earlier, with the physical tape infrastructure members of the IT Service Desk team had to perform a number of manual tasks. For example, they had to take the offsite-ready tapes out of the library and replace them with new tapes. Tape Gateway provides APIs to create new virtual tapes on demand, and we don’t pay any fees until we consume storage space on those tapes.

We implemented an AWS Lambda function that creates new tapes for each library that exists in the account. We created a PowerShell script for the DPM side, which removes every tape marked as ‘Offsite Ready’ from each library. After removing these tapes, the script adds the new tapes created by the Lambda function.

Lambda triggered by Amazon CloudWatch cron job:

import datetime
import os

import boto3

# Tape size (in bytes) and tape count are supplied as environment variables
Tape_Size = os.environ['Tape_Size']
Tape_Num = os.environ['Tape_Num']
sgw = boto3.client('storagegateway')

def lambda_handler(event, context):
    gateways = [gw['GatewayARN'] for gw in sgw.list_gateways()['Gateways']]
    for gateway in gateways:
        # Use the current timestamp as a fresh idempotency token per request
        client_token = str(datetime.datetime.now())
        # Create the new virtual tapes directly in the Deep Archive pool
        new_tapes = sgw.create_tapes(
            GatewayARN=gateway,
            TapeSizeInBytes=int(Tape_Size),
            ClientToken=client_token,
            NumTapesToCreate=int(Tape_Num),
            TapeBarcodePrefix='FOM',
            PoolId='DEEP_ARCHIVE',
        )
        for arn in new_tapes['TapeARNs']:
            tape_id = arn.split('/')[1]
            print('Tape:', tape_id, 'was created in SGW:', gateway)
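The cron trigger itself can also be created programmatically via the CloudWatch Events (EventBridge) API rather than in the console. The following is a minimal sketch of that wiring; the rule name, schedule expression, and function ARN are hypothetical placeholders, not our actual values.

```python
def schedule_rule_params(rule_name, cron_expression):
    """Build the parameters for the scheduled rule (pure helper)."""
    return {"Name": rule_name, "ScheduleExpression": cron_expression}

def schedule_lambda(events, lam, rule_name, cron_expression, function_arn):
    # Create (or update) the scheduled rule that fires the tape-creation Lambda
    rule = events.put_rule(**schedule_rule_params(rule_name, cron_expression))
    # Grant EventBridge permission to invoke the function...
    lam.add_permission(
        FunctionName=function_arn,
        StatementId=rule_name + "-invoke",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule["RuleArn"],
    )
    # ...and attach the function as the rule's target
    events.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "create-tapes", "Arn": function_arn}],
    )

if __name__ == "__main__":
    import boto3
    # Hypothetical schedule and function ARN -- substitute your own values
    schedule_lambda(
        boto3.client("events"),
        boto3.client("lambda"),
        "weekly-vtl-tape-refresh",
        "cron(0 6 ? * MON *)",  # 06:00 UTC every Monday
        "arn:aws:lambda:eu-west-1:123456789012:function:create-virtual-tapes",
    )
```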

PowerShell script triggered by a Windows task scheduler job:

Import-Module DataProtectionManager

function Get-TimeStamp {
    return "[{0:MM/dd/yy} {0:HH:mm:ss}]" -f (Get-Date)
}

# Delete log file when older than 30 day(s)
$Path = "C:\temp\refreshtapeslog.txt"
$Daysback = -30
$CurrentDate = Get-Date
$DatetoDelete = $CurrentDate.AddDays($Daysback)
Get-ChildItem $Path | Where-Object { $_.CreationTime -lt $DatetoDelete } | Remove-Item -ErrorAction SilentlyContinue -Recurse -Force

$DPMServer = $env:COMPUTERNAME
Try
{
    # Find the virtual tape libraries to refresh
    $DpmLibrary = Get-DPMLibrary -DPMServerName $DPMServer | Where-Object Name -Like "Library: BH-AWS-VTL*" | Where-Object Name -Like "*(Disabled)"
    Write-Output "$(Get-TimeStamp) Connected to $DPMServer" | Out-File C:\temp\refreshtapeslog.txt -Append
    Write-Output "$(Get-TimeStamp) Libraries to refresh: " | Out-File C:\temp\refreshtapeslog.txt -Append
    Write-Output $DpmLibrary.UserFriendlyName | Out-File C:\temp\refreshtapeslog.txt -Append
}
Catch
{
    Write-Output "$(Get-TimeStamp) Error getting libraries: $_" | Out-File C:\temp\refreshtapeslog.txt -Append
}

ForEach ($library in $DpmLibrary) {
    # Remove every tape in this library that is marked 'Offsite Ready'
    $dpmtape = Get-DPMTape -DPMLibrary $library | Where-Object -Property IsOffsiteReady -eq $true
    Try
    {
        if ($dpmtape.Count -gt 0) {
            Write-Output "$(Get-TimeStamp) Tapes to remove from library:" $library.UserFriendlyName | Out-File C:\temp\refreshtapeslog.txt -Append
            Write-Output $dpmtape.MediaLabel | Out-File C:\temp\refreshtapeslog.txt -Append
            Unlock-DPMLibraryDoor -DPMLibrary $library -Confirm:$false
            Remove-DPMTape -DPMLibrary $library -Tape $dpmtape -Confirm:$false
        }
        else {
            Write-Output "$(Get-TimeStamp) No tapes to remove" | Out-File C:\temp\refreshtapeslog.txt -Append
        }
    }
    Catch
    {
        Write-Output "$(Get-TimeStamp) Error getting tapes: $_" | Out-File C:\temp\refreshtapeslog.txt -Append
    }
    Try
    {
        # Add the tapes created by the Lambda function and mark them as free
        Add-DPMTape -DPMLibrary $library
        $Slots = Get-DPMTape -DPMLibrary $library | Where-Object MediaLabel -eq "Unknown" | Select-Object Location
        if ($Slots.Count -ge 1) {
            foreach ($Slot in $Slots) {
                ForceFree-Tape.ps1 -DPMServerName $DPMServer -LibraryName $library.UserFriendlyName -TapeLocationList $Slot.Location | Out-File C:\temp\refreshtapeslog.txt -Append
            }
        }
        else {
            Write-Output "$(Get-TimeStamp) No new tapes to add" | Out-File C:\temp\refreshtapeslog.txt -Append
        }
    }
    Catch
    {
        Write-Output "$(Get-TimeStamp) Error adding/removing tapes: $_" | Out-File C:\temp\refreshtapeslog.txt -Append
    }
}

One of the useful features of Tape Gateway is that when your backup application releases a tape, the tape is automatically moved to Amazon S3 Glacier or Amazon S3 Glacier Deep Archive. For us, it made sense to go straight to S3 Glacier Deep Archive due to the significant cost savings that storage class offers.
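Restoring works the other way round: an archived virtual tape must first be retrieved from the archive back to a gateway before the backup application can read it. This is a rough sketch of driving that retrieval with boto3; the gateway ARN and barcode are hypothetical, and for brevity it reads only the first page of `describe_tape_archives` results.

```python
def find_archived_tape(tape_archives, barcode):
    """Return the archived tape matching a barcode, or None (pure helper)."""
    for tape in tape_archives:
        if tape.get("TapeBarcode") == barcode and tape.get("TapeStatus") == "ARCHIVED":
            return tape
    return None

def retrieve_tape(sgw, gateway_arn, barcode):
    # First page of the archive catalogue only; a full implementation
    # would follow the 'Marker' pagination token.
    archives = sgw.describe_tape_archives()["TapeArchives"]
    tape = find_archived_tape(archives, barcode)
    if tape is None:
        raise ValueError("No archived tape with barcode " + barcode)
    # Retrieval from the archive back to a gateway is not instant --
    # this is where multi-hour restore times come from.
    return sgw.retrieve_tape_archive(TapeARN=tape["TapeARN"], GatewayARN=gateway_arn)

if __name__ == "__main__":
    import boto3
    # Hypothetical gateway ARN and tape barcode -- substitute your own
    retrieve_tape(
        boto3.client("storagegateway"),
        "arn:aws:storagegateway:eu-west-1:123456789012:gateway/sgw-12345678",
        "FOM12345",
    )
```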

Tape Gateway uses iSCSI to connect to the backup application, compared with the Fibre Channel connectivity our physical tape library used. This meant we had to make a few one-time changes to our DPM setup to achieve performance similar to that of our physical tape library. First, we changed our DPM protection groups to use multiple VTLs. Since VTLs upload data to the cloud asynchronously, spreading jobs across multiple VTLs ensured we were not filling up the upload buffer on a single VTL and subsequently slowing the backup. We also scheduled jobs across the week instead of in a narrow window, limited the number of drives a protection group could use to two, and reorganized our protection groups so that we no longer had multiple large volumes to back up. These changes helped us optimize Tape Gateway performance and meet our backup requirements.

Key Stats:

  • Average data transferred per week to AWS using Tape Gateway: 56 TB
  • Average transfer speeds to/from VTL to AWS per gateway: 125 MB/s
  • Average application speeds to VTL from DPM per job: 225 MB/s
  • Average restoration times: dependent on tape size; retrieval takes 3–5 hours

FORMULA 1’s AWS DataSync DR solution

We used Tape Gateway as a direct replacement for our physical tape archival solution, with the aim of keeping race and user data backed up to tape. We also wanted a solution for certain data types that we could recover quickly while still optimizing costs. For our video archive, we decided to use AWS DataSync to sync the raw video footage from our nearline NFS storage to Amazon S3. AWS DataSync is an online data transfer service that simplifies, automates, and accelerates moving data between on-premises storage systems and AWS Storage services, as well as between AWS Storage services. Before AWS, our setup was to back up to tapes and send the tapes offsite, as we did for the rest of FORMULA 1’s tape data.

We faced the following challenges for storing video archives before using AWS:

  • Offsite storage: Must be capable of keeping the raw footage backed up offsite.
  • Reliability: Must be able to have confidence in the recovery of the data.
  • Speed: Must be able to get the data back quickly and give users access to the footage.

Our AWS solution for storing video archives

We chose AWS DataSync as it allows us to sync our current data, totaling 400 TB, to Amazon S3 in a way we can control, and it enables us to keep any new data in sync. We also set up File Gateway, which points to the S3 bucket containing the footage and presents the data back to end users via a folder location that can be mapped to any device in our offices. By using DataSync in combination with File Gateway, we are able to provide end users with direct access to the raw footage in the event of any failures on premises.

F1 Chose AWS DataSync as it allows them to sync their current data, totaling 400 TB, to Amazon S3

The settings and controls provided by DataSync allow us to control when and how much data we want to sync without harming any other workloads. As users access our on-premises NFS storage during the day for other workflows, we had to tweak the bandwidth throttling setting a number of times to provide the correct balance. In the end, we were able to move 400 TB of data over 3 months.
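Because the bandwidth limit is just a task option, re-tweaking it can be scripted rather than done by hand in the console. The following is a minimal sketch under stated assumptions, not our production tooling: the task ARN is a placeholder, and DataSync’s `BytesPerSecond` option takes a value in bytes (or -1 for unlimited).

```python
def mib_per_s_to_bytes(mib_per_s):
    """Convert a MiB/s cap to DataSync's BytesPerSecond option (-1 = unlimited)."""
    return -1 if mib_per_s is None else int(mib_per_s * 1024 * 1024)

def throttle_task(datasync, task_arn, mib_per_s):
    # Apply the cap to the task; it takes effect on the next execution
    datasync.update_task(
        TaskArn=task_arn,
        Options={"BytesPerSecond": mib_per_s_to_bytes(mib_per_s)},
    )

if __name__ == "__main__":
    import boto3
    # Hypothetical task ARN; cap at ~50 MiB/s while office users share the link
    throttle_task(
        boto3.client("datasync"),
        "arn:aws:datasync:eu-west-1:123456789012:task/task-0123456789abcdef0",
        50,
    )
```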

Key Stats:

  • Data transferred between our offices and AWS: 400 TB over 3 months, averaging 4–4.5 TB per day
  • Security: Data is encrypted at rest in the S3 bucket and access to DataSync by users is limited to a few admins by IAM permissions
  • Access: Can provide on-premises users access to storage in the cloud almost immediately in an event of on-premises failure

Conclusion

In this blog post, we discussed how we at FORMULA 1 used both AWS DataSync and AWS Storage Gateway services to provide DR for various data streams. With AWS DataSync we were able to provide DR by syncing data from our on-premises file server to Amazon S3 at an average of 4 TB per day, and did so with encryption during transfer and at rest. We also talked about how we leveraged File Gateway to provide access to this data for our users when they are not on the race network and when the data is not available due to a failure.

With AWS Storage Gateway, we were able to move tape backups to the cloud, allowing us to improve the management of tapes and backup jobs. Most importantly, we were able to reduce the amount of physical kit needing support, in addition to limiting the need for third-party maintenance. Using AWS Storage Gateway, we were able to automate the creation of virtual tapes, along with automating the adding and removing of tapes in our backup application’s libraries. Our implemented solution frees up a substantial amount of our service desk team’s time to focus on important user tickets. Along with automating tape management, we reduced recovery times from 5–7 days to just 1 day.

I hope this blog post has been insightful for you. If you have any comments or questions, please don’t hesitate to leave them in the comments section. Finally, take a look at the following assets to see additional ways FORMULA 1 uses AWS to accelerate cloud transformation and power FORMULA 1 Insights:

The content and opinions in this post are those of the third-party author and AWS is not responsible for the content or accuracy of this post.