AWS Partner Network (APN) Blog
How Metal Toad Uses Machine Learning to Keep a Top Comic Site Safe for San Diego Comic-Con
By Joaquin Lippincott, CEO – Metal Toad
Superheroes may be up all night foiling supervillains, but who is watching over their websites and protecting them from online evildoers? Metal Toad is proud to take on this responsibility with the help of a powerful “utility belt” delivered by Amazon Web Services (AWS).
Metal Toad has been working with major entertainment brands for decades, including keeping some of the highest-profile media sites live under unique traffic conditions. A major awards site, for example, may have strong traffic year-round, but on the night of a big event even its legitimate traffic can look more like a distributed denial of service (DDoS) incident.
Keeping these sites up and running during harrowing conditions is one of Metal Toad’s superpowers, but we couldn’t do it without the right infrastructure and tools. As an AWS Advanced Tier Services Partner with the Digital Customer Experience Competency, Metal Toad specializes in leveraging these tools to accomplish the seemingly impossible.
One of the superpowers we leverage is machine learning (ML), and we recently used AWS’s powerful ML tools to help the beloved comic book brand behind numerous household-name superheroes.
The Heroes of the Plot
Our client’s website gets heavy traffic throughout the year, but when the international Comic-Con event comes to San Diego each year, the site becomes the epicenter of a massive traffic spike.
This is a great opportunity from a marketing perspective, but it also presents performance risks and makes the site a prime target.
Here, I’ll discuss some of the strategies Metal Toad deployed to protect this vital digital property during an event where failure is not an option. A site like this simply cannot go down, especially during an iconic international event that has the potential to impact revenue and business development year-round.
The Backstory
Metal Toad has had the opportunity to work with this client’s team for more than 10 years, beginning with multi-site Drupal content management system (CMS) installations and heavy customization of that framework to suit their unique business case, traffic levels, and security.
Metal Toad ultimately brought together four separate sites under a single CMS that greatly streamlined site management across the client’s comic empire. We also provided a customized admin experience with an eye on future flexibility and scalability.
Since this initial engagement, the client’s team has relied on Metal Toad to develop and continually improve upon their entire web ecosystem. When it came time to choose a partner for site management on AWS, Metal Toad was the logical choice.
Enter the Villains: Malicious Traffic and Bots
Every website today is plagued with malicious traffic and various types of bots seeking out vulnerabilities that can be exploited. According to CPO Magazine, “Bot traffic made up 42.3% of all internet activity in 2021, up from 40.8% in 2020.”
CPO also points out that “Bad bot traffic is nearly double that of the so-called ‘good bots’ that perform legitimate functions such as indexing and automated responses.”
Defeating a swarm of malicious robots is all in a day’s work for the Metal Toad team. The real trick is knowing “who is who” when all you have is an IP address and you’re serving thousands of requests per minute.
In this case, the client’s site serves up to two million requests per day under normal load (23 per second). During events like San Diego Comic-Con, that number can be up to 4x higher, or nearly 100 requests per second.
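The per-second figures are simple division. A quick sketch of the arithmetic, assuming requests are spread evenly across the day (real traffic is burstier, so short-term peaks run higher than these averages):

```python
# Back-of-the-envelope check of the request rates quoted above.
# Assumes traffic is spread evenly over 24 hours.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

normal_requests_per_day = 2_000_000
peak_multiplier = 4  # "up to 4x higher" during Comic-Con

normal_rps = normal_requests_per_day / SECONDS_PER_DAY
peak_rps = normal_rps * peak_multiplier

print(f"normal load: {normal_rps:.1f} requests/second")  # 23.1
print(f"event peak:  {peak_rps:.1f} requests/second")    # 92.6
```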
This increased traffic can adversely affect site performance. To make matters worse, the site is routinely crawled by third parties looking for security vulnerabilities or new leaks ahead of announcements.
Sorting through this flood to identify malicious traffic is tough at the best of times, and extremely difficult for mere mortals to do in anything close to real time.
Of course, this elevated traffic period is the one time each year when absolutely nothing can be allowed to go wrong for Metal Toad’s client.
Figure 1 – Diagram of ML log monitoring application.
Metal Toad’s Utility Belt
Like many superheroes, the Metal Toad team needs epic tools to help us defeat evildoers. As an AWS Partner, we leverage every tool in our belt to help us succeed.
AWS Lambda: Providing Our Base of Operations
To prepare our data for examination, we first had to put it somewhere and get it properly formatted. AWS Lambda provides serverless compute that allowed us to rapidly process our data without needing to provision and manage individual servers, and it scaled automatically with the size of our data set.
We also liked Lambda as a solution because we only incurred costs for the compute time we actually used. Given the rapid and transient traffic spikes involved with a major media property like this one, this amounts to significant savings over time.
The tight integration of AWS components was vital here. By using Amazon CloudFront as our content delivery network (CDN) and Amazon Simple Storage Service (Amazon S3) for storage, we were able to spin up the relevant components quickly and get them working together seamlessly.
We started by setting up a data pipeline. It used an Amazon Simple Notification Service (Amazon SNS) trigger to invoke a Lambda function every time CloudFront delivered a log file to S3. The Lambda function parsed the log and removed the unneeded columns, preparing the data for evaluation and keeping our storage usage as efficient as possible.
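A minimal sketch of such a Lambda function in Python, assuming the standard S3-to-SNS-to-Lambda notification chain and CloudFront’s gzipped, tab-separated standard log format. The retained columns, the hypothetical `parsed/` output prefix, and writing the trimmed log back to the same bucket are illustrative assumptions, not details from this project:

```python
import gzip
import json
from urllib.parse import unquote_plus

# Hypothetical selection of columns to keep; which CloudFront log
# fields were actually retained is not specified.
KEEP_FIELDS = ["date", "time", "c-ip", "cs-method", "cs-uri-stem",
               "sc-status", "cs(User-Agent)", "cs-uri-query"]


def s3_objects_from_sns_event(event):
    """Yield (bucket, key) pairs from S3 notifications delivered through SNS."""
    for record in event.get("Records", []):
        # SNS wraps the S3 event notification as a JSON string.
        s3_event = json.loads(record["Sns"]["Message"])
        for s3_record in s3_event.get("Records", []):
            bucket = s3_record["s3"]["bucket"]["name"]
            # Object keys arrive URL-encoded in S3 event notifications.
            key = unquote_plus(s3_record["s3"]["object"]["key"])
            yield bucket, key


def parse_cloudfront_log(raw_gzip_bytes):
    """Parse a gzipped CloudFront standard log, keeping only KEEP_FIELDS."""
    text = gzip.decompress(raw_gzip_bytes).decode("utf-8")
    fields, rows = [], []
    for line in text.splitlines():
        if line.startswith("#Fields:"):
            # Read column names from the header instead of hard-coding positions.
            fields = line[len("#Fields:"):].split()
        elif line and not line.startswith("#"):
            values = dict(zip(fields, line.split("\t")))
            rows.append({name: values.get(name, "-") for name in KEEP_FIELDS})
    return rows


def to_tsv(rows):
    """Serialize trimmed rows as tab-separated text in KEEP_FIELDS order."""
    return "".join("\t".join(row[f] for f in KEEP_FIELDS) + "\n" for row in rows)


def handler(event, context):
    import boto3  # bundled with the AWS Lambda Python runtime

    s3 = boto3.client("s3")
    for bucket, key in s3_objects_from_sns_event(event):
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        rows = parse_cloudfront_log(body)
        # Hypothetical output prefix. Scope the S3 notification to the raw-log
        # prefix so these writes do not re-trigger the function.
        s3.put_object(Bucket=bucket, Key=f"parsed/{key}.tsv",
                      Body=to_tsv(rows).encode("utf-8"))
```

Reading column names from the `#Fields:` header keeps the parser working if CloudFront adds fields to the log format, and the pure parsing functions can be tested without any AWS access.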
The Training Montage: Amazon SageMaker and IP Insights
Next, Metal Toad data scientists looked at the available data, algorithms, and the features we would need. They quickly identified Amazon SageMaker as the primary platform, and its built-in IP Insights algorithm as the best fit for the job.
Amazon SageMaker is a cloud-based machine learning development platform that provides a consolidated build, train, and deploy workflow. It includes Jupyter notebooks, essentially the integrated development environment (IDE) of the machine learning space, as well as multiple built-in high-performance algorithms.
SageMaker allowed us to rapidly get into the training process with a one-click deployment and fully managed, auto-scaling hosting.
IP Insights is an unsupervised learning algorithm built into SageMaker that learns the usage patterns of IPv4 addresses from historical data. It was the perfect tool for our scenario, as we knew that isolating specific IP addresses correlated with suspicious behavior would allow us to proactively block problem traffic while letting a huge volume of valid traffic pass through.
IP Insights works from an event model; when queried with an event in the form (entity, IPv4 address), the model returns a score that indicates how anomalous this event is.
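IP Insights training data is a headerless CSV file with two columns: the entity identifier first and the IPv4 address second. Which field Metal Toad used as the entity is not stated, so the sketch below assumes, purely for illustration, the request’s User-Agent string from a parsed CloudFront log row. It also drops IPv6 clients, since the algorithm only accepts IPv4 addresses:

```python
import csv
import io
import ipaddress


def to_ip_insights_events(rows, entity_field="cs(User-Agent)"):
    """Turn parsed log rows into (entity, IPv4 address) events.

    entity_field is a hypothetical choice; any stable per-request
    identifier could serve as the entity.
    """
    events = []
    for row in rows:
        try:
            ip = ipaddress.ip_address(row["c-ip"])
        except ValueError:
            continue  # skip missing or malformed addresses
        if ip.version != 4:
            continue  # IP Insights accepts IPv4 addresses only
        events.append((row[entity_field], str(ip)))
    return events


def to_training_csv(events):
    """Headerless CSV: entity in the first column, IPv4 address in the second."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(events)
    return buf.getvalue()
```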
We collected several days of logs to work with. To train the model, Metal Toad data scientists took several hours’ worth of that traffic data and divided it into training and test sets. Using SageMaker, we created a Jupyter notebook that processed the raw log files, split the dataset, and fed the training set to SageMaker IP Insights to produce a fully trained model.
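The split and training steps can also be sketched outside a notebook with the SageMaker Python SDK (v2). The time-ordered 80/20 split, the instance type, and the hyperparameter values below are illustrative assumptions rather than the values used on this project; `num_entity_vectors` and `vector_dim` are the two hyperparameters IP Insights requires:

```python
def time_ordered_split(events, train_fraction=0.8):
    """Split (timestamp, entity, ip) events: earliest share trains, the rest tests."""
    ordered = sorted(events, key=lambda e: e[0])
    cut = int(len(ordered) * train_fraction)
    train = [(entity, ip) for _, entity, ip in ordered[:cut]]
    test = [(entity, ip) for _, entity, ip in ordered[cut:]]
    return train, test


def train_ip_insights(train_s3_uri, output_s3_uri, role_arn):
    """Launch a SageMaker IP Insights training job (needs the sagemaker SDK)."""
    import sagemaker
    from sagemaker.estimator import Estimator
    from sagemaker.inputs import TrainingInput

    session = sagemaker.Session()
    image = sagemaker.image_uris.retrieve("ipinsights", session.boto_region_name)
    estimator = Estimator(
        image_uri=image,
        role=role_arn,
        instance_count=1,
        instance_type="ml.m5.xlarge",  # illustrative instance size
        output_path=output_s3_uri,
        sagemaker_session=session,
    )
    # Illustrative values; num_entity_vectors and vector_dim are required.
    estimator.set_hyperparameters(num_entity_vectors=20000, vector_dim=128, epochs=5)
    estimator.fit({"train": TrainingInput(train_s3_uri, content_type="text/csv")})
    return estimator
```

Splitting by time rather than at random makes the test set resemble the later traffic the deployed model will actually score.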
After training the model, we could begin assessing new data. We deployed the model to a SageMaker endpoint and updated our Lambda function to send each parsed log file to the endpoint and store the results in Amazon DynamoDB for evaluation.
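A sketch of the scoring step, assuming a deployed IP Insights endpoint that accepts headerless CSV and returns JSON with one `dot_product` score per event. Higher scores mean an (entity, IP) pair resembles historical traffic, so low scores are the anomalous ones. The threshold, the table name, and keying the table on IP address are hypothetical; in practice the threshold would be tuned against the test set:

```python
import csv
import io
import json
from decimal import Decimal


def build_payload(events):
    """Headerless CSV body with one 'entity,ip' line per event."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(events)
    return buf.getvalue()


def parse_scores(response_body):
    """Pull one dot_product score per event out of the JSON response."""
    return [p["dot_product"] for p in json.loads(response_body)["predictions"]]


def flag_suspicious(events, scores, threshold):
    """Keep events scoring below the threshold, i.e. the least familiar pairs."""
    return [(entity, ip, score)
            for (entity, ip), score in zip(events, scores)
            if score < threshold]


def score_and_store(events, endpoint_name, table_name, threshold):
    import boto3

    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Accept="application/json",
        Body=build_payload(events),
    )
    scores = parse_scores(response["Body"].read())

    table = boto3.resource("dynamodb").Table(table_name)
    # overwrite_by_pkeys avoids duplicate-key errors if one IP is flagged twice in a batch.
    with table.batch_writer(overwrite_by_pkeys=["ip"]) as batch:
        for entity, ip, score in flag_suspicious(events, scores, threshold):
            # DynamoDB rejects Python floats, so store the score as a Decimal.
            batch.put_item(Item={"ip": ip, "entity": entity, "score": Decimal(str(score))})
```

Real-time endpoints cap the request payload size, so large log files would need to be scored in batches.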
The ML log monitoring solution quickly found two groups of IPs for evaluation:
- Judging by the query parameters alone, we felt the first set was obviously malicious. These results only scratched the surface, and this traffic could probably have been handled by better tuning of the web application firewall (WAF) on CloudFront. In other words, the first pass got us some results but did not fully show the value of the AI.
- The second set was more interesting and really showed the ability of IP Insights to recognize patterns that humans can miss. Metal Toad’s data scientists initially thought this second group might be false positives, but deeper analysis of the IP addresses showed that IP Insights was indeed flagging problem traffic that could otherwise have been missed. We also found a few gems, including a few white-hat scanning companies.
After completing this analysis, the Metal Toad team felt that with enough data gathered and training complete, we were ready for the big battle.
Figure 2 – Output and analytics from our application.
The Battle Royale: Comic-Con Arrives
As comic fans flooded the event venue in full costume, along with movie stars and key executives from all of the major platforms, traffic began pouring into the website.
During the four days of the event, Metal Toad’s application ingested and processed approximately 16 million log messages. From those, our well-trained SageMaker IP Insights model was able to spot 10,000 suspicious requests.
To put that in perspective as a classic “needle in a haystack” problem, suspicious traffic made up only about 0.06% of the total traffic. That is roughly the equivalent of catching a dozen suspicious people in a 20,000-seat stadium in the middle of a rock concert.
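A quick check of the arithmetic behind this comparison. It treats each of the roughly 16 million log messages as one request (CloudFront standard logs record one request per line), which is an assumption about how the figures were counted; on that basis the stadium analogy works out to about a dozen people:

```python
# Share of suspicious requests during the event, using the figures above.
# Assumes each log message corresponds to one request.
total_requests = 16_000_000
suspicious_requests = 10_000

share = suspicious_requests / total_requests
print(f"suspicious share: {share:.4%}")  # 0.0625%

# Scale the same share to a 20,000-seat stadium.
stadium_seats = 20_000
print(f"suspicious fans: {stadium_seats * share:.1f}")  # 12.5
```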
IP Insights performed amazingly well, applying the training we had done and isolating and blocking malicious traffic in real time throughout the event.
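How flagged addresses were actually blocked is not described here. One plausible mechanism, assumed purely for illustration, is adding them to an AWS WAF IP set referenced by a block rule in the web ACL attached to the CloudFront distribution. A sketch using the boto3 `wafv2` client; the IP set name and ID are hypothetical:

```python
import ipaddress


def to_cidrs(ips):
    """AWS WAF IP sets take CIDR notation; a single IPv4 host is a /32."""
    return sorted({str(ipaddress.ip_network(f"{ip}/32")) for ip in ips})


def add_to_block_list(ips, ip_set_name, ip_set_id):
    import boto3

    # CLOUDFRONT-scoped WAF resources are managed from us-east-1.
    waf = boto3.client("wafv2", region_name="us-east-1")
    current = waf.get_ip_set(Name=ip_set_name, Scope="CLOUDFRONT", Id=ip_set_id)
    addresses = set(current["IPSet"]["Addresses"]) | set(to_cidrs(ips))
    waf.update_ip_set(
        Name=ip_set_name,
        Scope="CLOUDFRONT",
        Id=ip_set_id,
        Addresses=sorted(addresses),
        # The lock token makes the update fail instead of overwriting a concurrent change.
        LockToken=current["LockToken"],
    )
```

IP sets have a size quota, so a long-running deployment along these lines would also need a way to expire old entries.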
Between our super-powered Metal Toad team and the world-class tools provided by AWS, we were able to foil the forces of evil and the site stayed up, responsive, and secure throughout the event.
Lessons Learned
Metal Toad’s team learned several key lessons on this project:
#1: This is Completely Doable!
With the help of AWS tools, this actually wasn’t that hard. We are confident that all high-profile sites will use these tools in the future, especially around critical events where downtime or poor performance cannot be tolerated.
We would highly recommend this toolset and strategy to any major brand that needs to defend its web properties. With the components outlined above and a moderate investment of time and energy, you can greatly increase the security and performance of your site.
#2: Do the Cost-Benefit Calculation
Metal Toad’s clients at this high-profile comic company were thoughtful in considering the costs that could have been associated with a hack, downtime episode, or poor performance during this critical time.
A few important points to keep in mind:
- The cost to a brand of a security breach during a high-visibility, consumer-facing event is difficult to quantify, but the reputational risks are significant. Comic-Con is big business: it attracts over 135,000 registered fans from all over the world, along with over 2,500 members of the media from more than 30 countries, and generates over $165 million in overall spending during the event alone. With this many fans and media all over the website, any outage or performance issue would be news.
- An ounce of prevention in IT security is priceless. No amount of money can rebuild broken trust with consumers, and lost trust can lead to problems that take years to fix. UpGuard reports that, according to the 2022 Cost of a Data Breach Report by IBM and the Ponemon Institute, the average cost of a data breach reached a record high of US $4.35 million in 2022. That figure accounts for hundreds of cost factors, including legal, regulatory, and technical activities, loss of brand equity, customer turnover, and drain on employee productivity, and is based on 550 breaches across 17 countries and 17 industries, with data gathered from over 3,600 interviews.
- Portent reported in 2019 that website conversion rates drop by an average of 4.42% with each additional second of load time (within the first five seconds). This effect is likely even more pronounced today as visits continue to shift to mobile devices.
With factors such as these in mind, top companies are doing the math and concluding that the cost of more sophisticated AI-based monitoring and threat defense is more than worth it, especially during critical times.
Conclusion: Be the Hero of Your Own Story
Metal Toad was proud to help our client defend their valuable site from the forces of evil by defeating over 10,000 malicious traffic events with the help of Amazon SageMaker and related AWS infrastructure. Machine learning from AWS allowed us to find this malicious traffic amid a huge volume of legitimate traffic, ensuring our client had consistent uptime and performance throughout San Diego Comic-Con.
While we do have a superhero team at Metal Toad, these tools are available to anyone who is ready to study and apply them. AWS offers great certification programs that are straightforward to get started with.
If you’re considering a similar solution for your business, we recommend connecting with AWS, or feel free to visit the Metal Toad website to learn more about our innovative work on AWS. With these powerful tools, creating a safer web is indeed all in a day’s work.
The content and opinions in this blog are those of the third-party author and AWS is not responsible for the content or accuracy of this post.
Metal Toad – AWS Partner Spotlight
Metal Toad is an AWS Advanced Tier Services Partner with the Digital Customer Experience Competency that specializes in the Media & Entertainment industry.