AT&T Cybersecurity Helps Businesses Improve Threat Detection and Response Using Machine Learning with Amazon SageMaker
Learn how AT&T Cybersecurity optimized its use of machine learning to remediate cyber threats using Amazon SageMaker.
A division of AT&T Business, AT&T Cybersecurity offers a broad portfolio of security services, including managed network security, managed security operations, and cybersecurity consulting. Under managed security operations, AT&T delivers managed detection and response using its proprietary USM Anywhere platform, a global team of SOC analysts, and information from its threat intelligence unit AT&T Alien Labs to collect, correlate, and analyze log data from customers’ hybrid environments. The USM Anywhere platform uses machine learning (ML) to speed threat identification and remediation.
To improve threat detection capabilities within USM Anywhere and deliver a better customer experience through enriched, contextual security notifications and fewer false positives, the AT&T Cybersecurity team developed an innovative ML solution using Amazon SageMaker from Amazon Web Services (AWS). Amazon SageMaker helps to build, train, and deploy ML models for virtually any use case. AT&T Cybersecurity benefited from a scalable solution that is managed by AWS and gives its customers fast and frequent training for personalized ML models and cost-effective model hosting for inference using Amazon SageMaker multi-model endpoints.
Opportunity | Using Amazon SageMaker to Improve Threat Detection Capabilities
Using Amazon SageMaker, AT&T Cybersecurity improved threat detection within USM Anywhere, boosted the quality of alert tickets for its customers, and increased the productivity of its own data scientists by 50–60 percent. ML uses massive amounts of data to improve the relevance of threat detection, automating the identification of anomalies in computer traffic that could indicate the presence of an unauthorized user. ML models can detect patterns in the data that aren’t easily captured by rules, extracting user-specific patterns to generate customized alerts that notify relevant parties within seconds. “We wanted to use ML to help improve the quality of the alerts that are being generated,” says Antoine Diffloth, director of data insights at AT&T. “We want our customers to get more benefit out of each alert and spend less time in closing the irrelevant ones.”
In the fall of 2021, AT&T Cybersecurity was looking at a significant timeline to optimize ML in its existing threat detection platform built on Amazon Elastic Compute Cloud (Amazon EC2), which offers secure and resizable compute capacity for virtually any workload. The team wanted a scalable, managed solution for both training and inference (the process of using a trained model to make predictions from live data) so that it wouldn’t have to maintain its own infrastructure. In January 2022, AT&T Cybersecurity rewrote the training and inference microservices using Amazon SageMaker. The team developed a proof of concept in just 3 months, with an ambitious goal of deploying 10 models by the end of 2022. “AWS provided a better way,” says Matthew Schneid, chief architect at AT&T Cybersecurity, Unified Security Management. “We removed a lot of the cost and concern about managing infrastructure and the associated services. We can now focus on our engineering effort.”
Solution | Tailoring Cybersecurity Models to Every Client
The solution ingests data from Amazon Relational Database Service (Amazon RDS), a collection of managed services that makes it simple to set up, operate, and scale databases in the cloud. The data is extracted as part of an automated process through AWS Glue, a serverless data integration service that makes it simple to discover, prepare, and combine data for analytics, ML, and application development. The AWS Glue job extracts data and sends it to Amazon Simple Storage Service (Amazon S3), an object storage service offering scalability, data availability, security, and performance. Because Amazon S3 seamlessly integrates with Amazon SageMaker, AT&T Cybersecurity doesn’t have to write custom code to load data for training jobs. “AWS is one of the leaders in this space, and for good reason,” Diffloth says. “AWS services work together well, which relieves us of the burden of having to build a lot of that plumbing and piping.”
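The hand-off from Amazon S3 to a training job can be pictured roughly as follows. This is a minimal sketch, not AT&T Cybersecurity's code: the bucket name, the prefix layout, the CSV format, and the helper name are all hypothetical. The dictionary follows the shape of one `InputDataConfig` channel in the SageMaker `CreateTrainingJob` API.

```python
from typing import Any, Dict


def training_channel(bucket: str, customer_id: str, channel: str = "train") -> Dict[str, Any]:
    """Describe where a training job should read one customer's extracted data.

    The bucket and the "exports/<customer_id>/" prefix are hypothetical; the
    source only says that an AWS Glue job writes extracted data to Amazon S3.
    """
    return {
        "ChannelName": channel,
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",  # read every object under the prefix
                "S3Uri": f"s3://{bucket}/exports/{customer_id}/",
                "S3DataDistributionType": "FullyReplicated",
            }
        },
        "ContentType": "text/csv",  # assumed export format
    }


# Hypothetical bucket and customer identifier, for illustration only.
channel = training_channel("example-usm-training-data", "customer-123")
print(channel["DataSource"]["S3DataSource"]["S3Uri"])
```

A list of such channels would be passed as `InputDataConfig` when the training job is created, which is why no custom data-loading code is needed in this kind of setup.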
The use of Amazon SageMaker alleviates management and monitoring complexities by automating the creation of hyperpersonalized models for each client’s business requirements. Data scientists train ML models incrementally based on a customer’s data, beginning with a small dataset and iterating up to 3 months of data until the model training results stabilize. “Models are then trained every 2 weeks to keep up with business needs, a level of customization that would have been nearly impossible to fine-tune manually,” Schneid says. For example, when a customer opens a new office in a different country, its model must be retrained so that employee logins from that location don’t trigger false positives.
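The incremental training loop and the 2-week retraining cadence described above could be sketched as follows. This is an illustration under stated assumptions, not the team's implementation: the starting window, step size, stability tolerance, the 90-day approximation of "3 months", and the `train_and_score` callable are all hypothetical.

```python
from datetime import date, timedelta
from typing import Callable, Tuple

MAX_WINDOW_DAYS = 90                    # "up to 3 months of data", approximated as 90 days
RETRAIN_INTERVAL = timedelta(weeks=2)   # "trained every 2 weeks"


def grow_until_stable(
    train_and_score: Callable[[int], float],
    start_days: int = 7,
    step_days: int = 7,
    tolerance: float = 0.01,
) -> Tuple[int, float]:
    """Train on progressively longer windows until the score stops changing.

    `train_and_score(days)` is a hypothetical callable that trains a model on
    the most recent `days` of data and returns a single evaluation score.
    """
    window = start_days
    score = train_and_score(window)
    while window < MAX_WINDOW_DAYS:
        next_window = min(window + step_days, MAX_WINDOW_DAYS)
        next_score = train_and_score(next_window)
        stable = abs(next_score - score) < tolerance
        window, score = next_window, next_score
        if stable:
            break
    return window, score


def retraining_due(last_trained: date, today: date) -> bool:
    """Return True once at least two weeks have passed since the last training run."""
    return today - last_trained >= RETRAIN_INTERVAL


# Toy scorer: the score improves with more data and plateaus at 28 days.
def toy_scorer(days: int) -> float:
    return min(days, 28) / 28


print(grow_until_stable(toy_scorer))
print(retraining_due(date(2022, 1, 1), date(2022, 1, 15)))
```

In practice the score would come from a real training job; the loop only shows the stopping rule of "add data until results stop changing, or until 3 months is reached."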
AT&T data scientists have the flexibility to spin up compute instances independently, choosing memory-optimized, compute-optimized, or GPU instances as each job requires. “The use of Amazon SageMaker makes life easier,” says Diffloth. “That translates into faster iteration and faster exploration of ideas, which gets us down the path of a production model a lot faster.” In fact, a team of three data scientists at AT&T can now complete eight models over 6 months, cutting delivery timelines in half.
Because threat detection requires near-real-time predictions, the team groups similar models and hosts them together using Amazon SageMaker multi-model endpoints, which let ML engineers deploy thousands of models on a single endpoint and so reduce hosting costs. “Amazon SageMaker multi-model endpoints are not only cost effective, but they also give us a nice little performance boost from simplification of how we store our models,” says Schneid. In 1 month, the US East 1 Region processed about 100 million events using Amazon SageMaker multi-model endpoints.
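With a multi-model endpoint, the caller names the model that should serve each request. The sketch below builds the keyword arguments for the SageMaker runtime `invoke_endpoint` call, where `TargetModel` selects one artifact on the shared endpoint. The endpoint name, the per-customer artifact naming scheme, and the event fields are hypothetical.

```python
import json
from typing import Any, Dict


def build_invoke_request(endpoint_name: str, customer_id: str, event: Dict[str, Any]) -> Dict[str, Any]:
    """Build keyword arguments for the SageMaker runtime `invoke_endpoint` call.

    `TargetModel` picks which model artifact on the shared endpoint handles
    the request. The "<customer_id>.tar.gz" naming scheme is hypothetical.
    """
    return {
        "EndpointName": endpoint_name,
        "TargetModel": f"{customer_id}.tar.gz",
        "ContentType": "application/json",
        "Body": json.dumps(event).encode("utf-8"),
    }


request = build_invoke_request(
    "example-login-anomaly-endpoint",    # hypothetical endpoint name
    "customer-123",                      # hypothetical customer identifier
    {"user": "alice", "country": "DE"},  # hypothetical event fields
)
print(request["TargetModel"])

# With boto3 installed and AWS credentials configured, the request could be sent as:
#   boto3.client("sagemaker-runtime").invoke_endpoint(**request)
```

Because every customer's model sits behind the same endpoint, adding a customer means adding an artifact rather than provisioning new hosting.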
As a messaging hub for all its AWS services, AT&T Cybersecurity uses Amazon EventBridge, a serverless event bus that makes it simpler to build event-driven applications at scale. Data scientists work in Amazon SageMaker Studio, which provides a single web-based visual interface where they can perform all ML development steps. The team uses unsupervised learning on Amazon SageMaker Studio notebooks to collaborate while building models. “You can tell Amazon SageMaker was developed by people who have experience in data science,” says Diffloth.
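To make the messaging-hub role of Amazon EventBridge concrete, the sketch below builds one entry for the EventBridge `put_events` call. It is only an illustration: the source does not describe which events the team publishes, so the source name, detail type, and payload fields are hypothetical.

```python
import json
from typing import Dict


def model_trained_entry(customer_id: str, model_uri: str, bus_name: str = "default") -> Dict[str, str]:
    """Build one `Entries` item for the EventBridge `put_events` call.

    The "example.usm.training" source, the "ModelTrained" detail type, and
    the payload fields are hypothetical names chosen for this sketch.
    """
    return {
        "Source": "example.usm.training",
        "DetailType": "ModelTrained",
        # EventBridge expects `Detail` to be a JSON-encoded string.
        "Detail": json.dumps({"customerId": customer_id, "modelUri": model_uri}),
        "EventBusName": bus_name,
    }


entry = model_trained_entry("customer-123", "s3://example-models/customer-123.tar.gz")
print(entry["DetailType"])

# With boto3 installed and AWS credentials configured, the event could be published as:
#   boto3.client("events").put_events(Entries=[entry])
```

Downstream services can then subscribe to such events through EventBridge rules instead of being called directly.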
Outcome | Optimizing Alerts Using ML on Amazon SageMaker
AT&T Cybersecurity has acquired enough data from its fleet of hyperpersonalized models to begin identifying patterns among customers. As a result, the team plans to start deploying pretrained models to clusters of similar customers, which would cut customer wait time for a usable model from 30 days to 1 or 2 days. “My team doesn’t have to worry about all the usual questions that go through an engineer’s mind. AWS provides a nice suite of tools,” says Schneid. “Using Amazon SageMaker, we can focus on delivering value.”
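One way to picture assigning a new customer to a cluster of similar customers is a nearest-centroid lookup. The source does not say how the clusters are formed or which features are used, so the feature vector (user count, number of countries), the cluster names, and the distance measure below are all assumptions made for illustration.

```python
import math
from typing import Dict, Sequence


def nearest_cluster(features: Sequence[float], centroids: Dict[str, Sequence[float]]) -> str:
    """Return the name of the centroid closest to `features` (Euclidean distance)."""
    return min(centroids, key=lambda name: math.dist(features, centroids[name]))


# Hypothetical centroids: [number of users, number of countries with logins].
centroids = {
    "small-office": [20.0, 1.0],
    "global-enterprise": [5000.0, 30.0],
}

# A hypothetical new customer with 35 users logging in from 2 countries.
assigned = nearest_cluster([35.0, 2.0], centroids)
print(assigned)
```

The customer could then start with the pretrained model for its assigned cluster while enough of its own data accumulates for a personalized model.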
About AT&T Cybersecurity
AT&T Cybersecurity is a global managed-security-services provider offering a portfolio of network, security operations, and consulting services aimed at helping to guide and support the current and evolving security needs of organizations.
AWS Services Used
Amazon SageMaker
Amazon SageMaker is built on Amazon’s two decades of experience developing real-world ML applications, including product recommendations, personalization, intelligent shopping, robotics, and voice-assisted devices.
Amazon Relational Database Service
Amazon Relational Database Service (Amazon RDS) is a collection of managed services that makes it simple to set up, operate, and scale databases in the cloud.
Amazon Simple Storage Service
Amazon Simple Storage Service (Amazon S3) is an object storage service offering industry-leading scalability, data availability, security, and performance.
AWS Glue
AWS Glue is a serverless data integration service that makes it easier to discover, prepare, move, and integrate data from multiple sources for analytics, machine learning (ML), and application development.