Reducing Processing Time for ML Workflow Using AWS Step Functions Distributed Map with CyberGRX
Learn how cybersecurity company CyberGRX saved time and reduced compute costs on ML modeling using AWS Step Functions Distributed Map.
90% reduction in compute costs
Scales to 10,000 concurrent processes
1 hour runtime compared to 8 days
Improved staff productivity
Saves time for engineers
Overview
CyberGRX, a third-party risk management provider, wanted to improve its machine learning (ML) modeling, which processes data from hundreds of thousands of companies on a quarterly basis. With its legacy solution, each predictive modeling run took up to 8 days and required four engineers to oversee the process. The company also wanted to reduce the hardware costs of its legacy solution. CyberGRX needed to meet these needs while it kept growing, adding more companies and more data to its modeling.
To solve its challenges, CyberGRX chose to use Amazon Web Services (AWS) to progressively improve its ML modeling. CyberGRX reduced the runtime of its solution from 8 days to 1 hour for its quarterly modeling, improved staff productivity, and reduced its hardware costs by 90 percent.
Opportunity | Using AWS Step Functions to Power Predictive ML Modeling for CyberGRX
CyberGRX started by offering risk assessments as a service to large companies, gathering assessment data from third-party vendors, which was slow and workforce intensive. To improve upon this process, the company adopted a predictive ML modeling system to ingest threat data and apply it to various assessments.
The legacy ML modeling solution for CyberGRX took 8 days to complete predictive modeling for all the companies in its database—150,000 at the time. This process would occur every quarter, with smaller predictive modeling jobs happening as new companies were added. Four engineers were required to monitor and manage it, which added costs to an already compute-intensive solution. As the company grew, it needed to scale its solution to handle more data more quickly.
CyberGRX chose to use AWS Step Functions—a visual workflow service for developers to use AWS services to build distributed applications, automate processes, orchestrate microservices, and create data and ML pipelines—to improve its ML solution. “Using AWS Step Functions fit our needs and was not difficult to get running,” says Charles Burton, staff software engineer at CyberGRX.
When building the first iteration of its new solution, CyberGRX used the support of the AWS team. “We worked with the AWS team when we ran into challenges with scalability and API limits,” says Burton. “The AWS team was instrumental to us as we aimed to understand why something went wrong and how to fix it.” Development of this first iteration took around 6 weeks.
Solution | Reducing Compute Costs by 90% Using AWS Step Functions
In 2022, AWS launched AWS Step Functions Distributed Map, a high-concurrency mode for the map state capable of running 10,000 parallel workflow processes. CyberGRX immediately began using this new feature of AWS Step Functions to improve upon the first iteration of its solution. “AWS Step Functions Distributed Map was the exact solution that we needed to make our solution fly,” says Burton. “It has the ability to understand the size of data that you’re putting into it, and then you can make runtime decisions based on that.” With the new solution, the company can place its data in Amazon Simple Storage Service (Amazon S3)—object storage built to retrieve virtually any amount of data from anywhere—and AWS Step Functions Distributed Map parses the data and distributes it correctly. CyberGRX can make scalability decisions at runtime to balance its horizontal scalability, a further optimization of this solution.
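As a rough illustration of the pattern described above, the following Python sketch builds an Amazon States Language definition for a Distributed Map state that reads items from an object in Amazon S3 and sends them to an AWS Lambda function. It is not CyberGRX's actual workflow: the state names, the CSV input format, the bucket, key, and function name arguments, and the `items_per_batch` sizing rule are all assumptions made for the example. The sizing helper is only one possible way to make a runtime decision based on the size of the input.

```python
import json

MAX_CONCURRENCY = 10_000  # concurrency figure quoted in the story


def items_per_batch(total_items: int, max_concurrency: int = MAX_CONCURRENCY) -> int:
    """Hypothetical sizing rule: the smallest batch size that keeps the
    number of batches at or below max_concurrency."""
    # Integer ceiling division, with a floor of one item per batch.
    return max(1, -(-total_items // max_concurrency))


def build_definition(bucket: str, key: str, function_name: str, total_items: int) -> dict:
    """Return a state machine definition containing one Distributed Map state.

    All names passed in are placeholders chosen by the caller.
    """
    return {
        "StartAt": "ScoreCompanies",  # hypothetical state name
        "States": {
            "ScoreCompanies": {
                "Type": "Map",
                # Read the items to process from a CSV object in Amazon S3 (assumed format).
                "ItemReader": {
                    "Resource": "arn:aws:states:::s3:getObject",
                    "ReaderConfig": {"InputType": "CSV", "CSVHeaderLocation": "FIRST_ROW"},
                    "Parameters": {"Bucket": bucket, "Key": key},
                },
                # Group items so each child execution handles one batch.
                "ItemBatcher": {"MaxItemsPerBatch": items_per_batch(total_items)},
                "MaxConcurrency": MAX_CONCURRENCY,
                "ItemProcessor": {
                    # DISTRIBUTED mode is what makes this a Distributed Map.
                    "ProcessorConfig": {"Mode": "DISTRIBUTED", "ExecutionType": "STANDARD"},
                    "StartAt": "RunModel",  # hypothetical state name
                    "States": {
                        "RunModel": {
                            "Type": "Task",
                            "Resource": "arn:aws:states:::lambda:invoke",
                            "Parameters": {"FunctionName": function_name, "Payload.$": "$"},
                            "End": True,
                        }
                    },
                },
                "End": True,
            }
        },
    }


if __name__ == "__main__":
    # Placeholder bucket, key, and function name.
    definition = build_definition("example-bucket", "companies.csv", "example-model-fn", 260_000)
    print(json.dumps(definition, indent=2))
```

The resulting JSON could be supplied as the definition when creating a state machine. In practice, the batch size and concurrency would be tuned against AWS Lambda quotas and the cost of each model invocation.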
The company can now scale its use of AWS Lambda (a serverless, event-driven compute service used to run code for virtually any type of application or backend service without provisioning or managing servers) in the background to 10,000 concurrent processes. In the future, the company will be able to split its data and run two distributed maps simultaneously, for a combined 20,000 parallel AWS Lambda processes.
The scalability of its solution helps support the business’s growth. CyberGRX’s legacy solution prior to using AWS Step Functions took 8 days for predictive ML modeling of data from 150,000 businesses. Using AWS Step Functions Distributed Map, the company can run ML modeling for 260,000 businesses in 1 hour. This amounts to 75 billion data calculations that are condensed into smaller output for the company’s customers.
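The figures in the preceding paragraph can be put side by side with a little arithmetic. The sketch below uses only the numbers quoted in the story and treats the legacy run as a full 8 days (the overview says "up to 8 days"), so the ratios are approximate. The function and variable names are illustrative.

```python
HOURS_PER_DAY = 24

# Figures quoted in the story.
LEGACY_DAYS = 8
LEGACY_COMPANIES = 150_000
NEW_HOURS = 1
NEW_COMPANIES = 260_000


def companies_per_hour(companies: int, hours: float) -> float:
    """Average number of companies modeled per hour of runtime."""
    return companies / hours


legacy_hours = LEGACY_DAYS * HOURS_PER_DAY  # 192 hours

# Reduction in wall-clock time for a full quarterly run: 192x.
wall_clock_speedup = legacy_hours / NEW_HOURS

# Per-hour throughput before and after.
legacy_rate = companies_per_hour(LEGACY_COMPANIES, legacy_hours)  # 781.25
new_rate = companies_per_hour(NEW_COMPANIES, NEW_HOURS)           # 260,000

# Throughput gain also accounts for the larger dataset: about 333x.
throughput_gain = new_rate / legacy_rate

if __name__ == "__main__":
    print(f"wall-clock speedup: {wall_clock_speedup:.0f}x")
    print(f"throughput gain:    {throughput_gain:.0f}x")
```

The gap between the two ratios comes from the dataset growing from 150,000 to 260,000 companies while the runtime fell.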
This improvement also helped the company reduce the number of engineers working with the solution from four down to one. “Using AWS Step Functions, I can free up my and my team’s time to accelerate and move forward in providing tangible value to the company and our customers,” says Burton. CyberGRX engineers can now work on optimizing data feeds and iterate more quickly on new model versions. This means the company can develop innovative technologies and think about how to scale while developing before taking products to the marketplace.
CyberGRX saves 90 percent on compute costs by using AWS Step Functions Distributed Map. These savings apply to the full model runs that the company does every quarter and to adding new business data as CyberGRX continues to grow. The company is putting automation into place so that the only costs it pays are the costs for the system and the occasional debugging, removing the need for engineer oversight when the solution is running. Additionally, the company saved costs running its solution by freeing up engineering resources previously spent monitoring it for hours at a time.
Outcome | Improving Customer Experience Using AWS
As CyberGRX continues to iterate on and improve its solution using AWS, it provides a better experience for its customers. The company can provide data more quickly for customer use and plans to fully automate the process in the future with reduced delay. CyberGRX has taken risk assessment turnaround time for customers from 1 year or more to within 1 month, and it wants to use AWS to further reduce that to less than 1 week.
“Using AWS, we have gone from a solution that was slow and expensive to one that’s lean, nimble, and incredibly fast,” says Burton. “Using AWS has been a great learning journey, a great experience, and a great accelerator to our business.”
About CyberGRX
CyberGRX, based in Denver, Colorado, provides third-party risk management, including risk insights, threat intelligence, and scenario modeling. CyberGRX uses machine learning to provide predictive risk data and processes data for over 260,000 companies.
AWS Services Used
AWS Step Functions
AWS Step Functions is a visual workflow service that helps developers use AWS services to build distributed applications, automate processes, orchestrate microservices, and create data and machine learning (ML) pipelines.
AWS Lambda
AWS Lambda is a serverless, event-driven compute service that lets you run code for virtually any type of application or backend service without provisioning or managing servers.
Amazon S3
Amazon Simple Storage Service (Amazon S3) is an object storage service offering industry-leading scalability, data availability, security, and performance.