Data Lakes and Analytics
KnowBe4, Inc. provides Security Awareness Training to help companies manage the IT security problems of social engineering, spear phishing, and ransomware attacks. Its training platform revolves around the Risk Score pipeline, which generates an individualized risk score for tens of millions of users daily. KnowBe4 worked with the AWS Data Lab to build a working prototype of a new Risk Score pipeline that reduced total runtime from 7.5 or more hours to 3.5 hours and horizontally scaled every aspect of data retrieval, processing, and training. After the lab, the team applied the skills it had learned to keep optimizing the pipeline. Five months post-lab, KnowBe4 launched to production with a final runtime of 1.1 hours. In addition to this more than six-fold reduction in total runtime, KnowBe4's new architecture delivered a four-fold cost savings.
“What we did in four days would have taken us weeks, maybe months, to achieve some of this refactor of the technical debt we had with our AI pipeline. And at the same time prepare our data handling to scale to 10x what we have today.” Marcio Castilho, Chief Architect Officer, KnowBe4.
Sportradar is a global provider of sports data intelligence, serving leagues, news media, consumer platforms, and sports betting operators with deep insights and a suite of strategic solutions to help grow their businesses. It engaged the AWS Data Lab to help expand the capabilities of its existing cloud-native big data and analytics platform for real-time analytics workloads, seeking guidance on a modernized, low-latency data analytics pipeline and workflow to power real-time statistical models, feature extraction and inference using machine learning models, and real-time dashboards. The Sportradar team left the AWS Data Lab with a clear path forward for real-time sportsbook risk management and real-time fraud detection, as well as a scalable process for deploying and managing additional data pipelines globally.
“Using the elasticity and value-added services from AWS, we have managed to analyze a high volume of transactions to produce deep real-time analytics. This gives our traders a crucial edge.” Ben Burdsall, CTO, Sportradar.
Freeman is a leader in brand experience. Freeman worked with the AWS Data Lab to architect an ingestion pipeline that joins three streaming and batch data sources using Amazon Kinesis and AWS Glue workflows. Using Amazon Kinesis Data Analytics and Amazon Elasticsearch Service, Freeman made its various data sources queryable from Amazon QuickSight and Kibana dashboards, which give it visual insights into its data.
“We were able to leverage our existing knowledge and infrastructure within AWS by expanding into new services and features that we hadn't explored before. With the help of the AWS solutions architects that worked side-by-side with us, we were able to greatly accelerate the delivery of our system and set up a foundation that we can build on down the road.” Casey McMullen, Director of Digital Solution Development, Freeman.
Since 1882, Dow Jones has been finding new ways to bring information to the world’s top business entities. Dow Jones had several Informix databases to migrate to Amazon Aurora PostgreSQL and engaged the AWS Data Lab to help it test different data migration options and establish a well-architected data migration approach to apply to its 100+ databases. In just a week, Dow Jones emerged with a finalized approach for scripting and automating data migration and code deployment, including how to convert stored procedures, triggers, and tables, setting the stage for future Informix migrations.
3M is an American enterprise company operating in the fields of industry, worker safety, health care, and consumer goods. 3M R&D needed to enhance its machine learning, analytics, and reporting capabilities for more than 10,000 spreadsheets across six different business operations with more than fifty different schemas. With guidance from the AWS Data Lab, 3M developed a minimum viable product (MVP) of multiple data pipelines that use extract, transform, load (ETL) processing to feed a data lake in Amazon S3, with Amazon SageMaker Notebooks and Amazon QuickSight then used to interpret, analyze, and visualize the data for enhanced insights. This solution will allow 3M to work with customers more interactively, enabling immediate responses and higher customer satisfaction with the entire sales and solutioning process.
“I never knew it was possible to organize so much data in a way that would allow me to effectively access and analyze millions of rows of data, where before I was constantly looking for spreadsheets or just asking for another test to be run.” Lead Materials Application Engineer, 3M.
Civitas Learning is a data science company dedicated to helping higher education solve pressing challenges and improve student success outcomes. The company partnered with the AWS Data Lab to architect and integrate key building blocks in machine learning (ML) causal inference in order to create a real-world evidence knowledge base. Civitas Learning implemented an architecture for using notebooks in a production environment and left the AWS Data Lab with a new, repeatable workflow it can use for additional data science tasks.
“AWS assembled a super team to help us architect and integrate key building blocks in ML causal inference so that we could construct a real-world evidence knowledge base. They also made sure that we stayed on course after our Data Lab engagement, which is helping us scale our ML practice with much faster deployment speed. It’s been a great, rewarding experience for us all, and our customers are happier as a result.” David Kil, Chief Data Scientist, Civitas Learning.