Data Lakes and Analytics
The Nasdaq Composite is a stock market index of the common stocks and similar securities listed on the Nasdaq stock market. Demand on Nasdaq's enterprise data warehouse was growing so rapidly that Nasdaq feared its system would soon hit a limit beyond which it could not scale. The Nasdaq team was tasked with redesigning the architecture of its data warehouse to handle rapidly changing service level demands from customers and decided to partner with the AWS Data Lab to accelerate the creation of this solution. Over four days, the Nasdaq team worked with the AWS Data Lab to explore and test various options for improving scalability and decided to separate storage from compute by using Amazon Redshift as a compute engine on top of its data lake. Rather than maintaining a single large Amazon Redshift cluster, the team deployed smaller Amazon Redshift clusters suited to the needs of its different business users. Deployment of this new architecture to production created "infinite" capacity for additional data without manual intervention, increased scalability and parallelism, and resulted in a 75% reduction in Reserved Instance costs.
"When the team came back from the Data Lab, they came back with a clear direction on how we were going to solve our problems and take this solution to production. If you have an opportunity to design a solution which has this kind of infinite capacity, you should take it. Design for infinite.” Robert Hunt, VP Software Engineering, Nasdaq.
Learn more by watching Nasdaq's 2019 re:Invent presentation.
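The core of the Nasdaq design is separating storage from compute: the data stays once in the S3 data lake, and each small Amazon Redshift cluster exposes it through a Redshift Spectrum external schema. A minimal sketch of that pattern follows; the schema, table, columns, IAM role, and S3 path are all hypothetical, not Nasdaq's actual objects.

```python
def external_table_ddl(schema: str, table: str, columns: dict, s3_path: str) -> list:
    """Generate the DDL that exposes Parquet files in an S3 data lake to a
    Redshift cluster via Redshift Spectrum. Storage stays in S3; each small
    cluster only supplies compute, so many clusters can share one copy."""
    cols = ",\n    ".join(f"{name} {dtype}" for name, dtype in columns.items())
    return [
        # The external schema points at a Glue Data Catalog database;
        # every cluster that creates this schema sees the same tables.
        f"CREATE EXTERNAL SCHEMA IF NOT EXISTS {schema} "
        f"FROM DATA CATALOG DATABASE '{schema}' "
        f"IAM_ROLE 'arn:aws:iam::123456789012:role/spectrum-role'",  # hypothetical role
        f"CREATE EXTERNAL TABLE {schema}.{table} (\n    {cols}\n) "
        f"STORED AS PARQUET LOCATION '{s3_path}'",
    ]

# Hypothetical trades table in the lake, queryable from any cluster
ddl = external_table_ddl(
    "lake", "trades",
    {"symbol": "varchar(12)", "price": "decimal(18,4)", "ts": "timestamp"},
    "s3://example-data-lake/trades/",
)
```

Because the external schema is just a catalog pointer, adding a purpose-built cluster for a new business team means running the same DDL there, with no data copied.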
Allen Institute focuses on accelerating foundational research, developing standards and models, and cultivating new ideas to make a broad, transformational impact on science. One of its research institutes, the Allen Institute for Brain Science, partnered with the AWS Data Lab to rapidly accelerate its journey into data platform modernization. As part of its mission to share massive amounts of data with the public to accelerate advancement in neuroscience, Allen Institute needed to build a solution that could provide researchers around the world with the ability to work with extremely wide datasets (more than 50,000 columns) at scale and with very low latency. In only four days, the Allen Institute team built a working prototype of an end-to-end feature matrix ingestion pipeline using transient Amazon Elastic MapReduce (EMR) clusters and Amazon DynamoDB that dynamically ingests and transforms its wide datasets into consumable, interactive datasets for researchers. The team left the AWS Data Lab with an accelerated plan to bring this solution to production, furthering its commitment to support researchers in the quest for improved health outcomes.
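A transient EMR cluster is one that spins up, runs its ingestion step, and terminates, so compute costs accrue only while data is actually being transformed. The sketch below builds a boto3 `emr.run_job_flow(**config)` request for that pattern; the cluster name, sizes, script location, and roles are illustrative, not the Allen Institute's actual configuration.

```python
def transient_emr_config(name: str, script_s3_uri: str, workers: int) -> dict:
    """Build a boto3 `emr.run_job_flow(**config)` request for a transient
    cluster: it runs a single Spark step, then shuts itself down."""
    return {
        "Name": name,
        "ReleaseLabel": "emr-6.2.0",
        "Instances": {
            "InstanceCount": workers,
            "MasterInstanceType": "m5.xlarge",
            "SlaveInstanceType": "m5.xlarge",
            # Transient behavior: terminate when no steps remain.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [{
            "Name": "ingest-feature-matrix",  # hypothetical step name
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": ["spark-submit", script_s3_uri],
            },
        }],
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }

config = transient_emr_config("feature-matrix-ingest", "s3://example-bucket/ingest.py", 10)
# With credentials configured: boto3.client("emr").run_job_flow(**config)
```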
KnowBe4, Inc. provides Security Awareness Training to help companies manage the IT security problems of social engineering, spear phishing, and ransomware attacks. Its training platform revolves around the Risk Score pipeline, which generates an individualized risk score for tens of millions of users daily. KnowBe4 worked with the AWS Data Lab to build a working prototype of a new Risk Score pipeline that reduced total runtime from more than 7.5 hours to 3.5 hours and horizontally scaled every aspect of data retrieval, processing, and training. After the AWS Data Lab, the team used the skills it learned to continue optimizing its pipeline. Five months post-lab, KnowBe4 launched to production with a final runtime of 1.1 hours. In addition to this six-fold reduction in total runtime, KnowBe4's new architecture delivered a four-fold cost savings.
“What we did in four days would have taken us weeks, maybe months, to achieve some of this refactor of the technical debt we had with our AI pipeline. And at the same time prepare our data handling to scale to 10x what we have today.” Marcio Castilho, Chief Architect Officer, KnowBe4.
Sportradar is a global provider of sports data intelligence, serving leagues, news media, consumer platforms, and sports betting operators with deep insights and a suite of strategic solutions to help grow their businesses. It engaged the AWS Data Lab for guidance on developing a modernized, low latency data analytics pipeline and workflow to power real-time statistical models, feature extraction, and inference using machine learning models and real-time dashboards. The Sportradar team left the AWS Data Lab with a clear path forward for real-time sportsbook risk management and real-time fraud detection, as well as a scalable process for deploying and managing additional data pipelines on a global level. It used the AWS Data Lab to help expand the capabilities of its existing cloud-native big data and analytics platform for real-time analytics workloads.
“Using the elasticity and value-added services from AWS, we have managed to analyze a high volume of transactions to produce deep real-time analytics. This gives our traders a crucial edge.” Ben Burdsall, CTO, Sportradar.
Freeman is a leader in brand experience. The Freeman team was tasked with creating a streamlined approach for handling, validating, and joining data that would power visualizations in its custom dashboard service. Freeman partnered with the AWS Data Lab to accelerate the architectural design and prototype build of this solution. In only four days, the Freeman team built a data pipeline prototype for both streaming and batch datasets, leveraging Amazon Kinesis and AWS Glue workflows to ingest, curate, and prepare the data. Using Amazon Athena, Amazon Kinesis Data Analytics, and Amazon Elasticsearch Service to query the various curated datasets, and Amazon QuickSight and Kibana to visualize the results in easy-to-consume dashboards, the Freeman team left the AWS Data Lab with a clear path forward for enabling end users to gain valuable insights into its data.
"We were able to leverage our existing knowledge and infrastructure within AWS by expanding into new services and features that we hadn't explored before. With the help of the AWS solutions architects that worked side-by-side with us, we were able to greatly accelerate the delivery of our system and set up a foundation that we can build on down the road.” Casey McMullen, Director of Digital Solution Development, Freeman.
TownSq connects neighbors, board members, and management teams to easy, proven, collaborative tools designed to enhance the community living experience. TownSq needed to upgrade its data and analytics capabilities due to exponential client growth. It decided to build a data lake to enable greater insights about business performance, client benchmarking, engagement levels, and success rates on new products and tools. TownSq also wanted to deploy algorithms to highlight unmet client needs, automate key processes, and provide recommendations to mitigate any emerging or detected risks. In four days, the TownSq team achieved its goal of building a functioning data lake and an extract, transform, load (ETL) pipeline capable of processing data from multiple sources, including Amazon DynamoDB and internal MongoDB and ERP systems. Immediately following the lab, the team was able to use the solution to realign its product roadmap to focus on higher return-on-investment opportunities and dramatically increase engagement on newly-launched features.
"Working directly with Amazon's architects is a major accelerator, esepcially in a business driven by speed to market. The AWS Data Lab prepped for us, were in the room to support our build, and we walked out days later with a functioning product. The new products we are launching are game-changing and the added knowledge we have will help us continue to lead the market." Luis Lafer-Sousa, President - US, TownSq.
hc1 offers a suite of cloud-based, high-value care solutions that enable healthcare organizations to transform business and clinical data into the intelligence necessary to deliver on the promise of personalized care, all while eliminating waste. As an aggregator of billions of healthcare records from a number of large diagnostic testing providers, hc1 identified the need to migrate from its existing data warehouse to a scalable data lake on AWS to support its advanced analytics initiatives with AWS Artificial Intelligence (AI) and Machine Learning (ML) services. The AWS Data Lab helped hc1 migrate its patient diagnostic testing data warehouse to a data lake architecture by rebuilding its core SQL-based ingestion, cleanup, and patient-matching Extract, Transform, Load (ETL) scripts as AWS Glue ETL jobs. The team also leveraged AWS Glue FindMatches to deduplicate patient test panel records across testing providers. hc1's team left the AWS Data Lab with a well-architected data lake framework for its application's core data repository. The hc1 team also learned best practices for matching patient information across datasets using AWS AI services, which will ensure patient medical record completeness and accuracy by deduplicating data from different points of care.
"Reliable patient record matching is pivotal in improving patient outcomes and reducing clinical waste. AWS AI services allows us to flexibly update our matching system. We are able to incorporate new sources in less than half the time.” Charles Clarke, SVP of Technology, hc1.
Since 1882, Dow Jones has been finding new ways to bring information to the world’s top business entities. Dow Jones had several Informix databases to migrate to Amazon Aurora PostgreSQL and engaged the AWS Data Lab to help it test different data migration options and establish a well-architected data migration approach to apply to its 100+ databases. In just a week, Dow Jones emerged with a finalized approach for scripting and automating data migration and code deployment, including how to convert stored procedures, triggers, and tables, setting the stage for future Informix migrations.
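Converting tables is one of the repetitive pieces a scripted migration automates: Informix column types must be rewritten into their PostgreSQL equivalents before DDL can be replayed on Aurora. The mapping below is a small illustrative subset, not Dow Jones's actual conversion rules; a real migration needs a far fuller map plus handling for stored procedures and triggers.

```python
# Rough Informix -> PostgreSQL type map used when rewriting table DDL.
# Illustrative subset only; real migrations cover many more types.
TYPE_MAP = {
    "DATETIME YEAR TO SECOND": "timestamp",
    "DATETIME YEAR TO FRACTION(5)": "timestamp(5)",
    "LVARCHAR": "text",
    "BYTE": "bytea",
    "MONEY": "numeric(16,2)",
    "SERIAL": "serial",
}

def convert_column(name: str, informix_type: str) -> str:
    """Rewrite one Informix column definition for PostgreSQL, falling back
    to the lowercased original type when it already matches (e.g. INTEGER)."""
    pg_type = TYPE_MAP.get(informix_type.upper(), informix_type.lower())
    return f"{name} {pg_type}"
```

Running such a converter over every table's schema dump, then diffing row counts after load, is one way to make a 100+ database migration repeatable.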
3M is an American multinational company operating in the fields of industry, worker safety, health care, and consumer goods. 3M R&D needed to enhance its machine learning, analytics, and reporting capabilities for more than 10,000 spreadsheets across six different business operations with more than fifty different schemas. With guidance from the AWS Data Lab, 3M developed a minimum viable product (MVP) of multiple extract, transform, load (ETL) data pipelines that flow into a data lake in Amazon S3, where the data is interpreted, analyzed, and visualized using Amazon SageMaker notebooks and Amazon QuickSight for enhanced insights. This solution will allow 3M to work with customers more interactively, enabling immediate response times and higher customer satisfaction with the entire sales and solutioning process.
“I never knew it was possible to organize so much data in a way that would allow me to effectively access and analyze millions of rows of data, where before I was constantly looking for spreadsheets or just asking for another test to be run.” Lead Materials Application Engineer, 3M.
Civitas Learning is a data science company dedicated to helping higher education solve pressing challenges and improve student success outcomes. The company partnered with the AWS Data Lab to architect and integrate key building blocks in machine learning (ML) causal inference in order to create a real-world evidence knowledge base. Civitas Learning implemented an architecture for using notebooks in a production environment and left the AWS Data Lab with a new, repeatable workflow it can use for additional data science tasks.
“AWS assembled a super team to help us architect and integrate key building blocks in ML causal inference so that we could construct a real-world evidence knowledge base. They also made sure that we stayed on course after our Data Lab engagement, which is helping us scale our ML practice with much faster deployment speed. It’s been a great, rewarding experience for us all, and our customers are happier as a result.” David Kil, Chief Data Scientist, Civitas Learning.
PHD Media is a global communications planning and media buying agency network. PHD Media needed to build a lean, high-performance, and scalable extract, transform, load (ETL) and data storage infrastructure that could support future machine learning workloads. The AWS Data Lab helped PHD Media move its ETL jobs to AWS Glue and rebuild its pipeline into a three-part process: data ingestion, data staging, and data summarization. PHD Media left the AWS Data Lab with a new architecture for its data pipeline that reduces ETL processing time from 21 hours to 75 minutes and is capable of integrating with Amazon SageMaker and BI tools.
“We would not have been able to dedicate the same amount of time to the development, nor been able to resolve our questions and problems as quickly without the AWS Data Lab. Doing the same work outside of the AWS Data Lab would have cost us significantly more in funds and time.” Amar Vyas, Global Data Strategy Director, PHD Global Business.
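A three-part Glue pipeline like PHD Media's (ingest, then stage, then summarize) can be wired together with conditional triggers, so each stage starts only after the previous job succeeds. The sketch below builds boto3 `glue.create_trigger(**t)` requests for that chain; the job names are illustrative placeholders, not PHD Media's actual jobs.

```python
def chained_triggers(jobs: list) -> list:
    """Build one boto3 `glue.create_trigger(**t)` request per stage
    boundary, chaining Glue jobs so each fires on the predecessor's
    SUCCEEDED state."""
    triggers = []
    for upstream, downstream in zip(jobs, jobs[1:]):
        triggers.append({
            "Name": f"after-{upstream}",
            "Type": "CONDITIONAL",
            "StartOnCreation": True,
            # Fire only when the upstream job finished successfully.
            "Predicate": {"Conditions": [{
                "LogicalOperator": "EQUALS",
                "JobName": upstream,
                "State": "SUCCEEDED",
            }]},
            "Actions": [{"JobName": downstream}],
        })
    return triggers

# Hypothetical three-stage pipeline: two triggers connect three jobs.
pipeline = chained_triggers(["ingest", "stage", "summarize"])
```

Grouping the same jobs and triggers under a Glue workflow would additionally give a single run view of the whole pipeline.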