Data Lakes and Analytics
The Nasdaq Composite is a stock market index of the common stocks and similar securities listed on the Nasdaq stock market. Tasked with redesigning its data warehouse architecture to handle rapidly changing service level demands from customers, Nasdaq teamed up with the AWS Data Lab to explore and test options for improving scalability and, ultimately, re-architecting its data warehouse. The AWS Data Lab helped the Nasdaq team decide to separate storage from compute by using Amazon Redshift as a compute engine on top of its data lake. Deploying this new architecture to production created "infinite" capacity for additional data without manual intervention, increased scalability and parallelism, and resulted in a 75% reduction in Reserved Instance costs.
“I wish that we hadn’t waited so late in the project to take advantage of [the AWS Data Lab]. We came out of that week at AWS Data Lab with answers and a clear path to how we were going to solve the problems that we were facing.” Robert Hunt, VP Software Engineering, Nasdaq.
Cenovus Energy is a Canada-based integrated energy company. Cenovus built a working prototype of a data lake that ingests a variety of datasets from multiple vendors through a streamlined process, along with an event-driven data pipeline that transforms the ingested datasets into a common data model. The Cenovus team left the Build Lab with a custom-fit, cloud-based solution that will allow production engineers and other analysts to derive insights into oil well performance faster, without having to worry about storage capacity. Cenovus can ingest, process, and analyze billions of rows of Distributed Temperature Sensing (DTS) data, enabling the business to immediately access and generate actionable insights into oil well performance and optimize costs.
“The implementation of our Distributed Temperature Sensing (DTS) data pipeline was accelerated by months using the AWS Data Lab. This shortened our learning curve and enabled access to the data for various high priority use cases.” Don Munroe, Chief Data Officer, Cenovus Energy Inc.
Availity is one of the nation’s largest health information networks and facilitates billions of clinical, administrative, and financial transactions annually. During a four-day Build Lab with the AWS Data Lab, Availity built an end-to-end data pipeline to process incoming data from its network of healthcare partners. The team also decoupled its index and search data in near real time to create APIs that help customers find relevant patient information quickly while respecting appropriate data governance requirements. Only three months after the lab, Availity moved this solution to production, and it now has an implementation pattern it can replicate for future data needs.
“We left with a functioning prototype that was essential to our business, but the real value of the Data Lab was a team-building effort.” Michael Privat, VP of Digital and Cloud Migration, Availity.
Wood Mackenzie, a Verisk business, is a global research and consultancy firm powering the energy industry. Wood Mackenzie engaged an AWS Data Lab Resident Architect to review its data and analytics strategy, develop data architecture principles, and deepen its understanding of how to build and scale data architecture solutions for its data platform. As a result, Wood Mackenzie has created a range of scalable, resilient, and cost-efficient implementation patterns and cultivated a culture of modern data architecture literacy. These improvements have produced more efficient data pipelines and workloads, leading to high-quality, robust datasets for customers and analysts and supporting Wood Mackenzie in its journey to transform how the planet is powered.
“Our Resident Architect has helped with both tactical items and served as a sounding board as we develop our internal data architecture strategy. Their presence in meetings has been met with comments ranging from ‘I didn’t know AWS had architects that could review our workloads’ to ‘this is really cool.’ Having a Resident Architect has helped us make the most of all of the AWS resources we have and has been a positive feedback cycle into how we build going forward.” Liz Dennett, Ph.D., VP of Data Architecture, Wood Mackenzie.
The Allen Institute focuses on accelerating foundational research, developing standards and models, and cultivating new ideas to make a broad, transformational impact on science. One of its research institutes, the Allen Institute for Brain Science, collaborated with the AWS Data Lab to accelerate its journey into data platform modernization. As part of its mission to share massive amounts of data with the public to advance neuroscience, the Allen Institute needed to build a solution that would let researchers around the world work with extremely wide datasets (more than 50,000 columns) at scale and with very low latency. In only four days, the Allen Institute team built a working prototype of an end-to-end feature matrix ingestion pipeline using transient Amazon Elastic MapReduce (EMR) clusters and Amazon DynamoDB; the pipeline dynamically ingests the wide datasets and transforms them into consumable, interactive datasets for researchers. The team left the AWS Data Lab with an accelerated plan to bring this solution to production, furthering its commitment to support researchers in the quest for improved health outcomes.
KnowBe4, Inc. provides Security Awareness Training to help companies manage the IT security problems of social engineering, spear phishing, and ransomware attacks. Its training platform revolves around the Risk Score pipeline, which generates an individualized risk score for tens of millions of users daily. KnowBe4 worked with the AWS Data Lab to build a working prototype of a new Risk Score pipeline that reduced total runtime from 7.5 hours or more to 3.5 hours and horizontally scaled every aspect of data retrieval, processing, and training. After the AWS Data Lab, the team used the skills it had learned to continue optimizing its pipeline. Five months after the lab, KnowBe4 launched to production with a final runtime of 1.1 hours. In addition to this six-fold reduction in total runtime, KnowBe4's new architecture delivered a four-fold reduction in cost.
“What we did in four days would have taken us weeks, maybe months, to achieve some of this refactor of the technical debt we had with our AI pipeline. And at the same time prepare our data handling to scale to 10x what we have today.” Marcio Castilho, Chief Architect Officer, KnowBe4.
Sportradar is a global provider of sports data intelligence, serving leagues, news media, consumer platforms, and sports betting operators with deep insights and a suite of strategic solutions to help grow their businesses. It engaged the AWS Data Lab for guidance on developing a modernized, low-latency data analytics pipeline and workflow to power real-time statistical models, feature extraction, and inference using machine learning models and real-time dashboards. The Sportradar team left the AWS Data Lab with a clear path forward for real-time sportsbook risk management and real-time fraud detection, as well as a scalable process for deploying and managing additional data pipelines on a global level. Sportradar used the AWS Data Lab to help expand the capabilities of its existing cloud-native big data and analytics platform for real-time analytics workloads.
“Using the elasticity and value-added services from AWS, we have managed to analyze a high volume of transactions to produce deep real-time analytics. This gives our traders a crucial edge.” Ben Burdsall, CTO, Sportradar.
Jungle Scout is an all-in-one platform for finding, launching, and selling Amazon products. With the support of the AWS Data Lab, Jungle Scout built the foundation of a data lake in only four days, including a repeatable pattern for building data pipelines that hydrate the data lake from a variety of data systems. By using Amazon S3 as the core of the data lake, Jungle Scout is able to reduce its storage footprint across other databases and remove data silos, ultimately helping the team reduce cost and increase productivity. The solution also makes it simpler to manage multiple versions of product metadata changes, giving Jungle Scout’s data scientists and engineers the flexibility to view data changes several times per day and troubleshoot data faster.
“By leveraging the AWS Data Lab, we were able to launch our analytics solution to production only three months after joining the lab and with only two engineers working full-time on the project. This has resulted in a major shift in how engineers at Jungle Scout build data processing pipelines.” Alex Handley, Principal Architect, Jungle Scout.
Athenascope enables gamers and content creators to make and share great content with artificial intelligence assists. Athenascope’s business intelligence needs were quickly outpacing its multi-vendor analytics solution, so Athenascope collaborated with the AWS Data Lab in a two-day Design Lab session to create a lake house architecture for batch and real-time data processing. The team left the Design Lab with a scalable, all-in-one analytics architecture on AWS that lets the organization spend less time on building and manual upkeep, and more time exploring and generating player insights.
“Cutting-edge applied machine learning requires a hefty data solution. With Amazon Athena and Amazon QuickSight, we can easily query across ML-generated game, video, and user data, allowing us to provide deeper insights to consumers and game developers alike.” Rachel Chai, VP of Product, Athenascope.
Freeman is a leader in brand experience. The Freeman team was tasked with creating a streamlined approach for handling, validating, and joining data that would power visualizations in its custom dashboard service. Freeman partnered with the AWS Data Lab to accelerate the architectural design and prototype build of this solution. In only four days, the Freeman team built a data pipeline prototype for both streaming and batch datasets, using Amazon Kinesis and AWS Glue workflows to ingest, curate, and prepare the data. The team used Amazon Athena, Amazon Kinesis Data Analytics, and Amazon Elasticsearch Service to query the curated datasets, and Amazon QuickSight and Kibana to visualize the results in easy-to-consume dashboards. Freeman left the AWS Data Lab with a clear path forward for enabling end users to gain valuable insights into its data.
“We were able to leverage our existing knowledge and infrastructure within AWS by expanding into new services and features that we hadn't explored before. With the help of the AWS solutions architects that worked side-by-side with us, we were able to greatly accelerate the delivery of our system and set up a foundation that we can build on down the road.” Casey McMullen, Director of Digital Solution Development, Freeman.
TownSq connects neighbors, board members, and management teams to easy, proven, collaborative tools designed to enhance the community living experience. TownSq needed to upgrade its data and analytics capabilities due to exponential client growth. It decided to build a data lake to enable greater insights about business performance, client benchmarking, engagement levels, and success rates on new products and tools. TownSq also wanted to deploy algorithms to highlight unmet client needs, automate key processes, and provide recommendations to mitigate any emerging or detected risks. In four days, the TownSq team achieved its goal of building a functioning data lake and an extract, transform, load (ETL) pipeline capable of processing data from multiple sources, including Amazon DynamoDB and internal MongoDB and ERP systems. Immediately following the lab, the team was able to use the solution to realign its product roadmap to focus on higher return-on-investment opportunities and dramatically increase engagement on newly launched features.
"Working directly with Amazon's architects is a major accelerator, especially in a business driven by speed to market. The AWS Data Lab prepped for us, were in the room to support our build, and we walked out days later with a functioning product. The new products we are launching are game-changing and the added knowledge we have will help us continue to lead the market." Luis Lafer-Sousa, President - US, TownSq.
hc1 offers a suite of cloud-based, high-value care solutions that enable healthcare organizations to transform business and clinical data into the intelligence necessary to deliver on the promise of personalized care, all while eliminating waste. As an aggregator of billions of healthcare records from a number of large diagnostic testing providers, hc1 identified the need to migrate from its existing data warehouse to a scalable data lake on AWS to support its advanced analytics initiatives with AWS artificial intelligence (AI) and machine learning (ML) services. The AWS Data Lab helped hc1 migrate its patient diagnostic testing data warehouse to a data lake architecture by partnering to rebuild its core SQL-based ingestion, cleanup, and patient-matching extract, transform, load (ETL) scripts as AWS Glue ETL jobs. The team also used AWS Glue FindMatches to deduplicate patient test panel records across testing providers. The hc1 team left the AWS Data Lab with a well-architected data lake framework for its application’s core data repository. The team also learned best practices for matching patient information across datasets using AWS AI services, which will help ensure the completeness and accuracy of patient medical records by deduplicating data from different points of care.
“Reliable patient record matching is pivotal in improving patient outcomes and reducing clinical waste. AWS AI services allows us to flexibly update our matching system. We are able to incorporate new sources in less than half the time.” Charles Clarke, SVP of Technology, hc1.
Automox is an information technology company providing a cloud-native, zero-maintenance solution that modernizes endpoint management for optimized security and business outcomes. Automox is unique in that it combines individual endpoint management modules into an extensible automation framework that can query endpoints, collect insights, and take action automatically, at scale. Automox collaborated with the AWS Data Lab to build a platform that provides enterprise customers with analytics and insights into endpoint management, patching, and vulnerabilities. Automox used the AWS Data Lab to prototype an end-to-end data pipeline with the goal of enabling an analytics API that can be used without knowledge of the underlying data stores' structure. The prototype included an ingestion service to load endpoint and patch data from its unified data layer, a data lake for multipurpose storage, and a batch processing layer for aggregations and dynamic querying. This reporting and analytics platform will support both internal users and external customers. The team left the AWS Data Lab with a validated prototype for a data processing pipeline that will support Automox's analytics and query requirements, offering scalability and flexibility as its data footprint continues to grow.
“To address our customers' problems, we need to build fast and make the right technology decisions. AWS Data Lab was the right accelerator for us and gives us a wonderful advantage, being able to validate our assumptions and answer our questions with the right expertise.” Pascal Borghino, Head of Engineering, Automox.
Verisk is a leading global provider of data-driven analytic insights and solutions serving the insurance and energy industries. To scale solutions quickly and achieve greater resilience against points of failure, Verisk chose to migrate its legacy database footprint to AWS. Verisk collaborated with the AWS Data Lab to receive expert guidance on navigating the design, architectural, and implementation challenges that come with mass migrations involving complex data types such as large objects and geospatial data, large volumes of data, and complex procedures and schemas developed over more than 20 years. The AWS Data Lab worked with Verisk to architect and prove out a migration path from Verisk's legacy systems to Amazon Aurora PostgreSQL using AWS Database Migration Service and AWS Schema Conversion Tool. In addition to the technical work achieved in the AWS Data Lab, Verisk came away with a more focused migration strategy, a deeper understanding of how to execute migrations to AWS databases, and best practices for database administration and for operating PostgreSQL databases in production.
"As a Database Administrator at Verisk working on the data migration, I am miles ahead of where I was prior to working with the AWS Data Lab. I have more confidence in being able to successfully migrate our legacy database to Aurora PostgreSQL and have a better understanding of what products are available to us. I couldn't have asked for a better experience."
Since 1882, Dow Jones has been finding new ways to bring information to the world’s top business entities. Dow Jones had several Informix databases to migrate to Amazon Aurora PostgreSQL and engaged the AWS Data Lab to help it test different data migration options and establish a well-architected data migration approach to apply to its 100+ databases. In just a week, Dow Jones emerged with a finalized approach for scripting and automating data migration and code deployment, including how to convert stored procedures, triggers, and tables, setting the stage for future Informix migrations.
3M is an American multinational conglomerate operating in the fields of industry, worker safety, health care, and consumer goods. 3M R&D needed to enhance its machine learning, analytics, and reporting capabilities for more than 10,000 spreadsheets across six different business operations with more than 50 different schemas. With guidance from the AWS Data Lab, 3M developed a minimum viable product (MVP) consisting of multiple extract, transform, load (ETL) data pipelines that flow into a data lake in Amazon S3, where the data can be interpreted, analyzed, and visualized using Amazon SageMaker notebooks and Amazon QuickSight for enhanced insights. This solution will allow 3M to work with customers more interactively, enabling immediate responses and higher customer satisfaction across the entire sales and solutioning process.
“I never knew it was possible to organize so much data in a way that would allow me to effectively access and analyze millions of rows of data, where before I was constantly looking for spreadsheets or just asking for another test to be run.” Lead Materials Application Engineer, 3M.
Drishya AI Labs is an innovative industrial AI solutions and deep tech company that uses machine learning and artificial intelligence to help customers optimize their energy operations. Drishya participated in both a Design Lab and a Build Lab to architect the foundation of a multi-tenant data lake, including ETL pipelines and pipelines for building and deploying its machine learning models on AWS. This solution provides the capability to ingest a variety of data, such as high-frequency Industrial IoT (IIoT) time-series sensor data and work journal data, from any energy customer and to derive meaningful recommendations from that data quickly and sustainably. Drishya successfully launched its data platform with batch use cases only three months after the lab and has since seen rapid progress in efficiency, capacity, and revenue.
“AWS has helped us rapidly build a world-class, scalable, and secure high frequency time series platform, which is a core asset enabling us to provide quality business solutions and deliver customer value.” Saumil Sheth, Chief Operating Officer, Drishya AI Labs Inc.
Civitas Learning is a mission-driven education technology company that relies on the power of machine learning analytics to help higher-education institutions improve student success outcomes. The company collaborated with the AWS Data Lab to design and build an automation notebook architecture and a dashboard to track the efficacy of various student success initiatives using machine learning models. The team left the lab with a flexible, repeatable workflow it can use to automate a variety of data science projects in the future. Through the AWS Data Lab, Civitas’s data science team deepened its knowledge of AWS services and gained the skills needed to quickly adapt this framework at the onset of COVID-19 to meet the rapidly changing needs of its community college and university customers.
“AWS assembled a super team to help us architect and integrate key building blocks in machine learning causal inference so that we could construct a real-world evidence knowledge base. They also made sure that we stayed on course after our Data Lab engagement, which is helping us scale our ML practice with much faster deployment speed. It’s been a great, rewarding experience for us all, and our customers are happier as a result.” David Kil, Chief Data Scientist, Civitas Learning.
PHD Media is a global communications planning and media buying agency network. PHD Media needed to build a lean, high-performance, and scalable extract, transform, load (ETL) and data storage infrastructure that could support future machine learning workloads. The AWS Data Lab helped PHD Media move its ETL jobs to AWS Glue and rebuild its pipeline into a three-part process: data ingestion, data staging, and data summarization. PHD Media left the AWS Data Lab with a new architecture for its data pipeline that reduces ETL processing time from 21 hours to 75 minutes and is capable of integrating with Amazon SageMaker and BI tools.
“We would not have been able to dedicate the same amount of time to the development, nor been able to resolve our questions and problems as quickly without the AWS Data Lab. Doing the same work outside of the AWS Data Lab would have cost us significantly more in funds and time.” Amar Vyas, Global Data Strategy Director, PHD Global Business.