Data Lakes and Analytics
The Nasdaq Composite is a stock market index of the common stocks and similar securities listed on the Nasdaq stock market. Tasked with redesigning the architecture of its data warehouse to handle rapidly changing service level demands from customers, Nasdaq teamed up with the AWS Data Lab to explore and test various options for improving scalability and ultimately re-architecting its data warehouse. AWS Data Lab helped the Nasdaq team decide to separate storage from compute by using Amazon Redshift as a compute engine on top of their data lake. Deployment of this new architecture to production created "infinite" capacity for additional data without manual intervention, increased scalability and parallelism, and resulted in a 75% reduction in Reserved Instance costs.
“I wish that we hadn’t waited so late in the project to take advantage of [the AWS Data Lab]. We came out of that week at AWS Data Lab with answers and a clear path to how we were going to solve the problems that we were facing.” Robert Hunt, VP Software Engineering, Nasdaq.
Cenovus Energy is a Canada-based integrated energy company. Cenovus built a working prototype of a data lake that ingests a variety of datasets from multiple vendors through a streamlined process and an event-driven data pipeline to transform ingested datasets into a common data model. The Cenovus team left their Build Lab with a custom-fit, cloud-based solution that will allow production engineers and other analysts to derive insights into oil well performance faster without having to worry about storage capacity. Cenovus can ingest, process, and analyze billions of rows of Distributed Temperature Sensing (DTS) data, enabling the business to immediately access and generate actionable insights into oil well performance and optimize costs.
“The implementation of our Distributed Temperature Sensing (DTS) data pipeline was accelerated by months using the AWS Data Lab. This shortened our learning curve and enabled access to the data for various high priority use cases.” Don Munroe, Chief Data Officer, Cenovus Energy Inc.
Carbyne is a global leader in mission-critical contact center technologies. Carbyne began working with the AWS Data Lab in a Design Lab to explore options for building a low-latency, multi-tenant, analytical system that would enable them to generate meaningful insights for call center owners who manage 911 calls, such as call duration ranges and peak time-of-day for callers. These data points help Carbyne customers measure the effectiveness of their emergency response systems and then provision staff and resources accordingly. Once the team was ready to start building, they returned for a Build Lab to develop a prototype solution focused on extracting data from their Amazon Aurora data stores, building an ETL process to prepare their data for analytical consumption, and then developing a dashboard with Amazon QuickSight that visualizes metrics around customer call statistics. Carbyne also implemented anonymous embedding of their QuickSight dashboards into their application to deliver those visualizations to customers in a familiar web UI. This prototype lays the foundation for Carbyne to apply this architecture to their broader data pipeline environment and accelerate their launch to production post-lab.
“This experience with the AWS Data Lab is what it means to be in true partnership. Data Lab's support and efforts are much appreciated as we push innovative solutions to the Public Safety Industry. I can say confidently that this Build Lab and Data Lab's support will reduce our time to production by weeks, if not months.” Alex Dizengof, Founder & CTO, Carbyne, Inc.
LiUNA – the Laborers’ International Union of North America – is an American and Canadian labor union that manages health and pension funds for over 300 unions. After seeing a surge in demand for fund management services, LiUNA worked with the AWS Data Lab to build an event-driven, serverless data pipeline that modernizes their data ingestion from a manual and time-intensive process to one that automatically scales to accommodate growth. The solution gives LiUNA the flexibility to onboard new health and pension funds quickly and establishes a foundation for future analytics use cases.
"We needed a way to scale an existing process quickly but didn't want to necessarily invest a lot of financial resources to make it happen. Working with the AWS Data Lab allowed our team to get the experience and training they needed, at no cost to LiUNA, to develop a more efficient process that will scale as we grow." Matthew Richard, Chief Information Officer, LiUNA.
Fluent Commerce, headquartered in Australia, is a leading global provider of a SaaS distributed order management system, Fluent Order Management. Fluent supports global brands like L’Oreal and Ted Baker London with inventory availability management and fulfillment optimization. These clients need to handle complex inventory availability at scale. This includes huge peaks at holiday seasons, and the need to show customers accurate inventory availability across several offerings so they can choose the most convenient. To meet this need, Fluent relies on AWS to be able to cost-effectively deliver at scale. Fluent has also partnered with the AWS Data Lab to redesign their data pipelines for better scalability and performance.
“The number one challenge that our retailers face is inventory availability management. With the Data Lab, we were able to get right into testing a hypothesis that we had around how we could make a step change in the processing efficiency of inventory. We were able to do that in one week. Without the combination of experience, subject matter expertise, and technical know-how that the Data Lab provided, we would have really struggled to get anywhere near where we’re at now in that sort of timeframe.” Jamie Cairns, Chief Strategy Officer, Fluent Commerce.
VTEX is an enterprise digital commerce platform where global brands and retailers run their world of commerce. AWS Data Lab helped VTEX modernize its data platform by designing and building a data lake house architecture that supports advanced analytics and focuses on continuous data ingestion and processing. The Build Lab helped VTEX accelerate its journey to production by two months, giving VTEX directors enhanced visibility into key commercial performance indicators like gross merchandise volume (GMV) in an autonomous, continuous, and reliable way.
“AWS Data Lab was crucial to our project, empowering our team to not only build a prototype in a matter of days but to bring a solution to production that we’re confident is built following industry and analytics best practices.” Igor Tavares, Principal Engineer, VTEX.
Wood Mackenzie, a Verisk business, is a global research and consultancy business powering the energy industry. Wood Mackenzie engaged an AWS Data Lab Resident Architect to review its data and analytics strategy, develop data architecture principles, and enhance its understanding of how to build and scale data architecture solutions for its data platform. As a result, Wood Mackenzie has been able to create a range of scalable, resilient, and cost-efficient implementation patterns, as well as cultivate a culture of modern data architecture literacy. These improvements have resulted in more efficient data pipelines and workloads, leading to high-quality and robust data sets for customers and analysts to support Wood Mackenzie in its journey to transform how they power the planet.
“Our Resident Architect has helped with both tactical items and served as a sounding board as we develop our internal data architecture strategy. Their presence in meetings has been met with comments ranging from ‘I didn’t know AWS had architects that could review our workloads’ to ‘this is really cool.’ Having a Resident Architect has helped us make the most of all of the AWS resources we have and has been a positive feedback cycle into how we build going forward.” Liz Dennett, Ph.D., VP of Data Architecture, Wood Mackenzie.
Hitachi Construction Machinery Co., Ltd in Europe (HCME)
Hitachi Construction Machinery Co. is a Japanese construction equipment company that manufactures a wide range of hydraulic excavators. With the help of the AWS Data Lab, Hitachi Construction Machinery Co., Ltd in Europe (HCME) built a central data hub for their business datasets to streamline data integration and analysis capabilities. The HCME team left their Build Lab with a functioning prototype for centralizing data and visualizing operational insights as well as a roadmap for taking the solution to production, which allowed them to launch to production in only a few weeks post-lab.
“Collaborating with the AWS Data Lab has complemented our digitization strategy, which aligns with our long-term vision. We see huge potential to the data we have and with the help of AWS, we think we can build value creating solutions for our customers and dealers network.” Ryo Kurihara, Manager, Solution Linkage Department, Hitachi Construction Machinery Europe.
KRS.io is a provider of cutting-edge solutions to the convenience and petroleum industry. KRS embarked on a two-year plan to fully migrate their data center to AWS and collaborated with the AWS Data Lab to create a secure, fully encrypted data lake solution. The solution ingests and stores data from various source systems in their data center and powers custom visuals and downstream business reports using Amazon QuickSight. Building upon the best practices the KRS team learned in the lab, the data platform prototype they developed, and the ease of embedding QuickSight dashboards into their application and implementing Natural Language Query (NLQ) using QuickSight Q, KRS successfully launched their new analytics product – Epiphany Data Neurocenter – to production post-lab.
“In business, speed matters. Working with AWS Data Lab accelerated our timeframe from proof-of-concept to deployment. I had zero-tolerance for risk and the Data Lab allowed my team to meet my high bar for security and reliability.” Brian McManus, CTO, KRS.io.
HST Pathways is an innovative software technology company enabling healthcare providers to create better patient outcomes through a suite of digital solutions for practice management, electronic charting, and case coordination. HST Pathways wanted a solution that could simplify data aggregation and storage while giving them the flexibility and performance needed to run analytics on large, multi-tenant data sets. The Build Lab created an environment where HST Pathways could design and build a prototype of a near real-time data warehouse, a centralized data lake, and a scalable data streaming service in only four days.
“Working with the AWS Data Lab was intensive and productive. AWS Data Lab Architects helped us work backward from our business needs and try different potential options for our data warehouse project. After the four day Build Lab, we were confident that AWS DMS and Amazon Redshift are the best technologies for our needs, and we were able to quickly launch to production within weeks.” Xiaodong Chen, Manager of Software Engineering, HST Pathways.
Availity is one of the nation’s largest health information networks and facilitates billions of clinical, administrative, and financial transactions annually. During the four-day Build Lab with AWS Data Lab, Availity built an end-to-end data pipeline to process incoming data from its network of healthcare partners. The team also decoupled their index and search data in near real time to create APIs that help customers find relevant patient information quickly, while respecting appropriate data governance requirements. Only three months post-lab, Availity was able to move this solution to production and now has an implementation pattern it can replicate for future data needs.
“We left with a functioning prototype that was essential to our business, but the real value of the Data Lab was a team-building effort.” Michael Privat, VP of Digital and Cloud Migration, Availity.
Allen Institute focuses on accelerating foundational research, developing standards and models, and cultivating new ideas to make a broad, transformational impact on science. One of its research institutes, the Allen Institute for Brain Science, collaborated with the AWS Data Lab to rapidly accelerate its journey into data platform modernization. As part of its mission to share massive amounts of data with the public to accelerate advancement in neuroscience, Allen Institute needed to build a solution that could provide researchers around the world with the ability to work with extremely wide datasets (more than 50,000 columns) at scale and with very low latency. In only four days, the Allen Institute team built a working prototype of an end-to-end feature matrix ingestion pipeline using transient Amazon Elastic MapReduce (EMR) clusters and Amazon DynamoDB that dynamically ingests and transforms its wide datasets into consumable, interactive datasets for researchers. The team left the AWS Data Lab with an accelerated plan to bring this solution to production, furthering its commitment to support researchers in the quest for improved health outcomes.
KnowBe4, Inc. provides Security Awareness Training to help companies manage the IT security problems of social engineering, spear phishing, and ransomware attacks. Its training platform revolves around the Risk Score pipeline, which generates an individualized risk score for tens of millions of users daily. KnowBe4 worked with the AWS Data Lab to build a working prototype of a new Risk Score pipeline that reduced total runtime from 7.5 or more hours to 3.5 hours and horizontally scaled every aspect of data retrieval, processing, and training. After the AWS Data Lab, the team used the skills it learned to continue to optimize its pipeline. Five months post-lab, KnowBe4 launched to production with a final runtime of 1.1 hours. In addition to this more than sixfold reduction in total runtime, KnowBe4's new architecture yielded a fourfold savings in cost.
“What we did in four days would have taken us weeks, maybe months, to achieve some of this refactor of the technical debt we had with our AI pipeline. And at the same time prepare our data handling to scale to 10x what we have today.” Marcio Castilho, Chief Architect Officer, KnowBe4.
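The KnowBe4 runtime figures above can be checked with a little arithmetic. The sketch below is illustrative only: the constant and function names are hypothetical, and 7.5 hours is treated as a lower bound because the summary says "7.5 or more hours".

```python
# Runtimes quoted in the KnowBe4 summary, in hours.
# Names are hypothetical; only the numbers come from the text.
BASELINE_HOURS = 7.5    # "7.5 or more hours" before the lab (lower bound)
PROTOTYPE_HOURS = 3.5   # prototype built during the lab
PRODUCTION_HOURS = 1.1  # production launch five months post-lab


def speedup(before: float, after: float) -> float:
    """Return how many times shorter `after` is than `before`."""
    if before <= 0 or after <= 0:
        raise ValueError("runtimes must be positive")
    return before / after


prototype_speedup = speedup(BASELINE_HOURS, PROTOTYPE_HOURS)
production_speedup = speedup(BASELINE_HOURS, PRODUCTION_HOURS)

print(f"prototype:  {prototype_speedup:.1f}x shorter runtime")
print(f"production: {production_speedup:.1f}x shorter runtime")
```

Because the baseline is a lower bound, the production ratio computed here is a floor on the actual improvement.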
Sportradar is a global provider of sports data intelligence, serving leagues, news media, consumer platforms, and sports betting operators with deep insights and a suite of strategic solutions to help grow their businesses. It engaged the AWS Data Lab for guidance on developing a modernized, low-latency data analytics pipeline and workflow to power real-time statistical models, feature extraction, and inference using machine learning models and real-time dashboards. The Sportradar team left the AWS Data Lab with a clear path forward for real-time sportsbook risk management and real-time fraud detection, as well as a scalable process for deploying and managing additional data pipelines on a global level. It used the AWS Data Lab to help expand the capabilities of its existing cloud-native big data and analytics platform for real-time analytics workloads.
“Using the elasticity and value-added services from AWS, we have managed to analyze a high volume of transactions to produce deep real-time analytics. This gives our traders a crucial edge.” Ben Burdsall, CTO, Sportradar.
Jungle Scout is an all-in-one platform for finding, launching, and selling Amazon products. With the support of the AWS Data Lab, Jungle Scout built the foundation of a data lake in only four days, including a repeatable pattern for building data pipelines that hydrate the data lake from a variety of data systems. By using Amazon S3 as the core of the data lake, Jungle Scout is able to reduce its storage footprint across other databases and remove data silos, ultimately helping the team reduce cost and increase productivity. The solution also makes it simpler to manage multiple versions of product metadata changes, giving Jungle Scout’s data scientists and engineers the flexibility to view data changes several times per day and troubleshoot data faster.
“By leveraging the AWS Data Lab, we were able to launch our analytics solution to production only three months after joining the lab and with only two engineers working full-time on the project. This has resulted in a major shift in how engineers at Jungle Scout build data processing pipelines.” Alex Handley, Principal Architect, Jungle Scout.
Freeman is a leader in brand experience. The Freeman team was tasked with creating a streamlined approach for handling, validating, and joining data that would power visualizations in its custom dashboard service. Freeman partnered with the AWS Data Lab to accelerate the architectural design and prototype build of this solution. In only four days, the Freeman team built a data pipeline prototype for both streaming and batch datasets leveraging Amazon Kinesis and AWS Glue workflows to ingest, curate, and prepare the data. The team used Amazon Athena, Amazon Kinesis Data Analytics, and Amazon Elasticsearch Service to query the various curated datasets, and Amazon QuickSight and Kibana to visualize the results in easy-to-consume dashboards. The Freeman team left the AWS Data Lab with a clear path forward for enabling end users to gain valuable insights into its data.
“We were able to leverage our existing knowledge and infrastructure within AWS by expanding into new services and features that we hadn't explored before. With the help of the AWS solutions architects that worked side-by-side with us, we were able to greatly accelerate the delivery of our system and set up a foundation that we can build on down the road.” Casey McMullen, Director of Digital Solution Development, Freeman.
TownSq connects neighbors, board members, and management teams to easy, proven, collaborative tools designed to enhance the community living experience. TownSq needed to upgrade its data and analytics capabilities due to exponential client growth. It decided to build a data lake to enable greater insights about business performance, client benchmarking, engagement levels, and success rates on new products and tools. TownSq also wanted to deploy algorithms to highlight unmet client needs, automate key processes, and provide recommendations to mitigate any emerging or detected risks. In four days, the TownSq team achieved its goal of building a functioning data lake and an extract, transform, load (ETL) pipeline capable of processing data from multiple sources, including Amazon DynamoDB and internal MongoDB and ERP systems. Immediately following the lab, the team was able to use the solution to realign its product roadmap to focus on higher return-on-investment opportunities and dramatically increase engagement on newly-launched features.
"Working directly with Amazon's architects is a major accelerator, especially in a business driven by speed to market. The AWS Data Lab prepped for us, were in the room to support our build, and we walked out days later with a functioning product. The new products we are launching are game-changing and the added knowledge we have will help us continue to lead the market." Luis Lafer-Sousa, President - US, TownSq.
hc1 offers a suite of cloud-based, high-value care solutions that enable healthcare organizations to transform business and clinical data into the intelligence necessary to deliver on the promise of personalized care, all while eliminating waste. As an aggregator of billions of healthcare records from a number of large diagnostic testing providers, hc1 identified the need to migrate from its existing data warehouse to a scalable data lake on AWS to support its advanced analytics initiatives with AWS Artificial Intelligence (AI) and Machine Learning (ML) services. AWS Data Lab helped hc1 migrate its patient diagnostic testing data warehouse to a data lake architecture by partnering to rebuild its core SQL-based ingestion, cleanup, and patient-matching extract, transform, load (ETL) scripts as AWS Glue ETL jobs. The team also leveraged AWS Glue FindMatches to deduplicate patient test panel records across testing providers. hc1's team left the AWS Data Lab with a well-architected data lake framework for its application’s core data repository. The hc1 team also learned best practices for matching patient information across datasets using AWS AI services, which will ensure patient medical record completeness and accuracy by deduplicating data from different points of care.
“Reliable patient record matching is pivotal in improving patient outcomes and reducing clinical waste. AWS AI services allows us to flexibly update our matching system. We are able to incorporate new sources in less than half the time.” Charles Clarke, SVP of Technology, hc1.
Automox is an information technology company providing a cloud-native, zero-maintenance solution that modernizes endpoint management for optimized security and business outcomes. Automox is unique in that it combines individual endpoint management modules into an extensible automation framework that can query endpoints, collect insights, and take action automatically, at scale. Automox collaborated with the AWS Data Lab to build a platform for providing enterprise customers with analytics and insights into endpoint management, patching, and vulnerabilities. Automox leveraged the Data Lab to prototype an end-to-end data pipeline with the goal of enabling an analytics API that can be used without knowledge of the structure in the underlying data stores. This included an ingestion service to load endpoint and patch data from their unified data layer, a data lake for multipurpose storage, and a batch processing layer for aggregations and dynamic querying. This reporting and analytics platform will support both internal users and external customers. The team left the AWS Data Lab with a validated prototype for a data processing pipeline that will support Automox's analytics and query requirements, offering scalability and flexibility as its data footprint continues to grow.
“To address our customers' problems, we need to build fast and make the right technology decisions. AWS Data Lab was the right accelerator for us and gives us a wonderful advantage, being able to validate our assumptions and answer our questions with the right expertise.” Pascal Borghino, Head of Engineering, Automox.
PricewaterhouseCoopers (PwC) is a multinational professional services network of firms, operating as partnerships under the PwC brand. As more organizations use technology and data to modernize and optimize their businesses, there's a need to build solutions using advanced analytics and machine learning (ML) that automate processes, create efficiencies, and ultimately deliver better customer experiences. Recognizing that building an integrated, automated ML system from scratch and operating it in production could be challenging, PwC wanted to create a solution that would simplify this. PwC collaborated with the AWS Data Lab to develop assets that automate the build, deployment, and maintenance of ML models for their customers. As part of the Build Lab, PwC built a model build pipeline, a model deployment pipeline, and a model monitoring and prediction serving pipeline using Amazon SageMaker that customers can use as a template for their ML use cases without major code changes. Their solution improves prediction quality, reduces time to value for ML research, and allows data scientists to rapidly react to changes in the market.
"If organizations fail to embrace artificial intelligence and machine learning technologies to provide their goods and services, they risk going out of business entirely." Mo Bashir, Managing Director, PwC Australia.
3M is an American enterprise company operating in the fields of industry, worker safety, health care, and consumer goods. 3M R&D needed to enhance its machine learning, analytics, and reporting capabilities for more than 10,000 spreadsheets across six different business operations with more than fifty different schemas. With guidance from the AWS Data Lab, 3M developed a minimum viable product (MVP) in which multiple extract, transform, load (ETL) data pipelines flow into a data lake in Amazon S3, and Amazon SageMaker Notebooks and Amazon QuickSight are then used to interpret, analyze, and visualize the data for enhanced insights. This solution will allow 3M to work with customers more interactively, enabling immediate response time and higher customer satisfaction with the entire sales and solutioning process.
“I never knew it was possible to organize so much data in a way that would allow me to effectively access and analyze millions of rows of data, where before I was constantly looking for spreadsheets or just asking for another test to be run.” Lead Materials Application Engineer, 3M.
Drishya AI Labs is an innovative Industrial AI solutions and deep tech company that uses machine learning and artificial intelligence to help customers optimize their energy operations. Drishya participated in both a Design Lab and a Build Lab to architect the foundation of a multi-tenant data lake, including ETL pipelines and pipelines for building and deploying their machine learning models on AWS. This solution provides the capability to ingest a variety of data points, such as high-frequency Industrial IoT (IIoT) time series sensor data and work journal data, from any energy customer and derive meaningful recommendations from the data quickly and sustainably. Drishya successfully launched their data platform with batch use cases only three months post-lab and has since seen a rapid progression in terms of efficiency, capacity, and revenue.
“AWS has helped us rapidly build a world-class, scalable, and secure high frequency time series platform, which is a core asset enabling us to provide quality business solutions and deliver customer value.” Saumil Sheth, Chief Operating Officer, Drishya AI Labs Inc.
Civitas Learning is a mission-driven education technology company that relies on the power of machine learning analytics to help higher-education institutions improve student success outcomes. The company collaborated with the AWS Data Lab to design and build an automation notebook architecture and dashboard to track the efficacy of various student success initiatives using machine learning models. The team left the lab with a flexible, repeatable workflow it can use for automating a variety of data science projects in the future. By attending the AWS Data Lab, Civitas’s data science team deepened their knowledge of AWS services and left equipped with the skills needed to quickly adapt this framework at the onset of COVID-19 to meet the rapidly changing needs of their community college and university customers.
“AWS assembled a super team to help us architect and integrate key building blocks in machine learning causal inference so that we could construct a real-world evidence knowledge base. They also made sure that we stayed on course after our Data Lab engagement, which is helping us scale our ML practice with much faster deployment speed. It’s been a great, rewarding experience for us all, and our customers are happier as a result.” David Kil, Chief Data Scientist, Civitas Learning.
PHD Media is a global communications planning and media buying agency network. PHD Media needed to build a lean, high-performance, and scalable extract, transform, load (ETL) and data storage infrastructure that could support future machine learning workloads. The AWS Data Lab helped PHD Media move its ETL jobs to AWS Glue and rebuild its pipeline into a three-part process: data ingestion, data staging, and data summarization. PHD Media left the AWS Data Lab with a new architecture for its data pipeline that reduces ETL processing time from 21 hours to 75 minutes and is capable of integrating with Amazon SageMaker and BI tools.
“We would not have been able to dedicate the same amount of time to the development, nor been able to resolve our questions and problems as quickly without the AWS Data Lab. Doing the same work outside of the AWS Data Lab would have cost us significantly more in funds and time.” Amar Vyas, Global Data Strategy Director, PHD Global Business.
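To put PHD Media's ETL improvement on a single scale, the two durations quoted above (21 hours and 75 minutes) can be converted to minutes and compared. This is a minimal sketch with hypothetical names; only the two durations come from the text.

```python
MINUTES_PER_HOUR = 60

# Durations quoted in the PHD Media summary.
before_minutes = 21 * MINUTES_PER_HOUR  # 21 hours expressed in minutes
after_minutes = 75                      # 75 minutes after the redesign


def percent_reduction(before: float, after: float) -> float:
    """Return the reduction from `before` to `after` as a percentage."""
    if before <= 0:
        raise ValueError("`before` must be positive")
    return (before - after) / before * 100


speedup_factor = before_minutes / after_minutes
reduction_pct = percent_reduction(before_minutes, after_minutes)

print(f"{before_minutes} min -> {after_minutes} min")
print(f"about {speedup_factor:.1f}x shorter, a {reduction_pct:.1f}% reduction")
```

Normalizing both figures to the same unit first avoids comparing hours with minutes directly.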
Persefoni is a SaaS Climate Management and Accounting Platform that enables organizations and financial institutions to measure and manage their carbon footprint across their operations and portfolios in a centralized, cloud-based application. To improve Persefoni’s SaaS Platform, their engineering team reached out for guidance and design validation from AWS specialized resources to evaluate and improve the current microservices, API management, and Amazon Aurora MySQL multi-region architecture for the Persefoni Climate Management and Accounting Platform (CMAP).
“AWS Data Lab was instrumental in helping Persefoni quickly and effectively collaborate across AWS engineering teams to validate and enhance our microservices based platform architecture, at a speed and scale required by a fast moving organization. AWS provided focused resources and subject matter experts to explore architecture options, develop working models, and incorporate the availability, reliability, and security needed in the Persefoni Platform. With the AWS Data Lab, we were able to define a fully distributed serverless container architecture using Amazon ECS and AWS Fargate that enables DevSecOps patterns, is scalable and secure by default, and delivers the results our customers expect.” S. Mark Underwood, Director, Cloud Architecture and DevSecOps, Persefoni.
Verisk is a leading global data-driven analytic insights and solutions provider serving the insurance and energy industries. To scale solutions quickly and achieve greater resilience against points of failure, Verisk chose to migrate its legacy database footprint to AWS. Verisk collaborated with AWS Data Lab to receive expert guidance on navigating the design, architectural, and implementation challenges that come with undertaking mass migrations involving complex data types like large objects and geospatial data, large volumes of data, and complex procedures and schemas developed over 20+ years. AWS Data Lab worked with Verisk to architect and prove out a migration path from Verisk's legacy systems to Amazon Aurora PostgreSQL using AWS Database Migration Service and AWS Schema Conversion Tool. In addition to the technical work achieved in the AWS Data Lab, Verisk came away with an increasingly focused migration strategy, a deepened understanding of how to execute migrations to AWS databases, and best practices for database administration and operating PostgreSQL databases in production.
"As a Database Administrator at Verisk working on the data migration, I am miles ahead of where I was prior to working with the AWS Data Lab. I have more confidence in being able to successfully migrate our legacy database to Aurora PostgreSQL and have a better understanding of what products are available to us. I couldn't have asked for a better experience."
Since 1882, Dow Jones has been finding new ways to bring information to the world’s top business entities. Dow Jones had several Informix databases to migrate to Amazon Aurora PostgreSQL and engaged the AWS Data Lab to help it test different data migration options and establish a well-architected data migration approach to apply to its 100+ databases. In just a week, Dow Jones emerged with a finalized approach for scripting and automating data migration and code deployment, including how to convert stored procedures, triggers, and tables, setting the stage for future Informix migrations.