AWS Big Data Blog
Category: Customer Solutions
How Open Universities Australia modernized their data platform and significantly reduced their ETL costs with AWS Cloud Development Kit and AWS Step Functions
At Open Universities Australia (OUA), we empower students to explore a vast array of degrees from renowned Australian universities, all delivered through online learning. In this post, we show you how we used AWS services to replace our existing third-party ETL tool, improving the team’s productivity and producing a significant reduction in our ETL operational costs.
How MuleSoft achieved cloud excellence through an event-driven Amazon Redshift lakehouse architecture
In our previous thought leadership blog post, Why a Cloud Operating Model, we defined a COE Framework and showed why MuleSoft implemented it and what benefits they gained from it. In this post, we’ll dive into the technical implementation, describing how MuleSoft used Amazon EventBridge, Amazon Redshift, Amazon Redshift Spectrum, Amazon S3, and AWS Glue to build it.
How EUROGATE established a data mesh architecture using Amazon DataZone
In this post, we show you how EUROGATE uses AWS services, including Amazon DataZone, to make data discoverable by data consumers across different business units so that they can innovate faster. Two use cases illustrate how this can be applied for business intelligence (BI) and data science applications, using AWS services such as Amazon Redshift and Amazon SageMaker.
Juicebox recruits Amazon OpenSearch Service’s vector database for improved talent search
Juicebox is an AI-powered talent sourcing search engine, using advanced natural language models to help recruiters identify the best candidates from a vast dataset of over 800 million profiles. At the core of this functionality is Amazon OpenSearch Service, which provides the backbone for Juicebox’s powerful search infrastructure, enabling a seamless combination of traditional full-text search methods with modern, cutting-edge semantic search capabilities. In this post, we share how Juicebox uses OpenSearch Service for improved search.
Fitch Group achieves multi-Region resiliency for mission-critical Kafka infrastructure with Amazon MSK Replicator
In this post, we explore how Fitch Group, one of the top credit rating companies, used Amazon MSK and Amazon MSK Replicator to achieve multi-Region resiliency for their mission-critical Kafka infrastructure.
Jumia builds a next-generation data platform with metadata-driven specification frameworks
Jumia is a technology company founded in 2012 and present in 14 African countries, with its main headquarters in Lagos, Nigeria. In this post, we share part of the journey that Jumia took with AWS Professional Services to modernize its data platform, moving from a Hadoop distribution to AWS serverless-based solutions.
HEMA accelerates their data governance journey with Amazon DataZone
HEMA has been a household Dutch retail brand since 1926, providing daily convenience products with a unique design. This post describes how HEMA used Amazon DataZone to build their data mesh and enable streamlined data access across multiple business areas. It explains HEMA’s journey of deploying Amazon DataZone, the key challenges they overcame, and the benefits they have realized since deployment in May 2024. From establishing an enterprise-wide data inventory and improving data discoverability, to enabling decentralized data sharing and governance, Amazon DataZone has been a game changer for HEMA.
How DeNA Co., Ltd. accelerated anonymized data quality tests up to 100 times faster using Amazon Redshift Serverless and dbt
DeNA Co., Ltd. (DeNA) engages in a variety of businesses, from games and live communities to sports & the community and healthcare & medical, under our mission to delight people beyond their wildest dreams. This post introduces a case study in which DeNA combined Amazon Redshift Serverless and dbt (dbt Core) to accelerate data quality tests in their business.
Accelerate Amazon Redshift secure data use with Satori – Part 2
In this post, we continue from Accelerate Amazon Redshift secure data use with Satori – Part 1, and explain how Satori, an Amazon Redshift Ready partner, simplifies both the user experience of gaining access to data and the admin practice of granting and revoking access to data in Amazon Redshift. Satori enables both just-in-time and self-service access to data.
How REA Group approaches Amazon MSK cluster capacity planning
REA Group, a digital real estate business, uses Amazon Managed Streaming for Apache Kafka (Amazon MSK) and a data streaming platform called Hydro to efficiently share and access large amounts of data across multiple domains and services. This approach allows REA Group to maintain optimal performance and cost-efficiency while scaling to meet growing user demands. In this post, they share their approach to MSK cluster capacity planning.