AWS Big Data Blog
Category: Customer Solutions
How Flutter UKI optimizes data pipelines with AWS Managed Workflows for Apache Airflow
In this post, we share how Flutter UKI transitioned from a monolithic Amazon Elastic Compute Cloud (Amazon EC2)-based Airflow setup to a scalable and optimized Amazon Managed Workflows for Apache Airflow (Amazon MWAA) architecture using features like Kubernetes Pod Operator, continuous integration and delivery (CI/CD) integration, and performance optimization techniques.
How BMW Group built a serverless terabyte-scale data transformation architecture with dbt and Amazon Athena
At the BMW Group, our Cloud Efficiency Analytics (CLEA) team has developed a FinOps solution to optimize costs across over 10,000 cloud accounts This post explores our journey, from the initial challenges to our current architecture, and details the steps we took to achieve a highly efficient, serverless data transformation setup.
Implement Amazon EMR HBase Graceful Scaling
Apache HBase is a massively scalable, distributed big data store in the Apache Hadoop ecosystem. We can use Amazon EMR with HBase on top of Amazon Simple Storage Service (Amazon S3) for random, strictly consistent real-time access for tables with Apache Kylin. This post demonstrates how to gracefully decommission target region servers programmatically.
Amazon Prime Video advances search for sports using Amazon OpenSearch Service
In this post, we will walk you through how Prime Video used Amazon OpenSearch Service and its AI and machine learning (AI/ML) capabilities to build a more intuitive and enhanced sports search experience.
How Open Universities Australia modernized their data platform and significantly reduced their ETL costs with AWS Cloud Development Kit and AWS Step Functions
At Open Universities Australia (OUA), we empower students to explore a vast array of degrees from renowned Australian universities, all delivered through online learning. In this post, we show you how we used AWS services to replace our existing third-party ETL tool, improving the team’s productivity and producing a significant reduction in our ETL operational costs.
How MuleSoft achieved cloud excellence through an event-driven Amazon Redshift lakehouse architecture
In our previous thought leadership blog post Why a Cloud Operating Model we defined a COE Framework and showed why MuleSoft implemented it and the benefits they received from it. In this post, we’ll dive into the technical implementation describing how MuleSoft used Amazon EventBridge, Amazon Redshift, Amazon Redshift Spectrum, Amazon S3, & AWS Glue to implement it.
How EUROGATE established a data mesh architecture using Amazon DataZone
In this post, we show you how EUROGATE uses AWS services, including Amazon DataZone, to make data discoverable by data consumers across different business units so that they can innovate faster. Two use cases illustrate how this can be applied for business intelligence (BI) and data science applications, using AWS services such as Amazon Redshift and Amazon SageMaker.
Juicebox recruits Amazon OpenSearch Service’s vector database for improved talent search
Juicebox is an AI-powered talent sourcing search engine, using advanced natural language models to help recruiters identify the best candidates from a vast dataset of over 800 million profiles. At the core of this functionality is Amazon OpenSearch Service, which provides the backbone for Juicebox’s powerful search infrastructure, enabling a seamless combination of traditional full-text search methods with modern, cutting-edge semantic search capabilities. In this post, we share how Juicebox uses OpenSearch Service for improved search.
Fitch Group achieves multi-Region resiliency for mission-critical Kafka infrastructure with Amazon MSK Replicator
In this post, we explore how Fitch Group, one of the top credit rating companies, used Amazon MSK and Amazon MSK Replicator to achieve multi-Region resiliency for their mission-critical Kafka infrastructure.
Jumia builds a next-generation data platform with metadata-driven specification frameworks
Jumia is a technology company born in 2012, present in 14 African countries, with its main headquarters in Lagos, Nigeria. In this post, we share part of the journey that Jumia took with AWS Professional Services to modernize its data platform that ran under a Hadoop distribution to AWS serverless based solutions.