AWS Big Data Blog

Best practices for migrating from Apache Airflow 2.x to Apache Airflow 3.x on Amazon MWAA

Apache Airflow 3.x on Amazon MWAA introduces architectural improvements such as API-based task execution that provides enhanced security and isolation. This migration presents an opportunity to embrace next-generation workflow orchestration capabilities while providing business continuity. This post provides best practices and a streamlined approach to successfully navigate this critical migration, providing minimal disruption to your mission-critical data pipelines while maximizing the enhanced capabilities of Airflow 3.

Bridging data silos: cross-bounded context querying with Vanguard’s Operational Read-only Data Store (ORDS) using Amazon Redshift

At Vanguard, we faced significant challenges with our legacy mainframe system that limited our ability to deliver modern, personalized customer experiences. Our centralized database architecture created performance bottlenecks and made it difficult to scale services independently for our millions of personal and institutional investors. In this post, we show you how we modernized our data architecture using Amazon Redshift as our Operational Read-only Data Store (ORDS).

Seamlessly Integrate Data on Google BigQuery and ClickHouse Cloud with AWS Glue

Migrating from Google Cloud’s BigQuery to ClickHouse Cloud on AWS allows businesses to leverage the speed and efficiency of ClickHouse for real-time analytics while benefiting from AWS’s scalable and secure environment. This article provides a comprehensive guide to executing a direct data migration using AWS Glue ETL, highlighting the advantages and best practices for a […]

Optimize efficiency with language analyzers using scalable multilingual search in Amazon OpenSearch Service

Organizations manage content across multiple languages as they expand globally. Ecommerce platforms, customer support systems, and knowledge bases require efficient multilingual search capabilities to serve diverse user bases effectively. This unified search approach helps multinational organizations maintain centralized content repositories while making sure users, regardless of their preferred language, can effectively find and access relevant […]

How Laravel Nightwatch handles billions of observability events in real time with Amazon MSK and ClickHouse Cloud

Laravel, one of the world’s most popular web frameworks, launched its first-party observability platform, Laravel Nightwatch, to provide developers with real-time insights into application performance. Built entirely on AWS managed services and ClickHouse Cloud, the service already processes over one billion events per day while maintaining sub-second query latency, giving developers instant visibility into the health of their applications.

Introducing Apache Airflow 3 on Amazon MWAA: New features and capabilities

AWS announced the general availability of Apache Airflow 3 on Amazon Managed Workflows for Apache Airflow (Amazon MWAA). This release transforms how organizations use Apache Airflow to orchestrate data pipelines and business processes in the cloud, bringing enhanced security, improved performance, and modern workflow orchestration capabilities to Amazon MWAA customers. This post explores the features of Airflow 3 on Amazon MWAA and outlines enhancements that improve your workflow orchestration capabilities

Search++, Going Beyond Keywords with Amazon OpenSearch Service

Search technology, specifically web search technology, has been around for more than 30 years. You entered a few words in a text box, clicked “Search,” and received a series of links. However, the results were often a mix of related, non-related, and general links. If the results didn’t contain the information you needed, you reformulated […]

Scaling cluster manager and admin APIs in Amazon OpenSearch Service

In this post, we demonstrate the different bottlenecks that were identified and the corresponding solutions that were implemented in OpenSearch Service to scale cluster manager for large cluster deployments. These optimizations are available to all new domains or existing domains upgraded to OpenSearch Service versions 2.17 or above.

Optimize Amazon EMR runtime for Apache Spark with EMR S3A

With the Amazon EMR 7.10 runtime, Amazon EMR has introduced EMR S3A, an improved implementation of the open source S3A file system connector. In this post, we showcase the enhanced read and write performance advantages of using Amazon EMR 7.10.0 runtime for Apache Spark with EMR S3A as compared to EMRFS and the open source S3A file system connector.