AWS Big Data Blog

Category: Amazon Managed Workflows for Apache Airflow (Amazon MWAA)

What’s new with Amazon MWAA support for Apache Airflow version 2.4.3

Amazon Managed Workflows for Apache Airflow (Amazon MWAA) is a managed orchestration service for Apache Airflow that makes it simple to set up and operate end-to-end data pipelines in the cloud at scale. Amazon MWAA supports multiple versions of Apache Airflow (v1.10.12, v2.0.2, and v2.2.2). Earlier in 2023, we added support for Apache Airflow v2.4.3 […]

Improve observability across Amazon MWAA tasks

Amazon Managed Workflows for Apache Airflow (Amazon MWAA) is a managed orchestration service for Apache Airflow that makes it simple to set up and operate end-to-end data pipelines in the cloud at scale. A data pipeline is a set of tasks and processes used to automate the movement and transformation of data between different systems.­ […]

Automate data lineage on Amazon MWAA with OpenLineage

In modern data architectures, datasets are combined across an organization using a variety of purpose-built services to unlock insights. As a result, data governance becomes a key component for data consumers and producers to know that their data-driven decisions are based on trusted and accurate datasets. One aspect of data governance is data lineage, which […]

How ZS created a multi-tenant self-service data orchestration platform using Amazon MWAA

This is post is co-authored by Manish Mehra, Anirudh Vohra, Sidrah Sayyad, and Abhishek I S (from ZS), and Parnab Basak (from AWS). The team at ZS collaborated closely with AWS to build a modern, cloud-native data orchestration platform. ZS is a management consulting and technology firm focused on transforming global healthcare and beyond. We […]

How GE Proficy Manufacturing Data Cloud replatformed to improve TCO, data SLA, and performance

This is post is co-authored by Jyothin Madari, Madhusudhan Muppagowni and Ayush Srivastava from GE. GE Proficy Manufacturing Data Cloud (MDC), part of the GE Digital’s Manufacturing Execution Systems (MES) suite of solutions, allows GED’s customers to increase the derived value easily and quickly from the MES by reliably bringing enterprise-wide manufacturing data into the […]

Persist and analyze metadata in a transient Amazon MWAA environment

Customers can harness sophisticated orchestration capabilities through the open-source tool Apache Airflow. Airflow can be installed on Amazon EC2 instances or can be dockerized and deployed as a container on AWS container services. Alternatively, customers can also opt to leverage Amazon Managed Workflows for Apache Airflow (MWAA). Amazon MWAA is a fully managed service that […]

How ENGIE scales their data ingestion pipelines using Amazon MWAA

ENGIE—one of the largest utility providers in France and a global player in the zero-carbon energy transition—produces, transports, and deals electricity, gas, and energy services. With 160,000 employees worldwide, ENGIE is a decentralized organization and operates 25 business units with a high level of delegation and empowerment. ENGIE’s decentralized global customer base had accumulated lots […]

Bolster security with role-based access control in Amazon MWAA

Amazon Studios invests in content that drives global growth of Amazon Prime Video and IMDb TV. Amazon Studios has a number of internal-facing applications that aim to streamline end-to-end business processes and information workflows for the entire content creation lifecycle. The Amazon Studios Data Infrastructure (ASDI) is a centralized, curated, and secure data lake that […]

Manage and process your big data workflows with Amazon MWAA and Amazon EMR on Amazon EKS

Many customers are gathering large amount of data, generated from different sources such as IoT devices, clickstream events from websites, and more. To efficiently extract insights from the data, you have to perform various transformations and apply different business logic on your data. These processes require complex workflow management to schedule jobs and manage dependencies […]

Orchestrate AWS Glue DataBrew jobs using Amazon Managed Workflows for Apache Airflow

As the industry grows with more data volume, big data analytics is becoming a common requirement in data analytics and machine learning (ML) use cases. Analysts are building complex data transformation pipelines that include multiple steps for data preparation and cleansing. However, analysts may want a simpler orchestration mechanism with a graphical user interface that […]