Advanced (300) | AWS Big Data Blog

Automate email notifications for governance teams working with Amazon SageMaker Catalog

In this post, we show you how to create custom notifications for events that occur in SageMaker Catalog using Amazon EventBridge, AWS Lambda, and Amazon Simple Notification Service (Amazon SNS). You can extend this solution to integrate SageMaker Catalog automatically with enterprise workflow tools such as ServiceNow and Helix.

Configure seamless single sign-on with SQL analytics in Amazon SageMaker Unified Studio

This post demonstrates how to configure SageMaker Unified Studio with SSO, set up projects, onboard users, and access data securely using integrated analytics tools.

Best practices for upgrading from Amazon Redshift DC2 to RA3 and Amazon Redshift Serverless

As analytical demands grow, many customers are upgrading from DC2 to RA3 or Amazon Redshift Serverless. Both options scale compute and storage independently and offer advanced capabilities such as data sharing, zero-ETL integration, and built-in artificial intelligence and machine learning (AI/ML) support through Amazon Redshift ML. This post is a practical guide to planning your target architecture and migration strategy. It covers upgrade options, key considerations, and best practices for a smooth transition.

Building a real-time ICU patient analytics pipeline with AWS Lambda event source mapping

In this post, we demonstrate how to build a serverless architecture that processes real-time ICU patient monitoring data. The architecture uses Lambda event source mapping for immediate alert generation and data aggregation, then persists the data in Amazon S3 with an Apache Iceberg catalog for comprehensive healthcare analytics.

Best practices for migrating from Apache Airflow 2.x to Apache Airflow 3.x on Amazon MWAA

Apache Airflow 3.x on Amazon MWAA introduces architectural improvements such as API-based task execution, which provides enhanced security and isolation. Migrating lets you adopt these next-generation orchestration capabilities while maintaining business continuity. This post provides best practices and a streamlined approach for this critical migration, minimizing disruption to your mission-critical data pipelines while taking full advantage of the new capabilities in Airflow 3.

Breaking down data silos: Volkswagen’s approach with Amazon DataZone

In this post, we introduce Amazon DataZone and explore how Volkswagen used it to build its data mesh, address the challenges it encountered, and break down data silos.

Seamlessly Integrate Data on Google BigQuery and ClickHouse Cloud with AWS Glue

Migrating from Google Cloud’s BigQuery to ClickHouse Cloud on AWS lets businesses use the speed and efficiency of ClickHouse for real-time analytics while benefiting from AWS’s scalable and secure environment. This article provides a comprehensive guide to executing a direct data migration using AWS Glue ETL, highlighting the advantages and best practices for a […]

Introducing Apache Airflow 3 on Amazon MWAA: New features and capabilities

AWS announced the general availability of Apache Airflow 3 on Amazon Managed Workflows for Apache Airflow (Amazon MWAA). This release changes how organizations use Apache Airflow to orchestrate data pipelines and business processes in the cloud, bringing enhanced security, improved performance, and modern workflow orchestration capabilities to Amazon MWAA customers. This post explores the features of Airflow 3 on Amazon MWAA and outlines the enhancements that improve workflow orchestration.

Scaling cluster manager and admin APIs in Amazon OpenSearch Service

In this post, we describe the bottlenecks that were identified and the corresponding solutions implemented in OpenSearch Service to scale the cluster manager for large cluster deployments. These optimizations are available to all new domains and to existing domains upgraded to OpenSearch Service version 2.17 or later.

Amazon OpenSearch Serverless monitoring: A CloudWatch setup guide

In this post, we explore commonly used Amazon CloudWatch metrics and alarms for OpenSearch Serverless and walk through selecting relevant metrics, setting appropriate thresholds, and configuring alerts. The result is a comprehensive monitoring strategy that suits the serverless nature of your OpenSearch deployment while maintaining full operational visibility.
