Learning Levels | AWS Big Data Blog

How LaunchDarkly migrated to Amazon MWAA to achieve efficiency and scale

In this post, we explore how LaunchDarkly scaled the internal analytics platform up to 14,000 tasks per day, with minimal increase in costs, after migrating from another vendor-managed Apache Airflow solution to AWS, using Amazon Managed Workflows for Apache Airflow (Amazon MWAA) and Amazon Elastic Container Service (Amazon ECS).

Zero-copy, Coordination-free approach to OpenSearch Snapshots

In this blog post, we tell you how we enhanced the snapshot efficiency in Amazon OpenSearch Service while carefully maintaining these critical operational aspects. These snapshot optimizations are enabled for all OpenSearch optimized instance family (OR1, OR2, OM2) domains from version 2.17 onwards.

Configure cross-account access of Amazon SageMaker Lakehouse multi-catalog tables using AWS Glue 5.0 Spark

In this post, we show you how to share an Amazon Redshift table and Amazon S3 based Iceberg table from the account that owns the data to another account that consumes the data. In the recipient account, we run a join query on the shared data lake and data warehouse tables using Spark in AWS Glue 5.0. We walk you through the complete cross-account setup and provide the Spark configuration in a Python notebook.

Automate replication of row-level security from AWS Lake Formation to Amazon QuickSight

This post outlines a solution to automatically replicate the entitlements for readers from the source (AWS Lake Formation) to Amazon QuickSight. This solution can be used even when the authentication method in Amazon QuickSight is not using IAM Identity Center and can work with both direct query and SPICE datasets in Amazon QuickSight.

Build end-to-end Apache Spark pipelines with Amazon MWAA, Batch Processing Gateway, and Amazon EMR on EKS clusters

This post shows how to enhance the multi-cluster solution by integrating Amazon Managed Workflows for Apache Airflow (Amazon MWAA) with BPG. By using Amazon MWAA, we add job scheduling and orchestration capabilities, enabling you to build a comprehensive end-to-end Spark-based data processing pipeline.

Melting the ice — How Natural Intelligence simplified a data lake migration to Apache Iceberg

Natural Intelligence (NI) is a world leader in multi-category marketplaces. In this blog post, NI shares their journey, the innovative solutions developed, and the key takeaways that can guide other organizations considering a similar path. This article details NI’s practical approach to this complex migration, focusing less on Apache Iceberg’s technical specifications, but rather on the real-world challenges and solutions encountered during the transition to Apache Iceberg, a challenge that many organizations are grappling with.

Read and write Apache Iceberg tables using AWS Lake Formation hybrid access mode

In this post, we demonstrate how to use Lake Formation for read access while continuing to use AWS Identity and Access Management (IAM) policy-based permissions for write workloads that update the schema and upsert (insert and update combined) data records into the Iceberg tables.

Integrate ThoughtSpot with Amazon Redshift using AWS IAM Identity Center

In this post, we walk you through the process of setting up ThoughtSpot integration with Amazon Redshift using IAM Identity Center authentication. The solution provides a secure, streamlined analytics environment that empowers your team to focus on what matters most: discovering and sharing valuable business insights.

Optimize multimodal search using the TwelveLabs Embed API and Amazon OpenSearch Service

In this blog post, we show you the process of integrating TwelveLabs Embed API with OpenSearch Service to create a multimodal search solution. You’ll learn how to generate rich, contextual embeddings from video content and use OpenSearch Service’s vector database capabilities to enable search functionalities. By the end of this post, you’ll be equipped with the knowledge to implement a system that can transform the way your organization handles and extracts value from video content.

Correlate telemetry data with Amazon OpenSearch Service and Amazon Managed Grafana

In this post, we show you how to use Amazon OpenSearch Service and Amazon Managed Grafana to correlate the various observability signals that improve root cause analysis, thereby resulting in reduced Mean Time to Resolution (MTTR). We also provide a reference solution that can be used at scale for proactive monitoring of enterprise applications to avoid a problem before they occur.

AWS Big Data Blog

Category: Learning Levels