AWS Big Data Blog
Category: Intermediate (200)
Integrate ThoughtSpot with Amazon Redshift using AWS IAM Identity Center
In this post, we walk you through the process of setting up ThoughtSpot integration with Amazon Redshift using IAM Identity Center authentication. The solution provides a secure, streamlined analytics environment that empowers your team to focus on what matters most: discovering and sharing valuable business insights.
Correlate telemetry data with Amazon OpenSearch Service and Amazon Managed Grafana
In this post, we show you how to use Amazon OpenSearch Service and Amazon Managed Grafana to correlate observability signals, which improves root cause analysis and reduces Mean Time to Resolution (MTTR). We also provide a reference solution that can be used at scale for proactive monitoring of enterprise applications to avoid problems before they occur.
Develop and test AWS Glue 5.0 jobs locally using a Docker container
In this post, we show how to develop and test AWS Glue 5.0 jobs locally using a Docker container. This post is an updated version of the post Develop and test AWS Glue version 3.0 and 4.0 jobs locally using a Docker container, and uses AWS Glue 5.0.
Unlock the power of optimization in Amazon Redshift Serverless
In this post, we demonstrate how Amazon Redshift Serverless AI-driven scaling and optimization impacts performance and cost across different optimization profiles.
Automate topic provisioning and configuration using Terraform with Amazon MSK
In this post, we address common challenges associated with manual MSK topic configuration management and present a robust Terraform-based solution. This solution supports both provisioned and serverless MSK clusters.
Amazon EMR 7.5 runtime for Apache Spark and Iceberg can run Spark workloads 3.6 times faster than Spark 3.5.3 and Iceberg 1.6.1
The Amazon EMR runtime for Apache Spark offers a high-performance runtime environment while maintaining 100% API compatibility with open source Apache Spark and the Apache Iceberg table format. In this post, we demonstrate the performance benefits of using the Amazon EMR 7.5 runtime for Spark and Iceberg compared to open source Spark 3.5.3 with Iceberg 1.6.1 tables on the TPC-DS 3TB benchmark v2.13.
Run Apache XTable in AWS Lambda for background conversion of open table formats
In this post, we explore how Apache XTable, combined with the AWS Glue Data Catalog, enables background conversions between open table formats residing on Amazon S3-based data lakes, with minimal to no changes to existing pipelines, in a scalable and cost-effective way.
Run high-availability long-running clusters with Amazon EMR instance fleets
In this post, we demonstrate how to launch a high availability instance fleet cluster using the newly redesigned Amazon EMR console, as well as using an AWS CloudFormation template. We also go over the basic concepts of Hadoop high availability, EMR instance fleets, the benefits and trade-offs of high availability, and best practices for running resilient EMR clusters.
Enrich your AWS Glue Data Catalog with generative AI metadata using Amazon Bedrock
By harnessing the capabilities of generative AI, you can automate the generation of comprehensive metadata descriptions for your data assets based on their documentation, enhancing discoverability, understanding, and the overall data governance within your AWS Cloud environment. This post shows you how to enrich your AWS Glue Data Catalog with dynamic metadata using foundation models (FMs) on Amazon Bedrock and your data documentation.
How Volkswagen Autoeuropa built a data solution with a robust governance framework, simplifying access to quality data using Amazon DataZone
This is the second post of a two-part series that details how Volkswagen Autoeuropa, a Volkswagen Group plant, together with AWS, built a data solution with a robust governance framework using Amazon DataZone to become a data-driven factory. Part 1 of this series focused on the customer challenges, the overall solution architecture and solution features, and how they helped Volkswagen Autoeuropa overcome those challenges. This post dives into the technical details, highlighting the robust data governance framework that enables ease of access to quality data using Amazon DataZone.









