AWS Big Data Blog
Category: Analytics
Introducing AWS Glue Data Catalog usage metrics for API usage
We’re excited to announce AWS Glue Data Catalog usage metrics. The usage metrics is a new feature that provides native integration with Amazon CloudWatch. In this post, we demonstrate how to access these metrics, provide a step-by-step walkthrough, and set up meaningful alarms.
Amazon OpenSearch Service 101: Create your first search application with OpenSearch
In this post, we walk you through a search application building process using Amazon OpenSearch Service. Whether you’re a developer new to search or looking to understand OpenSearch fundamentals, this hands-on post shows you how to build a search application from scratch—starting with the initial setup; diving into core components such as indexing, querying, result presentation; and culminating in the execution of your first search query.
Implement secure hybrid and multicloud log ingestion with Amazon OpenSearch Ingestion
In this post, we demonstrate how to configure Fluent Bit, a fast and flexible log processor and router supported by various operating systems, to securely send logs from any environment to OpenSearch Ingestion using IAM Roles Anywhere.
Capture data lineage from dbt, Apache Airflow, and Apache Spark with Amazon SageMaker
This post walks you through how to use the OpenLineage-compatible API of SageMaker or Amazon DataZone to push data lineage events programmatically from tools supporting the OpenLineage standard like dbt, Apache Airflow, and Apache Spark.
How Skroutz handles real-time schema evolution in Amazon Redshift with Debezium
Skroutz chose Amazon Redshift to promote data democratization, empowering teams across the organization with seamless access to data, enabling faster insights and more informed decision-making. In this post, we share how we handled real-time schema evolution in Amazon Redshift with Debezium.
Stream data from Amazon MSK to Apache Iceberg tables in Amazon S3 and Amazon S3 Tables using Amazon Data Firehose
In this post, we walk through two solutions that demonstrate how to stream data from your Amazon MSK provisioned cluster to Iceberg-based data lakes in Amazon S3 using Amazon Data Firehose.
Secure access to a cross-account Amazon MSK cluster from Amazon MSK Connect using IAM authentication
In this post, we demonstrate a use case where you might need to use an MSK cluster in one AWS account, but MSK Connect is located in a separate account. We demonstrate how to implement IAM authentication after establishing network connectivity. IAM provides enhanced security measures, making sure your systems are protected against unauthorized access.
Build a multi-Region analytics solution with Amazon Redshift, Amazon S3, and Amazon QuickSight
This post explores how to effectively architect a solution that addresses this specific challenge: enabling comprehensive analytics capabilities for global teams while making sure that your data remains in the AWS Regions required by your compliance framework. We use a variety of AWS services, including Amazon Redshift, Amazon Simple Storage Service (Amazon S3), and Amazon QuickSight.
RocksDB 101: Optimizing stateful streaming in Apache Spark with Amazon EMR and AWS Glue
This post explores RocksDB’s key features and demonstrates its implementation using Spark on Amazon EMR and AWS Glue, providing you with the knowledge you need to scale your real-time data processing capabilities.
Reduce time to access your transactional data for analytical processing using the power of Amazon SageMaker Lakehouse and zero-ETL
In this post, we demonstrate how you can bring transactional data from AWS OLTP data stores like Amazon Relational Database Service (Amazon RDS) and Amazon Aurora flowing into Redshift using zero-ETL integrations to SageMaker Lakehouse Federated Catalog (Bring your own Amazon Redshift into SageMaker Lakehouse). With this integration, you can now seamlessly onboard the changed data from OLTP systems to a unified lakehouse and expose the same to analytical applications for consumptions using Apache Iceberg APIs from new SageMaker Unified Studio.









