AWS Big Data Blog
Category: Announcements
Introducing Terraform support for Amazon OpenSearch Ingestion
Today, we are launching Terraform support for Amazon OpenSearch Ingestion. Terraform is an infrastructure as code (IaC) tool that helps you build, deploy, and manage cloud resources efficiently. OpenSearch Ingestion is a fully managed, serverless data collector that delivers real-time log, metric, and trace data to Amazon OpenSearch Service domains and Amazon OpenSearch Serverless collections. […]
Introducing Amazon MWAA support for Apache Airflow version 2.8.1
Amazon Managed Workflows for Apache Airflow (Amazon MWAA) is a managed orchestration service for Apache Airflow that makes it straightforward to set up and operate end-to-end data pipelines in the cloud. Organizations use Amazon MWAA to enhance their business workflows. For example, C2i Genomics uses Amazon MWAA in their data platform to orchestrate the validation […]
Improve your ETL performance using multiple Redshift warehouses to write to your data sets
Amazon Redshift is announcing the general availability of multi-data warehouse writes through data sharing. This new capability lets you improve the performance of extract, transform, and load (ETL) workloads by using warehouses of different types and sizes based on your workload needs.
Track Amazon OpenSearch Service configuration changes more easily with new visibility improvements
Amazon OpenSearch Service offers multiple domain configuration settings to meet your workload-specific requirements. As part of standard service operations, you may be required to update these configuration settings on a regular basis. Recently, Amazon OpenSearch Service launched visibility improvements that allow you to track configuration changes more effectively. We’ve introduced granular and more descriptive configuration […]
Amazon OpenSearch Service search enhancements: 2023 roundup
What users expect from search engines has evolved over the years. Returning lexically relevant results quickly is no longer enough for most users. Users now look for methods that return even more relevant results through semantic understanding, or that search by image visual similarity rather than by text search over metadata. Amazon OpenSearch […]
New Amazon CloudWatch log class to cost-effectively scale your AWS Glue workloads
AWS Glue is a serverless data integration service that makes it easier to discover, prepare, and combine data for analytics, machine learning (ML), and application development. You can use AWS Glue to create, run, and monitor data integration and ETL (extract, transform, and load) pipelines and catalog your assets across multiple data stores. One of […]
Use Amazon EMR with S3 Access Grants to scale Spark access to Amazon S3
Amazon EMR is pleased to announce integration with Amazon Simple Storage Service (Amazon S3) Access Grants that simplifies Amazon S3 permission management and allows you to enforce granular access at scale. With this integration, you can scale job-based Amazon S3 access for Apache Spark jobs across all Amazon EMR deployment options and enforce granular Amazon […]
Large Language Models for sentiment analysis with Amazon Redshift ML (Preview)
Amazon Redshift ML empowers data analysts and database developers to integrate the capabilities of machine learning and artificial intelligence into their data warehouse. Amazon Redshift ML helps to simplify the creation, training, and application of machine learning models through familiar SQL commands. You can further enhance Amazon Redshift’s inferencing capabilities by Bringing Your Own Models […]
Introducing Apache Hudi support with AWS Glue crawlers
Apache Hudi is an open table format that brings database and data warehouse capabilities to data lakes. Apache Hudi helps data engineers with complex challenges, such as managing continuously evolving datasets with transactions while maintaining query performance. Data engineers use Apache Hudi for streaming workloads as well as to create efficient incremental data pipelines. Hudi provides tables, transactions, efficient […]
Introducing persistent buffering for Amazon OpenSearch Ingestion
Amazon OpenSearch Ingestion is a fully managed, serverless pipeline that delivers real-time log, metric, and trace data to Amazon OpenSearch Service domains and OpenSearch Serverless collections. Customers use Amazon OpenSearch Ingestion pipelines to ingest data from a variety of data sources, both pull-based and push-based. When ingesting data from pull-based sources, such as Amazon Simple […]









