AWS Big Data Blog
Category: *Post Types
Introducing Point in Time queries and SQL/PPL support in Amazon OpenSearch Serverless
Today we announced support for three new features for Amazon OpenSearch Serverless: Point in Time (PIT) search, which enables you to maintain stable sorting for deep pagination in the presence of updates, and PPL and SQL, which give you new ways to query your data. In this post, we discuss the benefits of these new features and how to get started.
Introducing Amazon MWAA micro environments for Apache Airflow
Today, we’re excited to announce mw1.micro, the latest addition to Amazon MWAA environment classes. This offering is designed to provide an even more cost-effective solution for running Airflow environments in the cloud. With mw1.micro, we’re bringing the power of Amazon MWAA to teams who require a lightweight environment without compromising on essential features. In this post, we’ll explore mw1.micro characteristics, key benefits, ideal use cases, and how you can set up an Amazon MWAA environment based on this new environment class.
Manage access controls in generative AI-powered search applications using Amazon OpenSearch Service and Amazon Cognito
In this post, we show you how to manage user access to enterprise documents in generative AI-powered tools according to the access you assign to each persona. This post illustrates how to build a document search RAG solution that makes sure only authorized users can access and interact with specific documents based on their roles, departments, and other relevant attributes. It combines OpenSearch Service and Amazon Cognito custom attributes to make a tag-based access control mechanism that makes it straightforward to manage at scale.
Enrich your AWS Glue Data Catalog with generative AI metadata using Amazon Bedrock
By harnessing the capabilities of generative AI, you can automate the generation of comprehensive metadata descriptions for your data assets based on their documentation, enhancing discoverability, understanding, and the overall data governance within your AWS Cloud environment. This post shows you how to enrich your AWS Glue Data Catalog with dynamic metadata using foundation models (FMs) on Amazon Bedrock and your data documentation.
How FINRA established real-time operational observability for Amazon EMR big data workloads on Amazon EC2 with Prometheus and Grafana
FINRA performs big data processing with large volumes of data and workloads with varying instance sizes and types on Amazon EMR. Amazon EMR is a cloud-based big data environment designed to process large amounts of data using open source tools such as Hadoop, Spark, HBase, Flink, Hudi, and Presto. In this post, we talk about our challenges and show how we built an observability framework to provide operational metrics insights for big data processing workloads on Amazon EMR on Amazon Elastic Compute Cloud (Amazon EC2) clusters.
Ingest telemetry messages in near real time with Amazon API Gateway, Amazon Data Firehose, and Amazon Location Service
These organizations use third-party satellite-powered terminal devices for remote monitoring using telemetry and NMEA-0183 formatted messages generated in near real time. This post demonstrates how to implement a satellite-based remote alerting and response solution on the AWS Cloud to provide time-critical alerts and actionable insights, with a focus on telemetry message ingestion and alerts. Key services in the solution include Amazon API Gateway, Amazon Data Firehose, and Amazon Location Service.
How Volkswagen Autoeuropa built a data solution with a robust governance framework, simplifying access to quality data using Amazon DataZone
This second post of a two-part series that details how Volkswagen Autoeuropa, a Volkswagen Group plant, together with AWS, built a data solution with a robust governance framework using Amazon DataZone to become a data-driven factory. Part 1 of this series focused on the customer challenges, overall solution architecture and solution features, and how they helped Volkswagen Autoeuropa overcome their challenges. This post dives into the technical details, highlighting the robust data governance framework that enables ease of access to quality data using Amazon DataZone.
Achieve data resilience using Amazon OpenSearch Service disaster recovery with snapshot and restore
This post focuses on introducing an active-passive approach using a snapshot and restore strategy. The snapshot and restore strategy in OpenSearch Service involves creating point-in-time backups, known as snapshots, of your OpenSearch domain. These snapshots capture the entire state of the domain, including indexes, mappings, and settings. In the event of data loss or system failure, these snapshots will be used to restore the domain to a specific point in time. The post walks through the steps to set up this disaster recovery solution, including launching OpenSearch Service domains in primary and secondary regions, configuring snapshot repositories, restoring snapshots, and failing over/failing back between the regions.
Amazon OpenSearch Service announces Standard and Extended Support dates for Elasticsearch and OpenSearch versions
Today, we’re announcing timelines for end of Standard Support and Extended Support for legacy Elasticsearch versions up to 6.7, Elasticsearch versions 7.1 through 7.8, OpenSearch versions from 1.0 through 1.2, and OpenSearch versions 2.3 through 2.9 available on Amazon OpenSearch Service.
How Volkswagen Autoeuropa built a data mesh to accelerate digital transformation using Amazon DataZone
In this post, we discuss how Volkswagen Autoeuropa used Amazon DataZone to build a data marketplace based on data mesh architecture to accelerate their digital transformation. The data mesh, built on Amazon DataZone, simplified data access, improved data quality, and established governance at scale to power analytics, reporting, AI, and machine learning (ML) use cases. As a result, the data solution offers benefits such as faster access to data, expeditious decision making, accelerated time to value for use cases, and enhanced data governance.