AWS Big Data Blog

Category: Technical How-to

Optimize write throughput for Amazon Kinesis Data Streams

Amazon Kinesis Data Streams is used by many customers to capture, process, and store data streams at any scale. This level of unparalleled scale is enabled by dividing each data stream into multiple shards. Each shard in a stream has a 1 Mbps or 1,000 records per second write throughput limit. Whether your data streaming […]

Implement a full stack serverless search application using AWS Amplify, Amazon Cognito, Amazon API Gateway, AWS Lambda, and Amazon OpenSearch Serverless

Designing a full stack search application requires addressing numerous challenges to provide a smooth and effective user experience. This encompasses tasks such as integrating diverse data from various sources with distinct formats and structures, optimizing the user experience for performance and security, providing multilingual support, and optimizing for cost, operations, and reliability. Amazon OpenSearch Serverless […]

Simplify data lake access control for your enterprise users with trusted identity propagation in AWS IAM Identity Center, AWS Lake Formation, and Amazon S3 Access Grants

Many organizations use external identity providers (IdPs) such as Okta or Microsoft Azure Active Directory to manage their enterprise user identities. These users interact with and run analytical queries across AWS analytics services. To enable them to use the AWS services, their identities from the external IdP are mapped to AWS Identity and Access Management […]

Introducing Amazon EMR on EKS with Apache Flink: A scalable, reliable, and efficient data processing platform

AWS recently announced that Apache Flink is generally available for Amazon EMR on Amazon Elastic Kubernetes Service (EKS). Apache Flink is a scalable, reliable, and efficient data processing framework that handles real-time streaming and batch workloads (but is most commonly used for real-time streaming). Amazon EMR on EKS is a deployment option for Amazon EMR […]

Architectural Patterns for Real Time Analytics using Amazon Kinesis Data Streams, Part 2 – AI Applications

Architectural Patterns for real-time analytics using Amazon Kinesis Data Streams, Part 2: AI Applications

Welcome back to our exciting exploration of architectural patterns for real-time analytics with Amazon Kinesis Data Streams! In this fast-paced world, Kinesis Data Streams stands out as a versatile and robust solution to tackle a wide range of use cases with real-time data, from dashboarding to powering artificial intelligence (AI) applications. In this series, we […]

In-place version upgrades for applications on Amazon Managed Service for Apache Flink now supported

Managed Service for Apache Flink is a fully managed, serverless experience in running Apache Flink applications, and now supports Apache Flink 1.18.1, the latest released version of Apache Flink at the time of writing. In this post, we explore in-place version upgrades, a new feature offered by Managed Service for Apache Flink. We provide guidance on getting started and offer detailed insights into the feature. Later, we deep dive into how the feature works and some sample use cases.

Use AWS Data Exchange to seamlessly share Apache Hudi datasets

Apache Hudi was originally developed by Uber in 2016 to bring to life a transactional data lake that could quickly and reliably absorb updates to support the massive growth of the company’s ride-sharing platform. Apache Hudi is now widely used to build very large-scale data lakes by many across the industry. Today, Hudi is the […]

Achieve peak performance and boost scalability using multiple Amazon Redshift serverless workgroups and Network Load Balancer

As data analytics use cases grow, factors of scalability and concurrency become crucial for businesses. Your analytic solution architecture should be able to handle large data volumes at high concurrency and without compromising speed, thereby delivering a scalable high-performance analytics environment. Amazon Redshift Serverless provides a fully managed, petabyte-scale, auto scaling cloud data warehouse to […]

Use AWS Glue Data Catalog views to analyze data

In this post, we show you how to use the new views feature the AWS Glue Data Catalog. SQL views are a powerful object used across relational databases. You can use views to decrease the time to insights of data by tailoring the data that is queried. Additionally, you can use the power of SQL […]