AWS Big Data Blog

Category: Management Tools

Auto scaling Amazon Kinesis Data Streams using Amazon CloudWatch and AWS Lambda

This post is co-written with Noah Mundahl, Director of Public Cloud Engineering at United Health Group. Update (12/1/2021): Amazon Kinesis Data Streams On-Demand mode is now the recommended way to natively auto scale your Amazon Kinesis Data Streams. In this post, we cover a solution to add auto scaling to Amazon Kinesis Data Streams. Whether […]

Increase Amazon Elasticsearch Service performance by upgrading to Graviton2

September 8, 2021: Amazon Elasticsearch Service has been renamed to Amazon OpenSearch Service. See details. Amazon OpenSearch Service supports multiple instance types based on your use case. In 2021, AWS announced general purpose (M6g), compute optimized (C6g), and memory optimized (R6g, R6gd) instance types for Amazon OpenSearch Service version 7.9 or later powered by AWS […]

Run usage analytics on Amazon QuickSight using AWS CloudTrail

Amazon QuickSight is a cloud-native BI service that allows end users to create and publish dashboards in minutes, without provisioning any servers or requiring complex licensing. You can view these dashboards on the QuickSight product console or embed them into applications and websites. After you deploy a dashboard, it’s important to assess how they and […]

Data monetization and customer experience optimization using telco data assets: Part 2

Part 1 of this series explains the importance of building and implementing a customer experience (CX) management and data monetization strategy for telecom service providers (TSPs), and the major challenges driving these initiatives. It also includes an AWS CloudFormation template to set up a demonstration of the solution using AWS services. It covers transforming and enriching […]

Securing access to EMR clusters using AWS Systems Manager

Organizations need to secure infrastructure when giving engineers access to build applications. Opening inbound SSH ports on instances to enable engineer access introduces the risk of a malicious entity running unauthorized commands. Using a bastion host or jump server is a common way to give engineers access to Amazon EMR cluster instances by […]

Accelerating Amazon Redshift federated query to Amazon Aurora MySQL with AWS CloudFormation

Amazon Redshift federated query allows you to combine data from one or more Amazon Relational Database Service (Amazon RDS) for MySQL and Amazon Aurora MySQL databases with data already in Amazon Redshift. You can also combine such data with data in an Amazon Simple Storage Service (Amazon S3) data lake. This post shows you how […]

Enhanced monitoring and automatic scaling for Apache Flink

Thousands of developers use Apache Flink to build streaming applications to transform and analyze data in real time. Apache Flink is an open-source framework and engine for processing data streams. It’s highly available and scalable, delivering high throughput and low latency for the most demanding stream-processing applications. Monitoring and scaling your applications is critical to […]

Build a self-service environment for each line of business using Amazon EMR and AWS Service Catalog

Enterprises often want to centralize governance and compliance requirements, and provide a common set of policies on how Amazon EMR instances should be set up. You can use AWS Service Catalog to centrally manage commonly deployed Amazon EMR cluster configurations, and this helps you achieve consistent governance and meet your compliance requirements, while at the […]

Deploy an Amazon EMR edge node with RStudio using AWS Systems Manager

October 2022: The technique of automating creation of an edge node mentioned in this blog post is now obsolete. We recommend utilizing the fully managed RStudio on Amazon SageMaker offering instead. RStudio is an integrated development environment (IDE) for R, a language and environment for statistical computing and graphics. As a data scientist, you may […]