AWS Big Data Blog

Amazon EMR Serverless cost estimator

Amazon EMR Serverless is a serverless option in Amazon EMR that makes it easy for data analysts and engineers to run applications using open-source big data analytics frameworks such as Apache Spark and Hive without configuring, managing, and scaling clusters or servers. You get all the features of the latest open-source frameworks with the performance-optimized […]

Analyze real-time streaming data in Amazon MSK with Amazon Athena

Recent advances in ease of use and scalability have made streaming data easier to generate and use for real-time decision-making. Coupled with market forces that have forced businesses to react more quickly to industry changes, more and more organizations today are turning to streaming data to fuel innovation and agility. Amazon Managed Streaming for Apache […]

LaunchDarkly’s journey from ingesting 1 TB to 100 TB per day with Amazon Kinesis Data Streams

February 9, 2024: Amazon Kinesis Data Firehose has been renamed to Amazon Data Firehose. Read the AWS What’s New post to learn more. This post was co-written with Mike Zorn, Software Architect at LaunchDarkly as the lead author. LaunchDarkly’s feature management platform enables customers to release features and measure their impact. As part of this […]

Migrate Google BigQuery to Amazon Redshift using AWS Schema Conversion tool (SCT)

Amazon Redshift is a fast, fully-managed, petabyte scale data warehouse that provides the flexibility to use provisioned or serverless compute for your analytical workloads. Using Amazon Redshift Serverless and Query Editor v2, you can load and query large datasets in just a few clicks and pay only for what you use. The decoupled compute and […]

Create, Train and Deploy Multi Layer Perceptron (MLP) models using Amazon Redshift ML

Amazon Redshift is a fully managed and petabyte-scale cloud data warehouse which is being used by tens of thousands of customers to process exabytes of data every day to power their analytics workloads. Amazon Redshift comes with a feature called Amazon Redshift ML which puts the power of machine learning in the hands of every […]

Secure your database credentials with AWS Secrets Manager and encrypt data with AWS KMS in Amazon QuickSight

Amazon QuickSight is a fully managed, cloud-native business intelligence (BI) service that makes it easy to connect to your data, create interactive dashboards, and share them with tens of thousands of users, either directly within a QuickSight application, or embedded in web apps and portals. Let’s consider AnyCompany, which owns healthcare facilities across the country. […]

AWS Analytics sales team uses QuickSight Q to save hours creating monthly business reviews

The AWS Analytics sales team is a group of subject-matter experts who work to enable customers to become more data driven through the use of our native analytics services like Amazon Athena, Amazon Redshift, and Amazon QuickSight. Every month, each sales leader is responsible for reporting on observations and trends in their business. To support their […]

Amazon EMR launches support for Amazon EC2 M6A, R6A instances to improve cost performance for Spark workloads by 15–50% 

Amazon EMR provides a managed service to easily run analytics applications using open-source frameworks such as Apache Spark, Hive, Presto, Trino, HBase, and Flink. The Amazon EMR runtime for Spark and Presto includes optimizations that provide over 2x performance improvements over open-source Apache Spark and Presto. With Amazon EMR release 6.8, you can now use […]

Simplify private network access for solutions using Amazon OpenSearch Service managed VPC endpoints

Amazon OpenSearch Service makes it easy for you to perform interactive log analytics, real-time application monitoring, website search, and more. Amazon OpenSearch is an open source, distributed search and analytics suite. Amazon OpenSearch Service offers the latest versions of OpenSearch, support for 19 versions of Elasticsearch (1.5 to 7.10 versions), as well as visualization capabilities […]

Scale read and write workloads with Amazon Redshift

Amazon Redshift is a fast, fully managed, petabyte-scale cloud data warehouse that enables you to analyze large datasets using standard SQL. The concurrency scaling feature in Amazon Redshift automatically adds and removes capacity by adding concurrency scaling to handle demands from thousands of concurrent users, thereby providing consistent SLAs for unpredictable and spiky workloads such […]