AWS Big Data Blog

Category: Analytics

Amazon EMR now provides up to 30% lower cost and up to 15% improved performance for Spark workloads on Graviton2-based instances

Amazon EMR now supports M6g, C6g and R6g instances with Amazon EMR versions 6.1.0, 5.31.0 and later. These instances are powered by AWS Graviton2 processors that are custom designed by AWS using 64-bit Arm Neoverse cores to deliver the best price performance for cloud workloads running in Amazon Elastic Compute Cloud (Amazon EC2). On Graviton2 […]

Building an ad-to-order conversion engine with Amazon Kinesis, AWS Glue, and Amazon QuickSight

August 30, 2023: Amazon Kinesis Data Analytics has been renamed to Amazon Managed Service for Apache Flink. Read the announcement in the AWS News Blog and learn more. Businesses in ecommerce have the challenge of measuring their ad-to-order conversion ratio for ads or promotional campaigns displayed on a webpage. Tracking the number of users that […]

Preparing data for ML models using AWS Glue DataBrew in a Jupyter notebook

AWS Glue DataBrew is a new visual data preparation tool that makes it easy for data analysts and data scientists to clean and normalize data to prepare it for analytics and machine learning (ML). In this post, we examine a sample ML use case and show how to use DataBrew and a Jupyter notebook to […]

Enabling self-service data publication to your data lake using AWS Glue DataBrew

Data lakes have been providing a level of flexibility to organizations unparalleled to anything before them. Having the ability to load and query data in place—and in its natural form—has led to an explosion of data lake deployments that have allowed organizations to accelerate against their data strategy faster than ever before. Most organizations have […]

Building a scalable streaming data processor with Amazon Kinesis Data Streams on AWS Fargate

Data is ubiquitous in businesses today, and the volume and speed of incoming data are constantly increasing. To derive insights from data, it’s essential to deliver it to a data lake or a data store and analyze it. Real-time or near-real-time data delivery can be cost prohibitive, therefore an efficient architecture is key for processing, […]

Controlling data lake access across multiple AWS accounts using AWS Lake Formation

When deploying data lakes on AWS, you can use multiple AWS accounts to better separate different projects or lines of business. In this post, we see how the AWS Lake Formation cross-account capabilities simplify securing and managing distributed data lakes across multiple accounts through a centralized approach, providing fine-grained access control to the AWS Glue […]

Amazon QuickSight: 2020 in review

As 2020 draws to a close, we’ve put together this post to walk you through all that’s changed in Amazon QuickSight this year. For your reading convenience, this post is broken up into the following sections: Embedded Analytics at scale Faster insights with Q & ML Business Intelligence (BI) with QuickSight Build Rich, Interactive Dashboards […]

Data monetization and customer experience optimization using telco data assets: Part 1

The landscape of the telecommunications industry is changing rapidly. For telecom service providers (TSPs), revenue from core voice and data services continues to shrink due to regulatory pressure and emerging OTT players that offer an attractive alternative. Despite increasing demand from customers for bandwidth, speed, and efficiency, TSPs are finding that ROI from implementing new […]

New in Amazon QuickSight – session capacity pricing for large scale deployments, embedding in public websites, and developer portal for embedded analytics

Amazon QuickSight Enterprise edition now offers a new, session capacity-based pricing model starting at $250/month, with annual commitment options that provide scalable pricing for embedded analytics and BI rollouts to 100s of 1000s of users. QuickSight now also supports embedding dashboards in apps, websites, and wikis without the need to provision and manage users (readers) […]

Keeping your data lake clean and compliant with Amazon Athena

With the introduction of CTAS support for Amazon Athena (see Use CTAS statements with Amazon Athena to reduce cost and improve performance), you can not only query but also create tables using Athena with the associated data objects stored in Amazon Simple Storage Service (Amazon S3). These tables are often temporary in nature and used […]