AWS Big Data Blog

Category: Analytics

Reduce Amazon EMR cluster costs by up to 19% with new enhancements in Amazon EMR Managed Scaling

In June 2020, AWS announced the general availability of Amazon EMR Managed Scaling. With EMR Managed Scaling, you specify the minimum and maximum compute limits for your clusters, and Amazon EMR automatically resizes your cluster for optimal performance and resource utilization. EMR Managed Scaling constantly monitors key workload-related metrics and uses an algorithm that optimizes the […]

Simplify Online Analytical Processing (OLAP) queries in Amazon Redshift using new SQL constructs such as ROLLUP, CUBE, and GROUPING SETS

Amazon Redshift is a fully managed, petabyte-scale, massively parallel data warehouse that makes it fast, simple, and cost-effective to analyze all your data using standard SQL and your existing business intelligence (BI) tools. We are continuously investing to make analytics easy with Redshift by simplifying SQL constructs and adding new operators. Now we are adding […]

Patterns for enterprise data sharing at scale

Data sharing is becoming an important element of an enterprise data strategy. AWS services like AWS Data Exchange provide an avenue for companies to share or monetize their value-added data with other companies. Some organizations would like to have a data sharing platform where they can establish a collaborative and strategic approach to exchange data […]

Build a real-time GDPR-aligned Apache Iceberg data lake

Data lakes are a popular choice for today’s organizations to store their data around their business activities. As a best practice of a data lake design, data should be immutable once stored. But regulations such as the General Data Protection Regulation (GDPR) have created obligations for data operators who must be able to erase or […]

Introducing AWS Glue crawlers using AWS Lake Formation permission management

Data lakes provide a centralized repository that consolidates your data at scale and makes it available for different kinds of analytics. AWS Glue crawlers are a popular way to scan data in a data lake, classify it, extract schema information from it, and store the metadata automatically in the AWS Glue Data Catalog. AWS Lake […]

Configure ADFS Identity Federation with Amazon QuickSight

Amazon QuickSight Enterprise edition can integrate with your existing Microsoft Active Directory (AD), providing federated access using Security Assertion Markup Language (SAML) to dashboards. Using existing identities from Active Directory eliminates the need to create and manage separate user identities in AWS Identity Access Management (IAM). Federated users assume an IAM role when access is requested through an […]

How Vanguard made their technology platform resilient and efficient by building cross-Region replication for Amazon Kinesis Data Streams

This is a guest post co-written with Raghu Boppanna from Vanguard.  At Vanguard, the Enterprise Advice line of business improves investor outcomes through digital access to superior, personalized, and affordable financial advice. They made it possible, in part, by driving economies of scale across the globe for investors with a highly resilient and efficient technical […]

Control access to Amazon OpenSearch Service Dashboards with attribute-based role mappings

Federated users of Amazon OpenSearch Service often need access to OpenSearch Dashboards with roles based on their user profiles. OpenSearch Service fine-grained access control maps authenticated users to OpenSearch Search roles and then evaluates permissions to determine how to handle the user’s actions. However, when an enterprise-wide identity provider (IdP) manages the users, the mapping […]

How Ruparupa gained updated insights with an Amazon S3 data lake, AWS Glue, Apache Hudi, and Amazon QuickSight

This post is co-written with Olivia Michele and Dariswan Janweri P. at Ruparupa. Ruparupa was built by PT. Omni Digitama Internusa with the vision to cultivate synergy and create a seamless digital ecosystem within Kawan Lama Group that touches and enhances the lives of many. Ruparupa is the first digital platform built by Kawan Lama […]

A hybrid approach in healthcare data warehousing with Amazon Redshift

Data warehouses play a vital role in healthcare decision-making and serve as a repository of historical data. A healthcare data warehouse can be a single source of truth for clinical quality control systems. Data warehouses are mostly built using the dimensional model approach, which has consistently met business needs. Loading complex multi-point datasets into a […]