AWS Big Data Blog
Scale Amazon QuickSight embedded analytics with new API-based domain allow listing
Amazon QuickSight is a fully managed, cloud-native business intelligence (BI) service that makes it easy to connect to your data, create interactive dashboards, and share these with tens of thousands of users, either within QuickSight itself or embedded in apps and portals. QuickSight Enterprise Edition recently introduced the ability to dynamically allow list the domains where […]
Create a most-recent view of your data lake using Amazon Redshift Serverless
Building a robust data lake is very beneficial because it enables organizations to have a holistic view of their business and empowers data-driven decisions. The curated layer of a data lake is able to hydrate multiple homogeneous data products, unlocking limitless capabilities to address current and future requirements. However, some concepts of how data lakes work […]
How SumUp built a low-latency feature store using Amazon EMR and Amazon Keyspaces
This post was co-authored by Vadym Dolin, Data Architect at SumUp. In their own words, SumUp is a leading financial technology company, operating across 35 markets on three continents. SumUp helps small businesses be successful by enabling them to accept card payments in-store, in-app, and online, in a simple, secure, and cost-effective way. Today, SumUp […]
Process Apache Hudi, Delta Lake, Apache Iceberg dataset at scale, part 2: Using AWS Glue Studio Visual Editor
June 2023: This post was reviewed and updated for accuracy. AWS Glue supports native integration with Apache Hudi, Delta Lake, and Apache Iceberg. Refer to Introducing native support for Apache Hudi, Delta Lake, and Apache Iceberg on AWS Glue for Apache Spark, Part 2: AWS Glue Studio Visual Editor to learn more. Transactional data lake […]
Simplify analytics on Amazon Redshift using PIVOT and UNPIVOT
Amazon Redshift is a fast, fully managed cloud data warehouse that makes it simple and cost-effective to analyze all your data using standard SQL and your existing business intelligence (BI) tools. Many customers look to build their data warehouse on Amazon Redshift, and they have many requirements where they want to convert data from row […]
Stream Amazon EMR on EKS logs to third-party providers like Splunk, Amazon OpenSearch Service, or other log aggregators
Spark jobs running on Amazon EMR on EKS generate logs that are very useful in identifying issues with Spark processes and also as a way to see Spark outputs. You can access these logs from a variety of sources. On the Amazon EMR virtual cluster console, you can access logs from the Spark History UI. […]
Integrate Amazon Redshift row-level security with Amazon Redshift native IdP authentication
Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud. You can start with just a few hundred gigabytes of data and scale to a petabyte or more. This enables you to use your data to acquire new insights for your business and customers. As enterprise customers look to build their data […]
Enable federated governance using Trino and Apache Ranger on Amazon EMR
Managing data through a central data platform simplifies staffing and training challenges and reduces costs. However, it can create scaling, ownership, and accountability challenges, because central teams may not understand the specific needs of a data domain, whether it’s because of data types and storage, security, data catalog requirements, or specific technologies needed for […]
Process Apache Hudi, Delta Lake, Apache Iceberg datasets at scale, part 1: AWS Glue Studio Notebook
August 2023: This post was reviewed and updated for accuracy. AWS Glue supports native integration with Apache Hudi, Delta Lake, and Apache Iceberg. Refer to Introducing native support for Apache Hudi, Delta Lake, and Apache Iceberg on AWS Glue for Apache Spark, Part 2: AWS Glue Studio Visual Editor to learn more. Cloud data lakes […]
How Plugsurfing doubled performance and reduced cost by 70% with purpose-built databases and AWS Graviton
Plugsurfing aligns the entire car charging ecosystem—drivers, charging point operators, and carmakers—within a single platform. The over 1 million drivers connected to the Plugsurfing Power Platform benefit from a network of over 300,000 charging points across Europe. Plugsurfing serves charging point operators with backend cloud software for managing everything from country-specific regulations to providing […]









