AWS Big Data Blog

Process Apache Hudi, Delta Lake, Apache Iceberg datasets at scale, part 1: AWS Glue Studio Notebook

August 2023: This post was reviewed and updated for accuracy. AWS Glue supports native integration with Apache Hudi, Delta Lake, and Apache Iceberg. Refer to Introducing native support for Apache Hudi, Delta Lake, and Apache Iceberg on AWS Glue for Apache Spark, Part 2: AWS Glue Studio Visual Editor to learn more. Cloud data lakes […]

How Plugsurfing doubled performance and reduced cost by 70% with purpose-built databases and AWS Graviton

Plugsurfing aligns the entire car charging ecosystem (drivers, charging point operators, and carmakers) within a single platform. More than 1 million drivers connected to the Plugsurfing Power Platform benefit from a network of over 300,000 charging points across Europe. Plugsurfing serves charging point operators with backend cloud software for managing everything from country-specific regulations to providing […]

Migrate a large data warehouse from Greenplum to Amazon Redshift using AWS SCT – Part 2

In this second post of a multi-part series, we share best practices for choosing the optimal Amazon Redshift cluster and data architecture, converting stored procedures, and using compatible functions and queries widely used for SQL conversions, along with recommendations for optimizing the length of data types for table columns. You can check out the first post of this series […]

Migrate a large data warehouse from Greenplum to Amazon Redshift using AWS SCT – Part 1

A data warehouse collects and consolidates data from various sources within your organization. It’s used as a centralized data repository for analytics and business intelligence. When working with on-premises legacy data warehouses, scaling the size of your data warehouse or improving performance can mean purchasing new hardware or adding more powerful hardware. This is often […]

Accelerate resizing of Amazon Redshift clusters with enhancements to classic resize

October 2023: This post was reviewed and updated to include the latest enhancements in Amazon Redshift’s resize feature. Amazon Redshift has improved the performance of the classic resize feature for multi-node RA3 clusters and increased the flexibility of the cluster snapshot restore operation. You can use the classic resize operation to resize a cluster when […]

Custom packages and hot reload of dictionary files with Amazon OpenSearch Service

Amazon OpenSearch Service is a fully managed service that you can use to deploy and operate OpenSearch clusters cost-effectively at scale in the AWS Cloud. The service makes it easy for you to perform interactive log analytics, real-time application monitoring, website search, and more by offering the latest versions of OpenSearch, support for 19 versions […]

Introducing Embedded Analytics Data Lab to accelerate integration of Amazon QuickSight analytics into applications

We are excited to announce Embedded Analytics Data Lab (EADL), a no-cost collaborative engagement that helps engineering and development teams cut down the time required to launch applications with embedded analytics from Amazon QuickSight in production by providing hands-on guidance and architectural best practices. Embedding rich analytics such as interactive visuals and dashboards directly into applications […]

Optimize your Amazon Redshift query performance with automated materialized views

Amazon Redshift is a fast, fully managed cloud data warehouse database that makes it cost-effective to analyze your data using standard SQL and business intelligence tools. Amazon Redshift allows you to analyze structured and semi-structured data and seamlessly query data lakes and operational databases, using AWS-designed hardware and automated machine learning (ML)-based tuning to […]

Achieve fine-grained data security with row-level access control in Amazon Redshift

Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud. With Amazon Redshift, you can analyze all your data to derive holistic insights about your business and your customers. One of the challenges with security is that enterprises want to provide fine-grained access control at the row level for sensitive data. You […]

Use Amazon Athena parameterized queries to provide data as a service

Amazon Athena now provides you more flexibility to use parameterized queries, and we recommend you use them as the best practice for your Athena queries moving forward so you benefit from the security, reusability, and simplicity they offer. In a previous post, Improve reusability and security using Amazon Athena parameterized queries, we explained how parameterized […]