Lakehouse Architecture

Features of the lakehouse architecture

General

Materialized views are managed Apache Iceberg tables in the AWS Glue Data Catalog that accelerate data lake query performance from Spark by up to 8x. These views store precomputed query results in Iceberg and update automatically as the underlying data changes, eliminating the need to build and maintain complex data pipelines.
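As a rough illustration of what "precomputed results that update as the underlying data changes" means, the sketch below keeps a running aggregate and folds in only newly appended rows on each refresh. The class `RevenueByRegionView`, its `refresh` method, and the sample rows are hypothetical; this is a toy model of incremental maintenance, not the AWS Glue Data Catalog or Spark API.

```python
from collections import defaultdict


class RevenueByRegionView:
    """Hypothetical precomputed aggregate over an append-only list of sales rows."""

    def __init__(self, base_rows):
        self._base_rows = base_rows      # reference to the underlying "table"
        self._processed = 0              # number of base rows already folded in
        self.totals = defaultdict(float)

    def refresh(self):
        """Fold in only the rows appended since the last refresh."""
        for row in self._base_rows[self._processed:]:
            self.totals[row["region"]] += row["amount"]
        self._processed = len(self._base_rows)


sales = [{"region": "EU", "amount": 10.0}, {"region": "US", "amount": 5.0}]
view = RevenueByRegionView(sales)
view.refresh()                           # initial computation

sales.append({"region": "EU", "amount": 2.5})
view.refresh()                           # incremental update, not a full recompute
eu_total = view.totals["EU"]
```

Queries then read `view.totals` instead of rescanning `sales`, which is the source of the speedup; a managed service would trigger the refresh for you rather than requiring an explicit call.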

The AWS Glue Data Catalog supports deletion vectors and row lineage as defined in the Apache Iceberg V3 specification. These Iceberg V3 capabilities help you build petabyte-scale data lakes with faster data modifications and the ability to track changed records.
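To make the idea of a deletion vector concrete, the sketch below models one in plain Python: instead of rewriting a data file when rows are deleted, the writer records the deleted row positions, and readers skip them. The names `DataFile`, `delete_rows`, and `scan` are hypothetical, and a Python set stands in for the compressed bitmap a real implementation would use; this is not the Iceberg or Glue Data Catalog API.

```python
from dataclasses import dataclass, field


@dataclass
class DataFile:
    """Immutable rows plus a deletion vector of row positions (illustrative only)."""
    rows: tuple
    deleted_positions: set = field(default_factory=set)


def delete_rows(data_file, predicate):
    """Mark matching rows as deleted without rewriting the data file."""
    for position, row in enumerate(data_file.rows):
        if predicate(row):
            data_file.deleted_positions.add(position)


def scan(data_file):
    """Return live rows by skipping positions in the deletion vector."""
    return [
        row
        for position, row in enumerate(data_file.rows)
        if position not in data_file.deleted_positions
    ]


orders = DataFile(rows=(
    {"order_id": 1, "status": "shipped"},
    {"order_id": 2, "status": "cancelled"},
    {"order_id": 3, "status": "shipped"},
))
delete_rows(orders, lambda row: row["status"] == "cancelled")
live_order_ids = [row["order_id"] for row in scan(orders)]
```

The delete touches only the small set of positions, so the cost of a modification is proportional to the number of deleted rows rather than to the size of the data file.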

Access Iceberg tables stored in Amazon S3 and registered in remote catalogs directly from AWS analytics engines, securely and cost-effectively, through catalog federation in the AWS Glue Data Catalog.

Gain the flexibility to access and query your data in place, with any Apache Iceberg–compatible tools and engines of your choice.
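For readers connecting an Iceberg-compatible client, the sketch below builds the kind of property dictionary a REST catalog client typically expects. Everything specific here is an assumption to verify against the AWS and client documentation: the helper name `glue_rest_catalog_properties` is hypothetical, and the endpoint URL pattern, the use of the account ID as the warehouse, and the SigV4 property keys follow PyIceberg-style conventions rather than anything stated on this page.

```python
def glue_rest_catalog_properties(region, account_id):
    """Build client properties for an Iceberg REST catalog endpoint (assumed values)."""
    return {
        "type": "rest",
        # Assumed endpoint pattern; confirm the exact URL for your region.
        "uri": f"https://glue.{region}.amazonaws.com/iceberg",
        # Assumed: the AWS account ID identifies the catalog to use.
        "warehouse": account_id,
        # Assumed SigV4 request-signing settings.
        "rest.sigv4-enabled": "true",
        "rest.signing-name": "glue",
        "rest.signing-region": region,
    }


# "123456789012" is a placeholder account ID, not a real account.
properties = glue_rest_catalog_properties("us-east-1", "123456789012")
```

With PyIceberg installed, a dictionary like this could be passed as `load_catalog("glue_rest", **properties)`; other engines take equivalent settings through their own configuration files.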

Run analytics and ML workloads, including Apache Spark jobs, SQL dashboards, ML models, and generative AI applications, on a single copy of data, stored in the format best suited to your workloads.

With Apache Iceberg compatibility, all data is fully ACID-compliant (atomic, consistent, isolated, and durable) for high-performance SQL analytics.
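One common way table formats achieve atomic, isolated commits is optimistic concurrency: each writer prepares a new snapshot, then swaps the table's current-snapshot pointer only if nobody else committed in the meantime. The sketch below is a simplified model of that idea, not the lakehouse's actual implementation; `TablePointer` and `CommitConflict` are hypothetical names.

```python
import threading


class CommitConflict(Exception):
    """Raised when another writer committed first."""


class TablePointer:
    """Hypothetical catalog entry holding the current snapshot ID of one table."""

    def __init__(self):
        self._lock = threading.Lock()
        self.snapshot_id = 0

    def commit(self, expected_snapshot_id, new_snapshot_id):
        """Atomically swap the pointer only if no one else committed in between."""
        with self._lock:
            if self.snapshot_id != expected_snapshot_id:
                raise CommitConflict(
                    f"expected snapshot {expected_snapshot_id}, "
                    f"found {self.snapshot_id}"
                )
            self.snapshot_id = new_snapshot_id


table = TablePointer()
base = table.snapshot_id      # both writers start from snapshot 0
table.commit(base, 1)         # writer A commits successfully

try:
    table.commit(base, 2)     # writer B is stale, so the commit is rejected
    writer_b_committed = True
except CommitConflict:
    writer_b_committed = False
```

Readers always see either the old snapshot or the new one, never a partial write, and a rejected writer can rebase on the latest snapshot and retry.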

Run federated queries on data stored across multiple third-party sources, such as Google BigQuery, SQL Server, and Snowflake, to access and query your data in place.

Get the flexibility of a data lake and performance of a data warehouse, without changing your existing data architecture. Access highly optimized Amazon Redshift storage and secondary data structures, such as materialized views, to speed up SQL analytics in your data lakes.

Bring data from operational databases such as Amazon DynamoDB, Amazon Aurora MySQL, Amazon Aurora PostgreSQL, and Amazon RDS for MySQL, and from applications including Salesforce, ServiceNow, and Zendesk, to the lakehouse using zero-ETL integrations for near real-time analytics.

Define fine-grained permissions once and have them enforced across all your data in all analytics tools and engines.

Next steps

FAQs

Get answers to frequently asked questions

Read the FAQs

Pricing

Explore lakehouse pricing

Learn more about pricing

Features of the lakehouse architecture

General

Accelerate data lake query performance with materialized views

Apache Iceberg V3 support with the AWS Glue Data Catalog

Access remote Apache Iceberg catalogs through catalog federation

Open data access with Apache Iceberg REST Catalog APIs

Run analytics and ML workloads on a single copy of data

Fully ACID-compliant storage

Federated data queries

Access to Amazon Redshift storage features

Zero-ETL integration for near real-time analytics

Integrated access controls
