Amazon SageMaker Lakehouse

Simplify analytics and AI with a unified, open, and secure data lakehouse

What is SageMaker Lakehouse?

Amazon SageMaker Lakehouse unifies all your data across Amazon Simple Storage Service (Amazon S3) data lakes and Amazon Redshift data warehouses, helping you build powerful analytics and AI/ML applications on a single copy of data. You can access and query that data in place with any Apache Iceberg–compatible tool or engine. Secure your data in the lakehouse by defining fine-grained permissions that are enforced across all analytics and machine learning (ML) tools and engines. Bring data from operational databases and applications into your lakehouse in near real time through zero-ETL integrations, and use federated query capabilities to query third-party data sources in place.

Benefits

Unify all your data across Amazon S3 data lakes and Amazon Redshift data warehouses with SageMaker Lakehouse. Bring your data from operational databases and applications to the lakehouse in near real time through zero-ETL integrations. You can use hundreds of connectors to integrate data from various sources. Additionally, you can access and query data in-place with federated query capabilities across third-party data sources.
Gain the flexibility to access and query a single copy of your data in place with any Apache Iceberg–compatible tool. You can use the analytics tools and engines of your choice, such as SQL, Apache Spark, business intelligence (BI), and AI/ML tools, and collaborate on data stored across Amazon S3 data lakes and Amazon Redshift data warehouses. SageMaker Lakehouse works with your existing data architecture, so you can keep your preferred storage formats and query engines as long as they are compatible with Apache Iceberg.
Secure your data with integrated, fine-grained access controls that are enforced across all your data in all analytics tools and engines. Define permissions once and confidently share data across your organization.
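As an illustration of what access from an Apache Iceberg–compatible engine can look like, the sketch below builds the Apache Spark settings for an Iceberg REST catalog. It is a minimal example under stated assumptions, not a reference configuration: the helper name `iceberg_rest_catalog_conf`, the catalog name, the account ID, and the Region are hypothetical, and the endpoint URL and the use of the account ID as the warehouse identifier are assumptions to verify against the AWS documentation for your setup. The property names themselves are standard Apache Iceberg catalog properties.

```python
def iceberg_rest_catalog_conf(catalog_name: str, account_id: str, region: str) -> dict:
    """Return Spark settings that register an Iceberg REST catalog.

    Hypothetical helper for illustration only. The endpoint URL and the
    warehouse value are assumptions, not values taken from this page.
    """
    prefix = f"spark.sql.catalog.{catalog_name}"
    return {
        # Enables Iceberg SQL extensions in Spark.
        "spark.sql.extensions": (
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions"
        ),
        # Registers the catalog as an Iceberg catalog backed by a REST service.
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "rest",
        # Assumed endpoint; confirm the correct URL for your Region.
        f"{prefix}.uri": f"https://glue.{region}.amazonaws.com/iceberg",
        # Assumed warehouse identifier (the AWS account ID).
        f"{prefix}.warehouse": account_id,
        # Signs REST requests with AWS Signature Version 4.
        f"{prefix}.rest.sigv4-enabled": "true",
        f"{prefix}.rest.signing-name": "glue",
        f"{prefix}.rest.signing-region": region,
    }


if __name__ == "__main__":
    # Print the settings as spark-submit style --conf flags.
    settings = iceberg_rest_catalog_conf("lakehouse", "123456789012", "us-east-1")
    for key, value in settings.items():
        print(f"--conf {key}={value}")
```

With settings like these applied to a Spark session, tables would be addressed as `lakehouse.<database>.<table>`. Which tables a given user can actually read would still be governed by the fine-grained permissions defined in the lakehouse.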

Use cases

Unify all your data across Amazon S3 data lakes and Amazon Redshift data warehouses for your analytics and AI initiatives with a single copy of data. With integrated access controls, SageMaker Lakehouse enables you to define fine-grained permissions and securely share a single copy of data across the entire organization.
Access near real-time data from operational databases and applications in SageMaker Lakehouse through zero-ETL integrations. Access and query your data in place from a wide range of AWS services and from open source and third-party tools and engines that support Apache Iceberg.
Bring existing data from multiple Amazon Redshift data warehouses into SageMaker Lakehouse to query and join data stored in Redshift clusters and workgroups. Scale your workloads for extract, transform, and load (ETL) processes, BI reporting, and as-needed analysis without managing multiple data shares.