The lakehouse architecture of Amazon SageMaker

Simplify analytics and AI with a unified, open, and secure data architecture

Overview

The next generation of Amazon SageMaker is built on an open lakehouse architecture, fully compatible with Apache Iceberg. Unify all data across Amazon Simple Storage Service (Amazon S3) data lakes, including S3 Tables, and Amazon Redshift data warehouses, to build powerful analytics and AI/ML applications on a single copy of data. Connect data from additional sources through zero-ETL integrations with operational databases and applications, query federation with data sources, and catalog federation for remote Apache Iceberg tables. Get the flexibility to access and query your data in place with all Iceberg–compatible tools and engines. Secure your data by defining integrated access controls that are enforced across all analytics and machine learning (ML) tools and engines.

See it in action

See how you can access unified data from S3 data lakes, S3 Tables, and Redshift data warehouses in an open and secure data lakehouse.

Benefits

Unify all data across Amazon S3 data lakes, including S3 Tables, and Amazon Redshift data warehouses. Bring your data from operational databases and applications to the lakehouse in near real time through zero-ETL integrations. Access and query data in place across third-party data sources through query federation. Through catalog federation, get direct, secure, and cost-efficient access from AWS analytics engines to Apache Iceberg tables stored in S3 and registered in remote catalogs.

Gain the flexibility to access and query the unified data in your lakehouse in place with all Apache Iceberg–compatible analytical tools and engines, including SQL engines, Apache Spark, business intelligence (BI) tools, and AI/ML tools.
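
As a rough illustration of what pointing an Iceberg-compatible engine at a lakehouse catalog can involve, the sketch below builds the Spark properties for an Apache Iceberg REST catalog. The function name, the catalog name, the endpoint URL, and the warehouse value are placeholders invented for this example, and the SigV4 signing options are included on the assumption that the endpoint requires signed requests. Check your own catalog's documentation for the actual values.

```python
from typing import Dict, Optional


def iceberg_rest_catalog_conf(
    catalog: str,
    uri: str,
    warehouse: str,
    signing_region: Optional[str] = None,
    signing_name: Optional[str] = None,
) -> Dict[str, str]:
    """Build Spark properties for an Iceberg REST catalog (hypothetical helper)."""
    prefix = f"spark.sql.catalog.{catalog}"
    conf = {
        # Enables Iceberg's SQL extensions in the Spark session.
        "spark.sql.extensions": (
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions"
        ),
        # Registers the catalog under the given name and selects the REST type.
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "rest",
        f"{prefix}.uri": uri,
        f"{prefix}.warehouse": warehouse,
    }
    # Assumption: request signing is only needed when a region is supplied.
    if signing_region is not None:
        conf[f"{prefix}.rest.sigv4-enabled"] = "true"
        conf[f"{prefix}.rest.signing-region"] = signing_region
        if signing_name is not None:
            conf[f"{prefix}.rest.signing-name"] = signing_name
    return conf


# Placeholder values; none of these identify a real endpoint or warehouse.
example_conf = iceberg_rest_catalog_conf(
    catalog="lakehouse",
    uri="https://example.invalid/iceberg",
    warehouse="example-warehouse",
)
```

Each key and value in the returned dictionary can be passed to a Spark session builder as a configuration entry. Tables in the catalog would then be addressed as `lakehouse.<namespace>.<table>`.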

Secure all data with integrated, fine-grained access controls at the table, column, or cell level, and enforce those permissions across all your analytics tools and engines. Use tag-based, attribute-based, or role-based access policies to match your security requirements. Share data across your organization without creating copies.
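
To make the idea of tag-based, fine-grained permissions concrete, here is a toy model in plain Python. It is not the AWS API: the `Grant` type, the tag names, and the sample rows are all invented for illustration. It shows how a column filter (by tag) and a row filter combine to restrict which cells a principal can see in a single shared copy of a table.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List

Row = Dict[str, object]


@dataclass(frozen=True)
class Grant:
    """Hypothetical grant: readable column tags plus a row-level predicate."""
    allowed_tags: FrozenSet[str]
    row_filter: Callable[[Row], bool]


def visible_columns(column_tags: Dict[str, str], grant: Grant) -> List[str]:
    """Return the columns whose tag is covered by the grant."""
    return [col for col, tag in column_tags.items() if tag in grant.allowed_tags]


def apply_grant(rows: List[Row], column_tags: Dict[str, str], grant: Grant) -> List[Row]:
    """Keep only permitted rows, then project each onto the permitted columns."""
    cols = visible_columns(column_tags, grant)
    return [{c: row[c] for c in cols} for row in rows if grant.row_filter(row)]


# Invented sample data: each column carries one classification tag.
COLUMN_TAGS = {"order_id": "public", "region": "public", "email": "pii"}
ORDERS = [
    {"order_id": 1, "region": "EU", "email": "a@example.com"},
    {"order_id": 2, "region": "US", "email": "b@example.com"},
]

# An analyst who may read non-PII columns, and only rows for the EU region.
ANALYST = Grant(
    allowed_tags=frozenset({"public"}),
    row_filter=lambda row: row["region"] == "EU",
)

result = apply_grant(ORDERS, COLUMN_TAGS, ANALYST)
```

Because the filtering is applied at read time, every consumer works from the same underlying rows, and no per-team copy of the table is needed.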

Use cases

Unify all your data across Amazon S3 data lakes and Amazon Redshift data warehouses for your analytics and AI initiatives with a single copy of data. Integrated access controls allow you to define fine-grained permissions and securely share a single copy of data across the entire organization.

Access near real-time data across operational databases and applications through zero-ETL integrations. Access and query your data in place, from a wide range of AWS services and open source and third-party tools and engines that support Apache Iceberg.

Bring existing data from multiple Amazon Redshift data warehouses to the lakehouse to query and join data stored in Amazon Redshift clusters and workgroups. Scale your workloads for extract, transform, and load (ETL) processes, BI reporting, and as-needed analysis without managing multiple data shares.

Customers

Lennar

"We have spent the last 18 months working with AWS to transform our data foundation to use best-in-class solutions that are cost effective as well. With advancements like Amazon SageMaker Unified Studio and Amazon SageMaker Lakehouse, we are accelerating our velocity of delivery through seamless access to data and services, thus enabling our engineers, analysts, and scientists to surface insights that provide material value to our business."

Lee Slezak, SVP of Data and Analytics, Lennar

Roche

Roche is a global pioneer in pharmaceuticals and diagnostics focused on advancing science to improve people's lives.

“We have been using Amazon Redshift to gain insights from both structured and semistructured data across all our data repositories. The new Amazon SageMaker Lakehouse excites me with its potential to enhance and unify access to data lakes or other data sources with services like Amazon Redshift, AWS Glue Data Catalog, and AWS Lake Formation. This innovation will allow our data and engineering teams to simplify data access, promoting interoperability across data, analytics, and application workloads. I foresee a notable reduction in data errors through less data copying, a 40% decrease in processing time, quicker analytics data write-back to transactional systems for improved decision-making, and empowering our teams to focus on creating business value.”

Yannick Misteli, Head of Engineering, Global Product Strategy, Roche

Idealista

Idealista supports real estate agents and private individuals across Southern Europe by providing an online real estate classifieds platform.

“Our goal is to streamline access to Salesforce data for enhanced analytics in our data lake. By leveraging the new Amazon SageMaker Lakehouse support for zero-ETL integrations from applications feature, we can simplify our data extraction and ingestion processes, removing the need for multiple ETLs to access Salesforce directly. This centralized approach reduces complexity and significantly improves our data management efficiency. We anticipate a significant time savings in data extraction and ingestion development, allowing our team to focus on deriving actionable insights from our data rather than managing its collection.”

Javier Monterrubio, Data Platform Engineer Manager, Idealista

Carrier

"At Carrier, the next generation of Amazon SageMaker is transforming our enterprise data strategy by streamlining how we build and scale data products. SageMaker Unified Studio’s approach to data discovery, processing, and model development has significantly accelerated our lakehouse implementation. Most impressively, its seamless integration with our existing data catalog and built-in governance controls enables us to democratize data access while maintaining security standards, helping our teams rapidly deliver advanced analytics and AI solutions across the enterprise."

Partners

Tableau

Tableau helps people and organizations become more data-driven.

“The partnership between Amazon and Salesforce Tableau represents a shared commitment to innovation and customer success. Through Amazon’s new zero-ETL integration, we are combining Tableau’s AI-powered data and analytics with Amazon’s powerful data infrastructure to transform how organizations gain insight from their data. This seamless integration enables our customers to get insights from all of their structured and unstructured data using the power of Amazon SageMaker Lakehouse and Amazon Redshift, dramatically reducing engineering complexity and deployment time. Together, Tableau and Amazon are helping customers accelerate digital transformation and drive business value at scale.”

Ali Tore, Senior VP, Advanced Analytics, Tableau

dbt Labs

dbt Labs is on a mission to help analysts create and disseminate organizational knowledge.

"We've long been the transformation standard on top of Amazon Redshift, offering flexibility, collaboration, and trust. With the new Amazon SageMaker Lakehouse, we're excited to extend this value to more customers and even more data in the AWS environment. Now, customers can access all their data across the AWS system, including data warehouses and data lakes. We're excited to join our capabilities with the new Amazon SageMaker to deliver governance, cataloging, and data optimizations for our joint customers."

Shawn Toldo, VP Partnerships, dbt Labs

Informatica

Informatica, a leader in enterprise AI–powered cloud data management, brings data and AI to life by empowering businesses to realize the transformative power of their most critical assets.

“Our Intelligent Data Management Cloud (IDMC) platform and Amazon SageMaker help organizations unlock data potential and drive innovation and efficiency. As an Amazon SageMaker Lakehouse launch partner, we're proud to deliver an enterprise-grade solution that meets the high standards of modern data-driven organizations. Together with AWS's infrastructure, we enable faster, informed decisions for impactful outcomes across industries.”

Pratik Parekh, SVP Product Management, Informatica

Next steps

Console

Get started with Amazon SageMaker today