What is a Lake House approach?

Seamlessly integrate your data lake, data warehouse, and purpose-built data stores

Lake House approach - how it all works

A Lake House approach acknowledges the idea that taking a one-size-fits-all approach to analytics eventually leads to compromises. It is not simply about integrating a data lake with a data warehouse, but rather about integrating a data lake, a data warehouse, and purpose-built stores, enabling unified governance and easy data movement. With a Lake House architecture on AWS, customers can store data in a data lake and use a ring of purpose-built data services around the lake, allowing them to make decisions with speed and agility, at a scale and price/performance that is unmatched in the market.

A Lake House architecture requires that customers:

  • Rapidly build scalable data lakes
  • Use a broad and deep collection of purpose-built data services
  • Ensure compliance with a unified way to secure, monitor, and manage access to their data
  • Scale their systems at low cost without compromising performance

Why you need a Lake House approach

Data volumes are increasing at an unprecedented rate, exploding from terabytes to petabytes and sometimes exabytes. Traditional on-premises data analytics approaches can’t handle these data volumes because they don’t scale well enough and are too expensive. Many companies are taking all their data from various silos and aggregating it in one location, what many call a data lake, to do analytics and ML directly on top of that data. At other times, these same companies are storing other data in purpose-built data stores to analyze and get fast insights from both structured and unstructured data. This data movement can be “inside-out”, “outside-in”, or “around the perimeter”, and because data has gravity, it becomes harder to move as it grows.

Inside-out data movement

    Customers store data in a data lake and then move a portion of that data to a purpose-built data store to do additional machine learning or analytics.

    Example: Clickstream data from web applications can be collected directly in a data lake, and a portion of that data can be moved out to a data warehouse for daily reporting. We think of this concept as inside-out data movement.

Outside-in data movement

    Customers store data in purpose-built data stores, such as a data warehouse or a database, and move that data to a data lake to run analysis on it.

    Example: They copy query results for sales of products in a given region from their data warehouse into their data lake to run product recommendation algorithms against a larger dataset using ML.

Around-the-perimeter data movement

    Customers move data directly between purpose-built data stores, around the perimeter of the data lake.

    Example: They may copy the product catalog data stored in their database to their search service, making it easier to search the catalog and offloading search queries from the database.

Data gravity

    As data in these data lakes and purpose-built stores continues to grow, it becomes harder to move all this data around because data has gravity. It’s equally important to ensure that data can easily get to wherever it’s needed, with the right controls, to enable analysis and insights.


Lake House approach pillars

Organizations are taking their data from various silos and aggregating all that data in one location to do analytics and machine learning on top of that data. To get the most value from it, they need to leverage a Lake House approach that allows them to move data easily between data lakes and purpose-built data stores. This modern way of architecting requires scalable data lakes, purpose-built data services, seamless data movement between them, unified governance, and performance at low cost.

More customers are building Lake Houses on AWS than anywhere else

  • BMW Group

    To accelerate innovation and democratize data usage at scale, the BMW Group migrated their on-premises data lake to one powered by Amazon S3; BMW now processes terabytes of telemetry data from millions of vehicles daily and resolves issues before they affect customers.

    Read the case study 
  • Nielsen

    Nielsen, a global measurement and data analytics company, drastically increased the amount of data it could ingest, process, and report to its clients each day by taking advantage of modern cloud technology. It went from measuring 40,000 households daily to more than 30 million.

    Read the case study 
  • ENGIE

    ENGIE is one of the largest utility companies in France, with 160,000 employees and 40 business units operating in 70 countries. Their Common Data Hub, a nearly 100 TB data lake, uses AWS services to meet business needs in data science, marketing, and operations.

    Read the case study 

Partners

Learn how our Partners are helping organizations build a modern data architecture leveraging the Lake House approach on AWS.

Cloudera

Running Cloudera Enterprise on AWS provides IT and business users with a data management platform that can act as the foundation for modern data processing and analytics.

Learn more »

Informatica Cloud

Informatica Cloud provides optimized integration to AWS data services with native connectivity to over 100 applications.

Learn more »

Dataguise

Dataguise is the leader in secure business execution, delivering data-centric security solutions that detect and protect an enterprise's sensitive data—no matter where it lives or who needs to leverage it.

Learn more »

Alluxio Data Orchestration

Alluxio Data Orchestration enables customers to better leverage key AWS services, such as Amazon EMR and Amazon S3, for analytics and AI workloads.

Learn more »

Getting started

AWS Data-Driven Everything (D2E) program
In the AWS Data-Driven Everything (D2E) program, AWS partners with customers to move faster, with greater precision and a far more ambitious scope, to jump-start their own data flywheel.

Learn more »

AWS Data Lab
AWS Data Lab offers accelerated, joint engineering engagements between customers and AWS technical resources to create tangible deliverables that accelerate data and analytics modernization initiatives.

Learn more »

AWS analytics & big data reference architecture
Learn architecture best practices for cloud data analysis, data warehousing, and data management on AWS.

Learn more »