Modern Data Architecture on AWS

Seamlessly integrate your data lake, data warehouse, and purpose-built data stores

Modern data architecture — how it all works

A modern data architecture acknowledges that taking a one-size-fits-all approach to analytics eventually leads to compromises. It is not simply about integrating a data lake with a data warehouse, but rather about integrating a data lake, a data warehouse, and purpose-built stores, enabling unified governance and easy data movement. With a modern data architecture on AWS, customers can rapidly build scalable data lakes, use a broad and deep collection of purpose-built data services, ensure compliance via unified data access, security, and governance, scale their systems at a low cost without compromising performance, and easily share data across organizational boundaries, allowing them to make decisions with speed and agility at scale.

Why you need a modern data architecture

Data volumes are increasing at an unprecedented rate, exploding from terabytes to petabytes and sometimes exabytes. Traditional on-premises data analytics approaches can’t handle these data volumes because they don’t scale well enough and are too expensive. Many companies are taking all their data from various silos and aggregating it in one location, what many call a data lake, to do analytics and ML directly on top of that data. At other times, these same companies are storing other data in purpose-built data stores to analyze and get fast insights from both structured and unstructured data. Data therefore moves between these stores in several ways: “inside-out”, “outside-in”, “around the perimeter”, and “sharing across”. Because data has gravity, this movement gets harder as the data grows.

  • Inside-out data movement

    Customers store data in a data lake and then move a portion of that data to a purpose-built data store for additional machine learning or analytics.

    Example: Clickstream data from web applications can be collected directly in a data lake, and a portion of that data can be moved out to a data warehouse for daily reporting. We think of this concept as inside-out data movement.

  • Outside-in data movement

    Customers store data in purpose-built data stores, such as a data warehouse or a database, and move that data to a data lake to run analysis on it.

    Example: They copy query results for sales of products in a given region from their data warehouse into their data lake to run product recommendation algorithms against a larger dataset using ML.

  • Around the perimeter data movement

    Customers move data directly from one purpose-built data store to another, so that the data lake, data warehouse, and purpose-built data stores stay seamlessly integrated.

    Example: Customers may copy the product catalog data stored in their database to their search service to make it easier to look through the catalog and to offload search queries from the database.

  • Sharing across data movement

    Customers use a modern data architecture to facilitate governance and data sharing across logical or physical governance boundaries, creating data domains aligned to lines of business.

  • Data gravity

    As data in these data lakes and purpose-built stores continues to grow, it becomes harder to move all this data around because data has gravity. It’s equally important to ensure that data can easily get to wherever it’s needed, with the right controls, to enable analysis and insights.

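The four movement patterns described above can be told apart by where the data starts and where it ends up. The following Python sketch is only an illustration of that idea, not an AWS API: the `Store` type, the `classify_movement` function, the store names, and the `domain` field (standing in for a governance boundary such as a line of business) are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from enum import Enum


class StoreKind(Enum):
    DATA_LAKE = "data lake"
    # A data warehouse, database, search service, and so on.
    PURPOSE_BUILT = "purpose-built data store"


@dataclass(frozen=True)
class Store:
    name: str        # hypothetical label, not a real AWS resource name
    kind: StoreKind
    domain: str      # hypothetical governance boundary (line of business)


def classify_movement(source: Store, target: Store) -> str:
    """Name the movement pattern for data copied from source to target."""
    # Crossing a governance boundary is treated as sharing across,
    # regardless of the kinds of store involved (an assumption of this sketch).
    if source.domain != target.domain:
        return "sharing across"
    if source.kind is StoreKind.DATA_LAKE and target.kind is StoreKind.PURPOSE_BUILT:
        return "inside-out"
    if source.kind is StoreKind.PURPOSE_BUILT and target.kind is StoreKind.DATA_LAKE:
        return "outside-in"
    if source.kind is StoreKind.PURPOSE_BUILT and target.kind is StoreKind.PURPOSE_BUILT:
        return "around the perimeter"
    # Lake-to-lake movement inside one domain is not described in the text.
    return "unclassified"


# Hypothetical stores mirroring the examples in the text.
lake = Store("clickstream-lake", StoreKind.DATA_LAKE, "retail")
warehouse = Store("reporting-warehouse", StoreKind.PURPOSE_BUILT, "retail")
catalog_db = Store("product-catalog-db", StoreKind.PURPOSE_BUILT, "retail")
search = Store("catalog-search", StoreKind.PURPOSE_BUILT, "retail")
finance_lake = Store("finance-lake", StoreKind.DATA_LAKE, "finance")

print(classify_movement(lake, warehouse))     # clickstream to daily reporting
print(classify_movement(warehouse, lake))     # sales query results to the lake
print(classify_movement(catalog_db, search))  # catalog copied to search
print(classify_movement(lake, finance_lake))  # data shared between domains
```

Checking the governance boundary first is a design choice of this sketch: it treats any movement between domains as sharing, whatever stores sit on either side.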

Modern data architecture pillars

Organizations are taking their data from various silos and aggregating all that data in one location to do analytics and machine learning on top of that data. To get the most value from it, they need to leverage a modern data architecture that allows them to move data between data lakes and purpose-built data stores easily. This modern way of architecting requires:

More customers are leveraging a modern data architecture on AWS than anywhere else

  • BMW Group

    To accelerate innovation and democratize data usage at scale, the BMW Group migrated its on-premises data lake to one powered by Amazon S3. BMW now processes terabytes of telemetry data from millions of vehicles daily and resolves issues before they impact customers.

    Read the case study 
  • Nielsen

    Nielsen, a global measurement and data analytics company, drastically increased the amount of data it could ingest, process, and report to its clients each day by taking advantage of modern cloud technology. It went from measuring 40,000 households daily to more than 30 million.

    Read the case study 
  • Engie

    ENGIE is one of the largest utility companies in France, with 160,000 employees and 40 business units operating in 70 countries. Its Common Data Hub’s nearly 100 TB data lake uses AWS services to meet business needs in data science, marketing, and operations.

    Read the case study 

Partners

Learn how our Partners are helping organizations build a modern data architecture on AWS.

Cloudera

Running Cloudera Enterprise on AWS provides IT and business users with a data management platform that can act as the foundation for modern data processing and analytics.

Learn more »

Informatica Cloud

Informatica Cloud provides optimized integration to AWS data services with native connectivity to over 100 applications.

Learn more »

Dataguise

Dataguise is the leader in secure business execution, delivering data-centric security solutions that detect and protect an enterprise's sensitive data—no matter where it lives or who needs to leverage it.

Learn more »

Alluxio Data Orchestration

Alluxio Data Orchestration enables customers to better leverage key AWS services, such as Amazon EMR and Amazon S3, for analytics and AI workloads.

Learn more »

Getting started

AWS Data-Driven Everything
In the AWS Data-Driven Everything (D2E) program, AWS partners with customers to help them move faster, with greater precision and a far more ambitious scope, and to jump-start their own data flywheel.

Learn more »

AWS Data Lab
AWS Data Lab offers accelerated, joint engineering engagements between customers and AWS technical resources to create tangible deliverables that accelerate data and analytics modernization initiatives.

Learn more »

AWS analytics & big data reference architecture
Learn architecture best practices for cloud data analysis, data warehousing, and data management on AWS.

Learn more »