Modern Data Architecture on AWS
Modern data architecture — how it all works
A modern data architecture acknowledges that a one-size-fits-all approach to analytics eventually leads to compromises. It is not simply about integrating a data lake with a data warehouse, but about integrating a data lake, a data warehouse, and purpose-built stores, with unified governance and easy data movement. With a modern data architecture on AWS, customers can rapidly build scalable data lakes; use a broad and deep collection of purpose-built data services; ensure compliance through unified data access, security, and governance; scale their systems at low cost without compromising performance; and easily share data across organizational boundaries. Together, these capabilities let them make decisions with speed and agility at scale.

Why you need a modern data architecture
Data volumes are increasing at an unprecedented rate, growing from terabytes to petabytes and sometimes exabytes. Traditional on-premises data analytics approaches can’t handle these volumes because they don’t scale well enough and are too expensive. Many companies are taking data from their various silos and aggregating it in one location, often called a data lake, to run analytics and ML directly on top of it. At other times, these same companies store data in purpose-built data stores to analyze and get fast insights from both structured and unstructured data. Data therefore moves in several patterns: “inside-out”, “outside-in”, “around the perimeter”, and “sharing across”. Each pattern has to account for the fact that data has gravity.
Inside-out data movement
Customers store data in a data lake and then move a portion of that data to a purpose-built data store for additional machine learning or analytics.
Example: Clickstream data from web applications can be collected directly in a data lake, and a portion of that data can be moved out to a data warehouse for daily reporting. We think of this concept as inside-out data movement.
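As a minimal sketch of the clickstream example, the snippet below builds an Amazon Redshift COPY statement that loads one day of Parquet files from a data lake into a warehouse table. It assumes the warehouse is Amazon Redshift and the files are Parquet; the table name, S3 prefix, and IAM role ARN are hypothetical placeholders, not values from this page.

```python
def build_copy_statement(table: str, s3_prefix: str, iam_role_arn: str) -> str:
    """Return a Redshift COPY statement that loads Parquet files from S3."""
    return (
        f"COPY {table}\n"
        f"FROM '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS PARQUET;"
    )


# Hypothetical names, used only for illustration.
copy_sql = build_copy_statement(
    table="reporting.daily_clickstream",
    s3_prefix="s3://example-data-lake/clickstream/dt=2024-01-01/",
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleRedshiftCopyRole",
)
print(copy_sql)
```

The generated statement would be run against the warehouse by whatever SQL client or scheduler you already use; only the daily slice moves, while the full history stays in the data lake.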
Outside-in data movement
Customers are storing data in purpose-built data stores such as a data warehouse or a database and are moving that data to a data lake to run analysis on that data.
Example: They copy query results for sales of products in a given region from their data warehouse into their data lake to run product recommendation algorithms against a larger dataset using ML.
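One way to sketch the regional sales example is with the Amazon Redshift UNLOAD command, which writes query results to Amazon S3. The snippet assumes the warehouse is Amazon Redshift; the query, table, S3 prefix, and IAM role ARN are hypothetical. Single quotes inside the query are doubled because UNLOAD takes the query as a quoted string.

```python
def build_unload_statement(query: str, s3_prefix: str, iam_role_arn: str) -> str:
    """Return a Redshift UNLOAD statement that writes query results to S3 as Parquet."""
    # UNLOAD wraps the query in single quotes, so inner quotes must be doubled.
    escaped_query = query.replace("'", "''")
    return (
        f"UNLOAD ('{escaped_query}')\n"
        f"TO '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS PARQUET;"
    )


# Hypothetical names, used only for illustration.
unload_sql = build_unload_statement(
    query="SELECT product_id, units_sold FROM sales WHERE region = 'EMEA'",
    s3_prefix="s3://example-data-lake/sales/emea/",
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleRedshiftUnloadRole",
)
print(unload_sql)
```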
Around the perimeter data movement
Customers move data directly from one purpose-built data store to another, seamlessly integrating their data lake, data warehouse, and purpose-built data stores.
Example: They may copy the product catalog data stored in their database to their search service to make it easier to look through their product catalog and offload the search queries from the database.
Sharing across data movement
Customers use a modern data architecture to facilitate governance and data sharing across logical or physical governance boundaries, creating data domains aligned to lines of business.
Data gravity
As data in these data lakes and purpose-built stores continues to grow, it becomes harder to move all this data around because data has gravity. It’s equally important to ensure that data can easily get to wherever it’s needed, with the right controls, to enable analysis and insights.
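A rough back-of-the-envelope calculation illustrates why large datasets are hard to move. The sketch below estimates how long a bulk copy would take; the 1 PB dataset size and the fully utilized 10 Gbps link are assumptions chosen for illustration, not figures from this page.

```python
def transfer_days(data_bytes: float, link_bits_per_second: float) -> float:
    """Estimate the days needed to move data over a fully utilized link."""
    seconds = data_bytes * 8 / link_bits_per_second
    return seconds / 86_400  # seconds per day


ONE_PETABYTE = 10**15    # decimal petabyte, in bytes (assumed dataset size)
TEN_GBPS = 10 * 10**9    # assumed sustained link speed, in bits per second

days = transfer_days(ONE_PETABYTE, TEN_GBPS)
print(f"{days:.1f} days")  # prints "9.3 days"
```

Under these assumptions a single petabyte takes more than nine days to copy, which is why moving only the portion of data a workload needs is usually preferable to relocating everything.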
Modern data architecture pillars
Organizations are taking their data from various silos and aggregating all that data in one location to do analytics and machine learning on top of that data. To get the most value from it, they need to leverage a modern data architecture that allows them to move data between data lakes and purpose-built data stores easily. This modern way of architecting requires:
Scalable data lakes
Tens of thousands of customers run their data lakes on AWS.
Setting up and managing data lakes today involves many manual, time-consuming tasks. AWS Lake Formation automates these tasks so you can build and secure your data lake in days instead of months. For your data lake storage, Amazon S3 is the best place to build a data lake because it has unmatched durability of 99.999999999% (11 nines) and 99.99% availability; the best security, compliance, and audit capabilities with object-level audit logging and access control; the most flexibility with five storage tiers; and the lowest cost with pricing that starts at less than $1 per TB per month.
Purpose-built analytics services
AWS gives you the broadest and deepest portfolio of purpose-built analytics services optimized for your unique analytics use cases.
These services are all designed to be best in class, which means you never have to compromise on performance, scale, or cost when using them. For example, Amazon Redshift is 3x faster and at least 50 percent less expensive than other cloud data warehouses. Spark on Amazon EMR runs 1.7x faster than standard Apache Spark 3.0, and you can run petabyte-scale analysis at less than half the cost of traditional on-premises solutions.
Unified data access
As the data in your data lakes and purpose-built data stores continues to grow, you often need to be able to easily move a portion of that data from one data store to another.
AWS makes it easy for you to combine, move, and replicate data across multiple data stores and your data lake. For example, AWS Glue provides comprehensive data integration capabilities that make it easy to discover, prepare, and combine data for analytics, machine learning, and application development, while Amazon Redshift can easily query data in your S3 data lake. No other analytics provider makes it as easy for you to move your data, at scale, to where you need it the most.
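To make the point about Amazon Redshift querying data in an S3 data lake more concrete, here is a hedged sketch that generates two SQL statements: one that maps a data catalog database into Redshift as an external schema, and one that joins a warehouse table with a lake table. The schema, database, table, and column names and the IAM role ARN are all hypothetical.

```python
from typing import List


def build_lake_query_statements(
    external_schema: str, catalog_database: str, iam_role_arn: str
) -> List[str]:
    """Return SQL that exposes a catalog database to Redshift and queries it."""
    create_schema = (
        f"CREATE EXTERNAL SCHEMA IF NOT EXISTS {external_schema}\n"
        "FROM DATA CATALOG\n"
        f"DATABASE '{catalog_database}'\n"
        f"IAM_ROLE '{iam_role_arn}';"
    )
    # Join a local warehouse table (hypothetical sales.orders) with a table
    # whose files stay in the data lake (hypothetical clickstream).
    join_query = (
        "SELECT o.order_id, c.page_url\n"
        "FROM sales.orders AS o\n"
        f"JOIN {external_schema}.clickstream AS c\n"
        "  ON o.session_id = c.session_id;"
    )
    return [create_schema, join_query]


statements = build_lake_query_statements(
    external_schema="lake",
    catalog_database="example_lake_db",
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleSpectrumRole",
)
for sql in statements:
    print(sql, end="\n\n")
```

The design choice here is that the lake data is queried in place rather than copied, so only the join result is materialized in the warehouse.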
Unified governance
One of the most important pieces of a modern analytics architecture is the ability for customers to authorize, manage, and audit access to data.
This can be challenging because managing security, access control, and audit trails across all of the data stores in your organization is complex, time-consuming, and error-prone. AWS gives you the governance capability to manage access to all your data across your data lake and purpose-built data stores from a single place. AWS Lake Formation allows you to centrally define and manage security, governance, and auditing policies, resulting in uniform access control for enterprise-wide data sharing.
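As an illustration of centrally defined access, the sketch below builds the request for a column-level SELECT grant in AWS Lake Formation. The dictionary follows the shape of the Lake Formation `GrantPermissions` request as exposed by boto3; the principal ARN, database, table, and column names are hypothetical, and the actual API call is left commented out so the snippet runs without AWS credentials.

```python
from typing import Iterable


def build_select_grant(
    principal_arn: str, database: str, table: str, columns: Iterable[str]
) -> dict:
    """Build keyword arguments for a column-level SELECT grant."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": list(columns),
            }
        },
        "Permissions": ["SELECT"],
    }


# Hypothetical names, used only for illustration.
grant = build_select_grant(
    principal_arn="arn:aws:iam::123456789012:role/ExampleAnalystRole",
    database="example_lake_db",
    table="sales",
    columns=["product_id", "region", "units_sold"],
)
print(grant)

# With boto3 installed and credentials configured, the grant could be applied as:
# import boto3
# boto3.client("lakeformation").grant_permissions(**grant)
```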
Performant and cost-effective
AWS is committed to providing the best performance at the lowest cost across all analytics services, and we are continually innovating to improve the price performance of our services.
In addition to industry-leading price performance for analytics services, S3 Intelligent-Tiering saves customers up to 70 percent on storage costs for data stored in their data lakes, and Amazon EC2 provides access to an industry-leading choice of over 200 instance types, up to 100 Gbps of network bandwidth, and the ability to choose between On-Demand, Reserved, and Spot Instances.
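One possible way to opt data lake objects into S3 Intelligent-Tiering is an S3 lifecycle rule. The sketch below builds such a rule in the shape expected by the S3 `PutBucketLifecycleConfiguration` request; the bucket name, the `raw/` prefix, and the 30-day threshold are assumptions picked for illustration, and the boto3 call is commented out so the snippet runs offline.

```python
def build_intelligent_tiering_rule(prefix: str, after_days: int) -> dict:
    """Build a lifecycle configuration that transitions objects to Intelligent-Tiering."""
    return {
        "Rules": [
            {
                "ID": f"intelligent-tiering-{prefix.strip('/')}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": after_days, "StorageClass": "INTELLIGENT_TIERING"}
                ],
            }
        ]
    }


# Hypothetical prefix and threshold, used only for illustration.
lifecycle = build_intelligent_tiering_rule(prefix="raw/", after_days=30)
print(lifecycle)

# With boto3 installed and credentials configured, the rule could be applied as:
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="example-data-lake", LifecycleConfiguration=lifecycle
# )
```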
More customers are leveraging a modern data architecture on AWS than anywhere else
BMW Group
To accelerate innovation and democratize data usage at scale, the BMW Group migrated its on-premises data lake to one powered by Amazon S3. BMW now processes terabytes of telemetry data from millions of vehicles daily and resolves issues before they impact customers.
Nielsen
Nielsen, a global measurement and data analytics company, drastically increased the amount of data it could ingest, process, and report to its clients each day by taking advantage of modern cloud technology. It went from measuring 40,000 households daily to more than 30 million.
Engie
ENGIE is one of the largest utility companies in France, with 160,000 employees and 40 business units operating in 70 countries. Its Common Data Hub’s nearly 100 TB data lake uses AWS services to meet business needs in data science, marketing, and operations.
Partners
Learn how our Partners are helping organizations build a modern data architecture on AWS.

Cloudera
Running Cloudera Enterprise on AWS provides IT and business users with a data management platform that can act as the foundation for modern data processing and analytics.
Informatica Cloud
Informatica Cloud provides optimized integration to AWS data services with native connectivity to over 100 applications.

Dataguise
Dataguise is the leader in secure business execution, delivering data-centric security solutions that detect and protect an enterprise's sensitive data—no matter where it lives or who needs to leverage it.

Alluxio Data Orchestration
Alluxio Data Orchestration enables customers to better leverage key AWS services, such as Amazon EMR and Amazon S3, for analytics and AI workloads.
Getting started

AWS Data-Driven Everything
In the AWS Data-Driven Everything (D2E) program, AWS partners with customers to help them move faster, with greater precision and a far more ambitious scope, and to jump-start their own data flywheel.

AWS Data Lab
AWS Data Lab offers accelerated, joint engineering engagements between customers and AWS technical resources to create tangible deliverables that accelerate data and analytics modernization initiatives.

AWS analytics & big data reference architecture
Learn architecture best practices for cloud data analysis, data warehousing, and data management on AWS.