Data Lakes and Analytics on AWS

Fastest way to get answers from all your data to all your users
Easiest to build data lakes and analytics
Setting up and managing data lakes involves many manual, time-consuming tasks, such as loading, transforming, securing, and auditing access to data. AWS Lake Formation automates many of those steps and reduces the time required to build a successful data lake from months to days.
Scalable and cost effective
Data volumes are growing exponentially, and so are the costs of storing and analyzing that data. AWS provides comprehensive tooling to help control the cost of storing and analyzing all of your data at scale, including S3 Intelligent-Tiering for data storage and features that help reduce the cost of your compute usage, such as auto scaling, Savings Plans, and integration with Amazon EC2 Spot Instances.
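As a rough illustration of the storage side of that cost control, the sketch below builds a lifecycle configuration that moves objects under a prefix into the S3 Intelligent-Tiering storage class. The function name `intelligent_tiering_rule`, the rule ID format, and the `raw/` prefix are assumptions made for this example; they do not come from AWS documentation or from this page.

```python
import json


def intelligent_tiering_rule(prefix: str, days: int = 0) -> dict:
    """Build a lifecycle configuration (hypothetical helper) that transitions
    objects under `prefix` to S3 Intelligent-Tiering after `days` days."""
    if days < 0:
        raise ValueError("days must be non-negative")
    # Derive a readable rule ID from the prefix; this naming is an assumption.
    suffix = prefix.strip("/").replace("/", "-") or "all"
    return {
        "Rules": [
            {
                "ID": f"intelligent-tiering-{suffix}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": days, "StorageClass": "INTELLIGENT_TIERING"}
                ],
            }
        ]
    }


if __name__ == "__main__":
    # Print the configuration for an illustrative "raw/" prefix.
    print(json.dumps(intelligent_tiering_rule("raw/"), indent=2))
```

The returned dictionary follows the shape that boto3's `put_bucket_lifecycle_configuration` accepts in its `LifecycleConfiguration` parameter, so it could be passed along with a bucket name of your own. The example only builds the dictionary and makes no calls to AWS.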
Comprehensive and open
We provide the broadest and deepest portfolio of purpose-built analytics tools so you can quickly get insights from your data using the most appropriate tool for the job. All of our analytics services support open file formats like Apache Parquet so you don’t need to move and transform your data in order to analyze it, but can instead store it once in a standard format and analyze it using whatever tool or technique is most appropriate.
Secure infrastructure for analytics
Securing vast volumes of data is one of the biggest challenges facing most organizations. Beyond all of the certifications and best practices you would expect from AWS, we also have security features designed to help you stay compliant with your policies and industry regulations. For example, AWS Lake Formation provides multi-service, fine-grained access control to data, Amazon Macie helps find sensitive data that was accidentally stored in the wrong place, and Amazon Inspector helps spot configuration errors that might lead to data breaches.

AWS Analytics services

Category | Use cases | AWS service | Description
Analytics | Interactive analytics | Amazon Athena | Query data in S3 using SQL.
Analytics | Big data processing | Amazon EMR | Hosted Hadoop framework.
Analytics | Data warehousing | Amazon Redshift | Fast, simple, cost-effective data warehousing.
Analytics | Real-time analytics | Amazon Kinesis | Analyze real-time video and data streams.
Analytics | Operational analytics | Amazon Elasticsearch Service | Run and scale Elasticsearch clusters.
Analytics | Dashboards and visualizations | Amazon QuickSight | Fast business analytics service.
Analytics | Visual data preparation | AWS Glue DataBrew | Clean and normalize data up to 80% faster.
Data movement | Real-time data movement | Amazon Managed Streaming for Apache Kafka (MSK) | Fully managed, highly available, and secure Apache Kafka service.
Data movement | Real-time data movement | Amazon Kinesis Video Streams | Capture, process, and store video streams for analytics and machine learning.
Data movement | Real-time data movement | Amazon Kinesis Data Firehose | Prepare and load real-time data streams into data stores and analytics tools.
Data movement | Real-time data movement | Amazon Kinesis Data Streams | Collect streaming data, at scale, for real-time analytics.
Data movement | Real-time data movement | Amazon Kinesis Data Analytics | Get actionable insights from streaming data in real time.
Data lake | Object storage | Amazon S3 | Object storage built to store and retrieve any amount of data from anywhere.
Data lake | Object storage | AWS Lake Formation | Build a secure data lake in days.
Data lake | Backup and archive | Amazon S3 Glacier | Low-cost archive storage in the cloud.
Data lake | Backup and archive | AWS Backup | Centralized backup across AWS services.
Data lake | Data catalog | AWS Glue | Prepare and load data.
Data lake | Data catalog | AWS Lake Formation | Build a secure data lake in days.
Data lake | Third-party data | AWS Data Exchange | Find and subscribe to third-party data in the cloud.
Predictive analytics and machine learning | Frameworks and interfaces | AWS Deep Learning AMIs | Deep learning on Amazon EC2.
Predictive analytics and machine learning | Platform services | Amazon SageMaker | Build, train, and deploy machine learning models at scale.

Use cases

Data warehousing

Run SQL and complex analytic queries against structured and unstructured data in your data warehouse and data lake, without unnecessary data movement.

Try Amazon Redshift »
Big data processing

Quickly and easily process vast amounts of data in your data lake or on-premises for data engineering, data science development, and collaboration.

Try Amazon EMR »
Real-time analytics

Collect, process, and analyze streaming data, and load data streams directly into your data lakes, data stores, and analytics services so you can respond in real time.

Try Amazon MSK » Try Amazon Kinesis »
Operational analytics

Search, explore, filter, aggregate, and visualize your data in near real time for application monitoring, log analytics, and clickstream analytics.

Try Amazon Elasticsearch Service »

Customers

"We built a 120TB data lake in Amazon S3, with 1500 different schemes and use AWS analytics services like Glue, Redshift, and Athena extensively. We couldn’t get these insights from a bunch of siloed databases and warehouses - we needed an S3 scale data lake."

- Bernardo Rodriguez
Chief Digital Officer, J.D. Power

Netflix
Chick-fil-A
3M
Georgia-Pacific
Pinterest
T-Mobile
Epic
Adobe
Pfizer
View all customers »

Additional resources

AWS Data Lab

Create tangible deliverables that accelerate your data and analytics modernization initiatives. AWS Data Lab is a four-day intensive engagement between your team of builders and AWS technical resources.

Learn more »

Newsletter

Want to stay in the loop on educational content, upcoming events, and other innovations from AWS Analytics?

Subscribe to the AWS Analytics Newsletter »