Get started with Big Data on AWS

Amazon Athena: Easily run interactive queries directly against data in Amazon S3. Pay only for the queries you run.

 


Amazon EMR: Easily deploy popular open source big data frameworks such as Apache Hadoop, Spark, Presto, HBase, and Flink.


Amazon Redshift: A fast, fully managed, petabyte-scale data warehouse that makes it easy to run even complex queries on massive collections of structured data.


Easily query data in Amazon S3 using standard SQL


Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to set up or manage, and you can start analyzing your data immediately.

You can use Athena to process logs, perform ad hoc analysis, and run interactive queries, and you pay only for the queries you run.
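As a rough illustration of what an ad hoc query might look like from code, the sketch below builds the arguments for the `start_query_execution` call in the AWS SDK for Python (boto3). The database name `weblogs`, the table `access_logs`, and the bucket `example-bucket` are hypothetical placeholders, and the helper function names are assumptions made for this example rather than part of any AWS API.

```python
from typing import Any, Dict


def build_athena_query_request(sql: str, database: str, output_location: str) -> Dict[str, Any]:
    """Build keyword arguments for Athena's StartQueryExecution call."""
    # Athena writes query results to S3, so the output location must be an S3 URI.
    if not output_location.startswith("s3://"):
        raise ValueError("output_location must be an s3:// URI")
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_query(sql: str, database: str, output_location: str) -> str:
    """Submit the query and return its execution ID (requires AWS credentials)."""
    import boto3  # imported lazily so the request builder works without the SDK

    athena = boto3.client("athena")
    response = athena.start_query_execution(
        **build_athena_query_request(sql, database, output_location)
    )
    return response["QueryExecutionId"]


# Hypothetical example: count requests per HTTP status in a log table.
request = build_athena_query_request(
    sql="SELECT status, COUNT(*) AS hits FROM access_logs GROUP BY status",
    database="weblogs",
    output_location="s3://example-bucket/athena-results/",
)
```

Calling `run_query` with the same arguments would submit the query. Keeping the request builder separate makes it possible to inspect the request without an AWS account.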

News Corp: "Athena has proven to be fast, easy to use, and cost effective."

 


Build massively scalable applications for data transformation, real-time analytics, and predictive analytics.


Amazon EMR is a managed service that lets you process and analyze extremely large data sets using the latest versions of big data processing frameworks such as Apache Hadoop, Spark, HBase, and Presto on fully customizable clusters.

Amazon EMR goes far beyond SQL. You can run custom applications and code for workloads such as machine learning, graph analytics, data transformation, and streaming data. You can specify compute, memory, storage, and application parameters to tune each cluster to your analytic requirements.
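To make the idea of a customizable cluster more concrete, here is a minimal sketch that assembles a request for the boto3 `run_job_flow` call. The cluster name, release label, instance types, and node counts are illustrative assumptions, and the role names assume the default EMR roles already exist in the account. The helper function itself is hypothetical.

```python
from typing import Any, Dict, List


def build_emr_cluster_request(
    name: str,
    release_label: str,
    applications: List[str],
    core_instance_type: str,
    core_instance_count: int,
    spot_task_count: int = 0,
) -> Dict[str, Any]:
    """Build keyword arguments for EMR's RunJobFlow call."""
    instance_groups = [
        {"Name": "Primary node", "InstanceRole": "MASTER", "Market": "ON_DEMAND",
         "InstanceType": "m5.xlarge", "InstanceCount": 1},
        {"Name": "Core nodes", "InstanceRole": "CORE", "Market": "ON_DEMAND",
         "InstanceType": core_instance_type, "InstanceCount": core_instance_count},
    ]
    if spot_task_count > 0:
        # Optional task nodes on Spot capacity add compute without storing HDFS data.
        instance_groups.append(
            {"Name": "Spot task nodes", "InstanceRole": "TASK", "Market": "SPOT",
             "InstanceType": core_instance_type, "InstanceCount": spot_task_count}
        )
    return {
        "Name": name,
        "ReleaseLabel": release_label,
        "Applications": [{"Name": app} for app in applications],
        "Instances": {
            "InstanceGroups": instance_groups,
            # Terminate the cluster once any submitted steps have finished.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",  # assumed default instance profile
        "ServiceRole": "EMR_DefaultRole",      # assumed default service role
    }


# Hypothetical cluster: Hadoop and Spark, two core nodes, four Spot task nodes.
cluster_request = build_emr_cluster_request(
    name="nightly-transform",
    release_label="emr-6.15.0",
    applications=["Hadoop", "Spark"],
    core_instance_type="m5.xlarge",
    core_instance_count=2,
    spot_task_count=4,
)
# With boto3 and credentials available, this could be submitted as:
#   import boto3
#   boto3.client("emr").run_job_flow(**cluster_request)
```

A real job would also pass a `Steps` list describing the work to run. It is left out here to keep the sketch focused on cluster sizing.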


Redfin provides real estate listings and recommendations to millions of homebuyers. Every day, Redfin uses Amazon EMR with Spot Instances, dynamically spinning Apache Hadoop clusters up and down, to perform large data transformations and deliver data to internal and external customers.

 


Analyze all your data using your existing business intelligence tools. Run complex business reports with data from multiple sources.


Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse that makes it simple and cost-effective to analyze all your data using your existing business intelligence tools. The Redshift query engine is optimized for fast SQL performance, including on complex queries that join large numbers of database tables.

You can use Amazon Redshift when you need to pull together data from many different sources – like inventory systems, financial systems, retail sales systems, and even log data – into a common format, and store it for long periods of time, to build sophisticated reports with very high query performance.
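The kind of report described above, one that joins data from different source systems, can be sketched in a few lines of SQL. The example below uses Python's built-in `sqlite3` module purely as a local stand-in so that it runs anywhere. It is not Redshift, and the `inventory` and `sales` tables and their columns are hypothetical.

```python
import sqlite3

# In-memory database standing in for a warehouse loaded from two source systems.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE inventory (sku TEXT PRIMARY KEY, product TEXT, units_in_stock INTEGER);
CREATE TABLE sales (sku TEXT, units_sold INTEGER, revenue REAL);
INSERT INTO inventory VALUES ('A1', 'Widget', 40), ('B2', 'Gadget', 5);
INSERT INTO sales VALUES ('A1', 10, 100.0), ('A1', 5, 50.0), ('B2', 20, 400.0);
""")

# Join inventory with sales to report stock levels next to total sales per product.
REPORT_SQL = """
SELECT i.product,
       i.units_in_stock,
       SUM(s.units_sold) AS units_sold,
       SUM(s.revenue)    AS revenue
FROM inventory AS i
JOIN sales AS s ON s.sku = i.sku
GROUP BY i.product, i.units_in_stock
ORDER BY revenue DESC
"""

report = conn.execute(REPORT_SQL).fetchall()
for product, in_stock, sold, revenue in report:
    print(f"{product}: {in_stock} in stock, {sold} sold, revenue {revenue:.2f}")
```

Against a real warehouse, a query of this shape would typically be issued through a SQL client or business intelligence tool rather than an embedded database.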

 


"Nasdaq achieved faster, richer analytics and data warehousing capabilities while reducing costs by 57% by shifting to Amazon Redshift."

 


| If you need | Consider using |
| --- | --- |
| Run ad hoc queries on data stored in S3 | Athena |
| Interactively analyze data in S3 before loading it into Redshift | Athena |
| Run custom code on Spark, Hive, Pig, Presto clusters | EMR |
| Build and train a predictive model using Spark | EMR |
| Custom application for real-time recommendations | EMR |
| Enterprise reports that join data from multiple structured data sources | Redshift |
| Run complex queries that join large numbers of database tables on an ongoing basis | Redshift |
| Support business intelligence workloads | Redshift |

Query Data on Amazon S3 Using SQL

Follow this tutorial to instantly query data stored in Amazon S3, using Amazon Athena.
 

Analyze Big Data with Hadoop

Create a Hadoop cluster and run a Hive script to process log data with Amazon EMR in this 60-minute project.

Deploy a Data Warehouse 

Follow this simple project to deploy a fully managed data warehouse in just 60 minutes, with Amazon Redshift.