Taking the first step
Purpose | Help determine which AWS analytics services are the best fit for your organization. |
Last updated | February 20, 2025 |
Covered services | |
Introduction
Data is foundational to modern business. People and applications need to securely access and analyze data, which comes from new and diverse sources. The volume of data is also constantly increasing, which can cause organizations to struggle with capturing, storing, and analyzing all the necessary data.
Meeting these challenges means building a modern data architecture that breaks down all of your data silos for analytics and insights--including third-party data--and makes it accessible to everyone in the organization, in one place, with end-to-end governance. It's also increasingly important to connect your analytics and machine learning (ML) systems to enable predictive analytics.
This decision guide helps you ask the right questions to build your modern data architecture on AWS services. It explains how to break down your data silos (by connecting your data lake and data warehouses), your system silos (by connecting ML and analytics), and your people silos (by putting data in the hands of everyone in your organization).
This eight-minute excerpt is from a one-hour presentation by Sirish Chandrasekaran and Rick Sears at re:Invent 2024. It provides an overview of how a fictional company, Maxdome, uses SageMaker Unified Studio AI and analytics to unlock data insights.
Understand AWS analytics services
A modern data strategy is built with a set of technology building blocks that help you manage, access, analyze, and act on data. It also gives you multiple options to connect to data sources. A modern data strategy should empower your teams to:
- Use your preferred tools or techniques
- Use artificial intelligence (AI) to assist with finding answers to specific questions about your data
- Manage who has access to data with the proper security and data governance controls
- Break down data silos to give you the best of both data lakes and purpose-built data stores
- Store any amount of data, at low cost, and in open, standards-based data formats
- Connect your data lakes, data warehouses, operational databases, applications, and federated data sources into a coherent whole
AWS offers a variety of services to help you achieve a modern data strategy. The following diagram depicts the AWS services for analytics that this guide covers. The tabs that follow provide additional details.

The next generation of Amazon SageMaker
Consider criteria for AWS analytics services
There are many reasons for building data analytics on AWS. You might need to support a greenfield or pilot project as a first step in your cloud migration journey. Alternatively, you might be migrating an existing workload with as little disruption as possible. Whatever your goal, the following considerations can be useful in making your choice.
Analyze available data sources and data types to gain a comprehensive understanding of data diversity, frequency, and quality. Understand any potential challenges in processing and analyzing the data. This analysis is crucial because:
- Data sources are diverse and come from various systems, applications, devices, and external platforms.
- Each data source has its own structure, format, and update frequency. Analyzing these sources helps you identify suitable data collection methods and technologies.
- Analyzing data types, such as structured, semi-structured, and unstructured data, determines the appropriate data processing and storage approaches.
- Analyzing data sources and types facilitates data quality assessment and helps you anticipate potential data quality issues, such as missing values, inconsistencies, or inaccuracies.
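The data quality assessment described above can be sketched in a few lines of Python. This is a minimal illustration only, not part of any AWS service: the function name `profile_records` and the sample records are hypothetical, and the check simply counts missing values and flags fields whose values appear with more than one type.

```python
from collections import Counter, defaultdict


def profile_records(records):
    """Summarize missing values and observed value types per field."""
    # Collect every field name that appears in any record.
    fields = set()
    for record in records:
        fields.update(record)

    missing = Counter()
    types_seen = defaultdict(Counter)
    for record in records:
        for field in fields:
            value = record.get(field)
            if value is None or value == "":
                missing[field] += 1
            else:
                types_seen[field][type(value).__name__] += 1

    total = len(records)
    report = {}
    for field in sorted(fields):
        report[field] = {
            "missing_ratio": missing[field] / total if total else 0.0,
            "types": dict(types_seen[field]),
            # More than one observed type hints at an inconsistency to review.
            "inconsistent": len(types_seen[field]) > 1,
        }
    return report


# Hypothetical sample records, invented for illustration.
sample = [
    {"device_id": "a1", "temperature": 21.5, "ts": "2025-01-01T00:00:00Z"},
    {"device_id": "a2", "temperature": "22.1", "ts": "2025-01-01T00:01:00Z"},
    {"device_id": "a3", "temperature": None},
]
report = profile_records(sample)
for field, stats in report.items():
    print(field, stats)
```

A report like this is only a starting point. In practice you would run a comparable profile over a representative sample from each source before settling on storage and processing approaches.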
Choose AWS analytics services
Now that you know the criteria for evaluating your analytics needs, you are ready to choose which AWS analytics services are right for your organization. The following table aligns sets of services with common capabilities and business goals.
Categories | What is it optimized for? | Services |
---|---|---|
Unified analytics and AI | Analytics and AI development: Optimized for using a single environment to access data, analytics, and AI capabilities. | Amazon SageMaker Unified Studio (preview) |
Data processing | Interactive analytics: Optimized for performing real-time data analysis and exploration, which allows users to interactively query and visualize data. | |
Data processing | Big data processing: Optimized for processing, moving, and transforming large amounts of data. | |
Data processing | Data catalog: Optimized for providing detailed information about the available data, its structure, characteristics, and relationships. | |
Data streaming | Apache Kafka processing of streaming data: Optimized for using Apache Kafka data-plane operations and running open source versions of Apache Kafka. | Amazon MSK |
Data streaming | Real-time processing: Optimized for rapid and continuous data intake and aggregation, including IT infrastructure log data, application logs, social media, market data feeds, and web clickstream data. | Amazon Kinesis Data Streams |
Data streaming | Real-time streaming data delivery: Optimized for delivering real-time streaming data to destinations such as Amazon S3, Amazon Redshift, OpenSearch Service, Splunk, Apache Iceberg Tables, and any custom HTTP endpoint or HTTP endpoints owned by supported third-party service providers. | Amazon Data Firehose |
Data streaming | Building Apache Flink applications: Optimized for using Java, Scala, Python, or SQL to process and analyze streaming data. | Amazon Managed Service for Apache Flink |
Business intelligence | Dashboards and visualizations: Optimized for visually representing complex datasets, and providing natural language query of your data. | |
Search analytics | Managed OpenSearch clusters: Optimized for log analytics, real-time application monitoring, and clickstream analysis. | |
Data governance | Managing data access: Optimized for setting up the proper management, availability, usability, integrity, and security of data throughout its lifecycle. | |
Data collaboration | Secure data clean rooms: Optimized for collaborating with other companies without sharing raw underlying data. | |
Data lake and warehouse | Integrated data lake and data warehouse access: Optimized for unifying your data across Amazon S3 data lakes and Amazon Redshift data warehouses. | |
Data lake and warehouse | Object storage for data lakes: Optimized for providing a data lake foundation with virtually unlimited scalability and high durability. | |
Data lake and warehouse | Data warehousing: Optimized for centrally storing, organizing, and retrieving large volumes of structured and sometimes semi-structured data from various sources within an organization. | |
Use AWS analytics services
You should now have a clear understanding of your business objectives, and of the volume and velocity of the data you will be ingesting and analyzing, so that you can begin building your data pipelines.
To help you learn more about each of the available services, we have provided a pathway to explore how each service works. The following sections provide links to in-depth documentation, hands-on tutorials, and resources that take you from basic usage to more advanced deep dives.
- Getting started with Amazon Athena
  Learn how to use Amazon Athena to create a table based on sample data stored in Amazon S3, query the table, and check the results of the query.
- Get started with Apache Spark on Athena
  Use the simplified notebook experience in the Athena console to develop Apache Spark applications using Python or Athena notebook APIs.
- Catalog and govern Athena federated queries with SageMaker Lakehouse
  Learn how to connect to, govern, and run federated queries on data stored in Amazon Redshift, DynamoDB (Preview), and Snowflake (Preview).
- Analyzing data in Amazon S3 using Athena
  Explore how to use Athena on logs from Elastic Load Balancers, generated as text files in a pre-defined format. We show you how to create a table, partition the data in a format used by Athena, convert it to Parquet, and compare query performance.
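As a rough sketch of the "query the table and check the results" flow that these Athena tutorials walk through, the following Python example starts a query, polls until it finishes, and fetches the rows. It is an illustration under assumptions rather than code from the tutorials: the helper names `build_query_request` and `run_query` are hypothetical, as are the table, database, and bucket names in the usage comment. The `athena` argument is expected to behave like a boto3 Athena client, which requires the boto3 package and AWS credentials.

```python
import time


def build_query_request(sql, database, output_location):
    """Build keyword arguments for the Athena StartQueryExecution call."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        # Athena writes query results to this Amazon S3 location.
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_query(athena, sql, database, output_location, poll_seconds=1.0):
    """Start a query, wait until it finishes, and return the first page of rows."""
    request = build_query_request(sql, database, output_location)
    query_id = athena.start_query_execution(**request)["QueryExecutionId"]

    # Poll until the query reaches a terminal state.
    while True:
        execution = athena.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)

    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query {query_id} ended in state {state}")

    # Only the first page is returned here; large results need pagination.
    results = athena.get_query_results(QueryExecutionId=query_id)
    return results["ResultSet"]["Rows"]


# Example usage (hypothetical names; needs boto3 and AWS credentials):
#
#   import boto3
#   athena = boto3.client("athena")
#   rows = run_query(
#       athena,
#       "SELECT * FROM sample_logs LIMIT 10",
#       "sample_db",
#       "s3://amzn-s3-demo-bucket/athena-results/",
#   )
```

Passing the client in as an argument keeps the polling logic easy to exercise with a stand-in object before you point it at a real account.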
Explore ways to use AWS analytics services
Reference architecture diagrams
Explore architecture diagrams to help you develop, scale, and test your analytics solutions on AWS.