10 study areas for the AWS Certified Data Analytics – Specialty exam
April 9, 2024 Update: The AWS Certified Data Analytics – Specialty certification has been retired. Learn more in this blog.
As a solutions architect at AWS, I have spent the past few years providing technical guidance to many AWS customers as they designed and built cloud-based data architectures. Prior to AWS, I held various positions in the data space, ranging from data engineering to machine learning, and I considered data my area of depth. But as I continued to work with a greater variety of AWS customers, I also saw an increasing variety of data patterns, sources, tools, and requirements. I pursued the AWS Certified Data Analytics – Specialty certification to deepen my knowledge across all analytics domains.
In this blog, I will share how I prepared for the AWS Certified Data Analytics – Specialty exam. Earning the credential will validate your expertise in designing data solutions and using analytics services to derive insights from data. This credential also helps organizations identify and develop talent with critical skills for implementing cloud initiatives.
Since earning the certification, I have found that I am better prepared to help customers with design considerations when they build data architectures on AWS. If you have experience working with AWS services to build data solutions, keep reading to learn what to expect during the exam and how to prepare.
Areas of study
The AWS Certified Data Analytics – Specialty exam is one of 12 AWS Certifications offered, and one of six at the Specialty level. The exam includes questions that test your understanding of how to use AWS services to design, build, secure, and maintain analytics solutions. You will need to understand how the services integrate with one another as part of the broader data lifecycle of collection, storage, processing, and visualization.
From my experience preparing for and earning this certification, I recommend that candidates for this exam focus on these 10 areas of study. Each study area includes a (non-exhaustive) list of resources that illustrate the concept and what you can expect on the exam.
1. Architecture patterns and design principles
The exam goes beyond service recall and requires you to analyze patterns and select the most appropriate solution. Start by orienting yourself with high-level design recommendations, common architectural patterns, and the logic behind them. Many of the analytics patterns revolve around the Modern Data Architecture framework. The Modern Data Architecture advocates for a centralized data lake, surrounded by purpose-built data services with seamless data movement between them. The Data Analytics Lens of the AWS Well-Architected Framework provides key characteristics and considerations for the most common scenarios, including the modern data architecture, data mesh, batch data processing, streaming ingestion and processing, operational analytics, and data visualization.
Additionally, this re:Invent presentation from my colleague Ben Snively provides a great refresher on frequently encountered architectural patterns and best practices.
Consider the following resources:
- Whitepaper: Architectural Patterns to Build End-to-End Data Driven Applications on AWS
- Whitepaper: Derive Insights from AWS Modern Data Architecture
- Whitepaper: Big Data Analytics Options on AWS
- Whitepaper: Build Modern Data Streaming Architectures on AWS
2. Concepts and AWS services for the five domains of analytics
The exam classifies questions into five domains: 1) Collection, 2) Storage and Data Management, 3) Processing, 4) Analysis and Visualization, and 5) Security. Often, analytics professionals specialize in one of these domains more than the others. Now is the time to dive deep into the analytics concepts and AWS Analytics services you may not know well enough.
For example, the ‘collection’ domain includes questions on Amazon Kinesis (Data Streams, Data Firehose, and Data Analytics), Amazon Managed Streaming for Apache Kafka (and self-hosted Apache Kafka), Amazon DynamoDB Streams, Amazon Simple Queue Service, AWS Database Migration Service, AWS Snowball, and AWS Direct Connect. You should understand the characteristics and use cases for these services and how they differ from one another. You should also understand key data architecture design concepts, such as data ordering, format, and compression.
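For instance, here is a minimal boto3 sketch of the kind of ordering behavior the exam probes: records written to a Kinesis data stream with the same partition key land on the same shard and keep their relative order. The stream name and payload below are hypothetical.

```python
import json

import boto3

kinesis = boto3.client("kinesis")

# Records that share a partition key land on the same shard,
# which preserves their relative order within that shard.
event = {"device_id": "sensor-42", "temperature": 21.7}

kinesis.put_record(
    StreamName="example-ingest-stream",  # hypothetical stream name
    Data=json.dumps(event).encode("utf-8"),
    PartitionKey=event["device_id"],
)
```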
To learn about a service, or increase your depth of knowledge, read the service FAQs, the developer or management guide, and consider a hands-on lab or class from AWS Training. Most guides have tutorials, and AWS also offers self-paced labs and immersion days.
Consider the following resources:
- Training: AWS Exam Readiness – Data Analytics Specialty
- Service Guides: AWS Documentation
- Service FAQs: AWS FAQs
- Sample Questions: Data Analytics Specialty Sample Questions
- Workshops: AWS Workshops
The rest of the study areas below include key themes on the exam that you should understand across all AWS Analytics and related services, with links to information that illustrate the concept.
3. Data movement integrations between services
A modern data architecture requires seamless data movement between data producers, processing applications, data lakes, and purpose-built data stores. When choosing a data movement or processing step, it is critical to validate that it will support the required data source and destination(s) at the required cadence. Expect many real-time, near-real-time, event-driven, and scheduled distinctions. Beyond knowing which integrations exist, the exam will expect you to know how they work and key considerations when using them.
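As a simple illustration of the real-time versus near-real-time distinction, the following hedged boto3 sketch writes a record to a Kinesis Data Firehose delivery stream. Firehose buffers records and delivers them in batches to a configured destination such as S3, so delivery is near real time rather than real time. The delivery stream name is hypothetical.

```python
import json

import boto3

firehose = boto3.client("firehose")

# Firehose buffers incoming records by size or time interval, then
# delivers each batch to its configured destination (here, S3) --
# near-real-time rather than real-time delivery.
record = {"order_id": "1234", "status": "shipped"}

firehose.put_record(
    DeliveryStreamName="example-orders-to-s3",  # hypothetical stream name
    Record={"Data": (json.dumps(record) + "\n").encode("utf-8")},
)
```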
Consider the following (non-exhaustive) list of resources:
- Documentation: Loading Data into Amazon Redshift
- Documentation: Supported Destinations for Kinesis Data Firehose
- Documentation: Sources and Sinks for Kinesis Data Analytics
- Documentation: Which data sources can I crawl in AWS Glue?
- Blog: Crafting serverless streaming jobs with AWS Glue
- Documentation: Sources for data migration with AWS DMS
4. Data access integration between services
AWS advocates for a data architecture that leverages purpose-built data stores, democratizes data access, and uses the right tool for the job. A data platform that implements these principles will also need to enable data access from these various data stores, and for a variety of downstream users. Most tools support Amazon S3 (typically used as the data lake), and many services offer capabilities like federated queries to support “around the perimeter” data access between services. The exam will ask questions about these integrations and how to implement them.
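For example, here is a minimal boto3 sketch of in-place data access: running an Athena query directly against data in S3, without loading it into a cluster first. The database, table, and results location are all hypothetical.

```python
import boto3

athena = boto3.client("athena")

# Athena queries data where it lives in S3; results are written to
# the output location below rather than loaded into a data store.
response = athena.start_query_execution(
    QueryString="SELECT status, COUNT(*) FROM web_logs GROUP BY status",  # hypothetical table
    QueryExecutionContext={"Database": "example_db"},  # hypothetical database
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)
print(response["QueryExecutionId"])
```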
Consider the following resources:
- Workshop: Modern Data Architecture Immersion Day Workshop
- Documentation: Configure cluster location and data storage in Amazon EMR
- Documentation: Connect to data sources in Amazon Athena
- Blog: Best practices for Amazon Redshift Spectrum
- Documentation: Connecting to data in Amazon QuickSight
5. Common analytical query scenarios
Ultimately, organizations invest in data infrastructure in order to derive actionable insights from their data. The exam will ask questions about common analysis scenarios, including streaming analytics, log analytics, data visualization, and machine learning. Note that many AWS Analytics services offer built-in machine learning capabilities, and you should know them.
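To illustrate the built-in ML theme, the sketch below shows the Kinesis Data Analytics SQL pattern of applying the built-in RANDOM_CUT_FOREST function to assign anomaly scores to streaming records. The stream and column names are hypothetical, and the SQL is held in a Python string as you might keep it alongside deployment code.

```python
# Kinesis Data Analytics (SQL) applications expose built-in ML functions
# that run directly against a stream. RANDOM_CUT_FOREST appends an
# ANOMALY_SCORE column to each record -- a common streaming-analytics
# scenario. Stream and column names here are hypothetical.
anomaly_detection_sql = """
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    "temperature" DOUBLE,
    "ANOMALY_SCORE" DOUBLE);

CREATE OR REPLACE PUMP "STREAM_PUMP" AS
  INSERT INTO "DESTINATION_SQL_STREAM"
    SELECT STREAM "temperature", "ANOMALY_SCORE"
    FROM TABLE(RANDOM_CUT_FOREST(
        CURSOR(SELECT STREAM * FROM "SOURCE_SQL_STREAM_001")));
"""
```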
Consider the following resources:
- Documentation: Visual types in Amazon QuickSight
- Workshop: Amazon QuickSight
- Blog post: Introducing Kinesis Data Analytics Studio
- Workshop: Transform Data with AWS Glue DataBrew
- Documentation: Streaming SQL Concepts
- Documentation: SQL reference for Amazon Athena
- AWS re:Invent video: Democratizing data for self-service analytics and ML
6. Managing, scaling, and updating applications
The volume and velocity of data that organizations are storing, processing, and querying is increasing at an exponential rate. Over time, many organizations that start with terabytes of data need to scale to handle petabytes or even exabytes. Cloud-native analytics approaches offer elasticity to respond to changing scale requirements, along with mechanisms to decrease management overhead and cost, and the exam will expect you to understand how to implement them. AWS has also added a growing list of serverless options in the analytics space, and you should know which services offer them and how to use them.
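As one concrete example of elasticity, here is a hedged boto3 sketch of scaling a Kinesis data stream by changing its shard count; the stream name and target count are hypothetical.

```python
import boto3

kinesis = boto3.client("kinesis")

# Each shard provides fixed write capacity (1 MB/s or 1,000 records/s),
# so scaling a stream means changing its shard count. UNIFORM_SCALING
# splits or merges shards evenly to reach the target.
kinesis.update_shard_count(
    StreamName="example-ingest-stream",  # hypothetical stream name
    TargetShardCount=4,
    ScalingType="UNIFORM_SCALING",
)
```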
Consider the following resources:
- Documentation: Resizing Clusters in Amazon Redshift
- Documentation: Sizing Amazon OpenSearch Service domains
- Documentation: Data Streams Quotas and Limits in Amazon Kinesis
- Documentation: Creating and Managing Streams in Amazon Kinesis
- Documentation: Developing Custom Consumers with Dedicated Throughput in Amazon Kinesis
- Documentation: Manage clusters in Amazon EMR
- Blog: Amazon EMR Serverless Now Generally Available
- Blog: Amazon Redshift Serverless – Now Generally Available with New Capabilities
7. Data partitioning and distribution strategies
Distributing chunks of data to enable parallel processing is a key scaling concept for almost all data services. Amazon Kinesis has shards and partition keys, Amazon OpenSearch Service has indices and shards, big data processing tools like Apache Spark have partitions, Amazon Redshift has distribution keys, Amazon QuickSight has SPICE (Super-fast, Parallel, In-memory Calculation Engine), and so on. For all of these services, you should be very familiar with the partitioning strategies, recommended sizes, and how to optimize them for performance.
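To make the idea concrete, here is a minimal sketch (the bucket and prefix layout are hypothetical) of writing objects to S3 with Hive-style partition keys, which lets engines like Athena, AWS Glue, and Redshift Spectrum prune partitions at query time.

```python
import json
from datetime import datetime, timezone

import boto3

s3 = boto3.client("s3")

event = {"user_id": "u-17", "action": "login"}
now = datetime.now(timezone.utc)

# Hive-style key=value prefixes (year=/month=/day=) let Athena, Glue,
# and Redshift Spectrum skip partitions a query does not need.
key = (
    f"events/year={now:%Y}/month={now:%m}/day={now:%d}/"
    f"{now:%H%M%S}-{event['user_id']}.json"
)

s3.put_object(
    Bucket="example-data-lake",  # hypothetical bucket
    Key=key,
    Body=json.dumps(event).encode("utf-8"),
)
```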
Consider the following resources:
- Documentation: Partitioning data in Amazon Athena
- Documentation: Amazon Redshift Distribution styles
- Documentation: Amazon Redshift Distribution examples
- Blog: Demystifying OpenSearch shard allocation
- Blog: Work with partitioned data in AWS Glue
- Blog: Under the hood: Scaling your Kinesis data streams
8. Security and compliance
Cloud security is the highest priority at AWS. For analytics workloads, security includes classifying sensitive data, protecting data at rest and in transit, controlling data access, controlling infrastructure access, and auditing. Classic AWS security concepts and services are important here, such as encryption, Amazon VPC, AWS IAM policies, and AWS CloudTrail. There are also analytics-specific data governance tools like AWS Lake Formation, Athena workgroups, and Amazon QuickSight user management.
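As an illustration of fine-grained access control, here is a hedged boto3 sketch of granting a principal SELECT permission on a table governed by Lake Formation; the role ARN, database, and table names are hypothetical.

```python
import boto3

lakeformation = boto3.client("lakeformation")

# Lake Formation centralizes table-level grants for governed data,
# instead of managing per-service S3 and Glue IAM policies.
lakeformation.grant_permissions(
    Principal={
        # hypothetical role ARN
        "DataLakePrincipalIdentifier": "arn:aws:iam::111122223333:role/AnalystRole"
    },
    Resource={
        # hypothetical database and table names
        "Table": {"DatabaseName": "example_db", "Name": "web_logs"}
    },
    Permissions=["SELECT"],
)
```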
Consider the following resources:
- Well Architected Framework: Security (Analytics Lens)
- Whitepaper: AWS Glue building a secure data pipeline
- Blog: Easily manage your data lake at scale using AWS Lake Formation Tag-based access control
- Blog: Design patterns for an enterprise data lake using AWS Lake Formation cross-account access
- Documentation: Managing Database Security in Amazon Redshift
- Documentation: Security in Amazon EMR
- Workshop: Athena Workgroups
- Documentation: Using AWS Lake Formation with Amazon QuickSight
9. Monitoring and troubleshooting analytics workloads
Monitoring is an important part of maintaining the reliability, availability, and performance of AWS Analytics services. Amazon CloudWatch monitors many key metrics for analytics services. You should know which metrics are most important for each service, as well as common problems and how to fix them. Some services have additional monitoring dashboards (particularly Spark-based workloads, like those in AWS Glue).
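For example, here is a minimal boto3 sketch (with a hypothetical stream name) of retrieving the Kinesis iterator-age metric that many troubleshooting questions hinge on, since a growing iterator age means consumers are falling behind producers.

```python
from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client("cloudwatch")
end = datetime.now(timezone.utc)

# GetRecords.IteratorAgeMilliseconds measures how far consumers lag
# behind the tip of the stream; a rising value signals a backlog.
stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/Kinesis",
    MetricName="GetRecords.IteratorAgeMilliseconds",
    Dimensions=[{"Name": "StreamName", "Value": "example-ingest-stream"}],  # hypothetical
    StartTime=end - timedelta(hours=1),
    EndTime=end,
    Period=300,
    Statistics=["Maximum"],
)

for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Maximum"])
```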
Consider the following resources:
- Documentation: Monitoring AWS Glue Spark Jobs
- Blog: Which metrics should I use to monitor and troubleshoot Kinesis Data Streams issues?
- Blog: Why is my Kinesis data stream throttling?
- Documentation: Recommended CloudWatch alarms for Amazon OpenSearch Service
- Documentation: Factors affecting Redshift query performance
10. Amazon S3
Amazon S3 is the foundation of most data platforms built on AWS: a flexible, durable, highly available, low-cost, and almost infinitely scalable data store. It is prominently featured in data architectures, and in the exam. As a data architect, you need to understand lifecycle policies, integrations, optimal storage patterns, security, access patterns, and cross-Region data transfer. For example, Amazon Athena cannot read data stored in the S3 Glacier storage class.
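For instance, here is a hedged boto3 sketch of a lifecycle rule (the bucket and prefix are hypothetical) that transitions aging raw data to cheaper storage classes, bearing in mind that objects moved to Glacier become unreadable to Athena until restored.

```python
import boto3

s3 = boto3.client("s3")

# Transition objects under raw/ to Infrequent Access after 30 days and
# to Glacier after 365 days. Athena can still read S3-IA objects, but
# not objects in the Glacier storage class.
s3.put_bucket_lifecycle_configuration(
    Bucket="example-data-lake",  # hypothetical bucket
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "tier-raw-data",
                "Filter": {"Prefix": "raw/"},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 365, "StorageClass": "GLACIER"},
                ],
            }
        ]
    },
)
```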
Consider the following resources:
- Whitepaper: Storage Best Practices for Data and Analytics Applications
- Documentation: Amazon S3 Storage Classes
- Documentation: Troubleshooting in Athena
- Documentation: Creating an Amazon QuickSight dataset using Amazon S3 files
- Blog: S3 Select and Glacier Select – Retrieving Subsets of Objects
- Documentation: Security Best Practices for Amazon S3
- Knowledge Center: How can I increase Amazon S3 request limits to avoid throttling on my Amazon S3 bucket?
Get hands-on
Finally, there is no substitute for getting hands-on with AWS services to strengthen your understanding. As part of my preparation for the exam, I built several streaming and batch data ingestion architectures within my AWS account. If you haven’t done it yet, sign up for a training account and take advantage of on-demand digital courses on AWS Skill Builder, virtual/in-person instructor-led classroom training, virtual webinars, and an exam-readiness course. The AWS Certified Data Analytics – Specialty exam page can also help you build a plan to prepare.
The value of AWS Certification
Organizations in every industry want to accelerate decision making in today’s complex and disrupted business landscape. There is a need for technology professionals who understand how to leverage AWS’ elastic data processing services to support these business outcomes. The AWS Certified Data Analytics – Specialty certification presents IT or engineering professionals with the opportunity to validate their knowledge and show that they understand how to design cost-efficient, secure, and high-performance data processing architectures on AWS. Preparing for a certification exam is an excellent way to reinforce your knowledge of any technology. I hope you consider pursuing this exam and experience similar benefits. Best of luck!