AWS Training and Certification Blog

Learn to build batch analytics solutions with new AWS Classroom course

Did you know worldwide spending on big data and business analytics is projected to reach $684.12 billion by 2030, growing at a compound annual growth rate of 13.5 percent? Individuals with big data skills are in demand, and our new Amazon Web Services (AWS) intermediate-level course, Building Batch Data Analytics Solutions on AWS, can help you keep pace with this growth and expand your big data cloud skills.

If you are a data engineer or data architect who builds data analytics pipelines with open-source analytics frameworks, such as Apache Hadoop or Apache Spark, this one-day, virtual classroom course will help you develop these skills. You’ll learn to build a modern data architecture using Amazon EMR, an enterprise-grade managed service for Apache Spark and Apache Hadoop.

What is different about big data processing today?

Big data technologies benefit organizations when existing databases and applications can no longer cost-effectively scale to support unpredictable increases in the volume, variety, and velocity of data. Because time to insight is a competitive differentiator, organizations can use big data tools to analyze large datasets, both in batch and in real time, to uncover valuable insights.

Amazon EMR is a managed cluster platform that simplifies provisioning, deploying, running, and managing open-source analytics frameworks, such as Apache Hadoop and Apache Spark. Amazon EMR also integrates with Amazon Simple Storage Service (Amazon S3), the AWS Glue Data Catalog, and AWS Lake Formation to discover, catalog, and secure data in an Amazon S3 data lake. To take advantage of managed machine learning services, you can integrate machine learning workloads running on Apache Spark with Amazon SageMaker.

As data continues to grow exponentially, cost-effective and performant operation of a data analytics pipeline becomes even more important. Amazon EMR supports cost-effective operation by separating the scaling of storage from compute, while Amazon EMR runtime for Apache Spark improves performance by 2x compared to clusters without the EMR runtime.

Developing the skills needed to take advantage of these capabilities is critical both for organizations migrating from on-premises, open-source analytics frameworks to Amazon EMR and for customers building cloud-native big data solutions using Amazon EMR.

About the course

Building Batch Data Analytics Solutions on AWS will show you how to build a batch data analytics pipeline using Amazon EMR in a hands-on environment with the help of expert AWS instructors. You’ll learn three major skills: 1) How to ingest transactional and streaming data using AWS services and process that data using Apache Spark on Amazon EMR; 2) How to use notebooks to process and analyze data; and 3) How to integrate Amazon EMR with AWS Glue and apply fine-grained access control using AWS Lake Formation.

The course starts with data ingestion and storage, progresses to transformation and analysis, and finishes with security and monitoring of Amazon EMR clusters. You’ll learn about Amazon EMR cluster components and approaches to optimize cost, availability, and performance. AWS instructors will use lab and interactive sessions to demonstrate connecting to a Spark cluster, running tasks and orchestrating workflows using AWS Step Functions, and reviewing directed acyclic graphs and Spark metrics in the Spark history server. You’ll also create an EMR notebook and use PySpark to interact with an EMR cluster. Finally, you’ll take part in an instructor-facilitated exercise to build a data analytics solution to solve a business problem.
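To give a rough sense of what orchestrating a Spark task with AWS Step Functions can look like, here is a minimal sketch that builds a one-state workflow definition in Amazon States Language. It is not taken from the course materials. The function name, state name, step name, cluster ID, and S3 path are all hypothetical placeholders, and the sketch assumes an EMR cluster is already running.

```python
import json


def build_emr_step_state_machine(cluster_id: str, script_s3_uri: str) -> dict:
    """Build a one-state Amazon States Language definition (illustrative only).

    Both arguments are hypothetical placeholders: the ID of an EMR cluster
    that is assumed to be running already, and the S3 URI of a PySpark script.
    """
    return {
        "Comment": "Sketch: submit one Spark job to an existing EMR cluster",
        "StartAt": "RunSparkJob",
        "States": {
            "RunSparkJob": {
                "Type": "Task",
                # Step Functions service integration for Amazon EMR.
                # The ".sync" suffix makes the state wait until the step finishes.
                "Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
                "Parameters": {
                    "ClusterId": cluster_id,
                    "Step": {
                        "Name": "spark-batch-job",  # hypothetical step name
                        "ActionOnFailure": "CONTINUE",
                        "HadoopJarStep": {
                            # command-runner.jar runs spark-submit on the cluster.
                            "Jar": "command-runner.jar",
                            "Args": [
                                "spark-submit",
                                "--deploy-mode",
                                "cluster",
                                script_s3_uri,
                            ],
                        },
                    },
                },
                "End": True,
            }
        },
    }


if __name__ == "__main__":
    # Placeholder values; replace with a real cluster ID and script location.
    definition = build_emr_step_state_machine(
        "j-EXAMPLE", "s3://example-bucket/jobs/job.py"
    )
    print(json.dumps(definition, indent=2))
```

The printed JSON could be supplied as the definition when creating a state machine. A real workflow would typically add retries, error handling, and states that create and terminate the cluster.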

Whether you attend the class virtually or in person, you’ll have the opportunity to ask questions, work through solutions with your peers, and get real-time feedback from accredited AWS instructors with deep technical knowledge.

What are the prerequisites for this course?

To get the most out of this course, we recommend that learners have at least a year of experience managing open-source analytics frameworks, such as Apache Hadoop or Apache Spark, and foundational knowledge of AWS. You can satisfy the AWS foundational knowledge requirement by completing the AWS Technical Essentials course or the Architecting on AWS course, followed by Building Data Lakes on AWS.

Is the AWS Certified Data Analytics – Specialty your goal?

If you want to earn an industry-recognized credential from AWS that validates your expertise in AWS Analytics services, you may want to consider the AWS Certified Data Analytics – Specialty certification. While the Building Batch Data Analytics Solutions on AWS course explores the ingestion, storage, and processing stages of a data analytics pipeline, we offer additional resources to help you prepare for the exam, including an exam guide, sample questions, an official practice question set, and more.

Feb 23, 2024 Update: The AWS Certified Data Analytics – Specialty certification will retire on April 9, 2024, which is also the last day to access exam prep resources on Skill Builder. Learn more in this blog.

What resources are available if I want to learn more?

If you’re interested in learning more about our AWS Training and Certification offerings for data analytics, download our AWS Data Analytics Ramp-Up Guide. We offer many free, on-demand digital resources, as well as several virtual instructor-led courses for data analytics. Learn more about Building Batch Data Analytics Solutions on AWS and register today.