Up and Running with Big Data: 3 Day Deep-Dive

Over three days, explore the Big Data tools, technologies, and techniques that allow organizations to gain insight and drive new business opportunities by finding signal in their data. Using Amazon Web Services, you'll learn how to use the flexible map/reduce programming model to scale your analytics, use Hadoop with Elastic MapReduce, write queries with Hive, develop real-world data flows with Pig, and understand the operational needs of a production data platform.

What You Will Learn

Brought to you by AWS & Think Big Analytics, this course offers a hands-on training experience with short lectures and plenty of programming exercises. The agenda includes the following topics:

  • Amazon Elastic MapReduce Overview and Hadoop Architecture.
  • Amazon Elastic MapReduce Value Proposition.
  • Starting and examining your first EMR cluster.
    • Transient versus persistent clusters
    • Dynamic cluster resizing
  • Writing your first Amazon Elastic MapReduce job.
  • Loading Data into the cluster.
  • Amazon Elastic MapReduce Controls and Debugging EMR.
    • CloudWatch integration
    • S3 backup and disaster recovery
    • Spot Integration
    • Bootstrap Actions for cluster customization and configuration
    • DynamoDB integration
  • Data and Security.
  • Elastic MapReduce Programming Models.
  • Amazon Elastic MapReduce with streaming.
  • Amazon Elastic MapReduce with Pig.
  • Amazon Elastic MapReduce with Hive.
  • Advanced Hadoop Features – UDFs, UDAFs.
  • Amazon Elastic MapReduce Ecosystem.
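
As a taste of the map/reduce programming model named in the agenda, here is a minimal word-count sketch in plain Python. It is only an illustration and not course material: it runs in a single process rather than on Hadoop or EMR, and the function names (map_words, reduce_counts, run_word_count) are hypothetical, chosen for this example.

```python
from itertools import groupby


def map_words(line):
    """Map step: emit a (word, 1) pair for each whitespace-separated word."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(word, counts):
    """Reduce step: sum every count that was emitted for one word."""
    return word, sum(counts)


def run_word_count(lines):
    """Run map, shuffle (sort by key), and reduce in a single process."""
    # Map phase: turn every input line into (word, 1) pairs.
    pairs = [pair for line in lines for pair in map_words(line)]
    # Shuffle phase: sort by word so identical keys sit next to each other.
    pairs.sort(key=lambda pair: pair[0])
    # Reduce phase: collapse each group of identical words into one total.
    return dict(
        reduce_counts(word, (count for _, count in group))
        for word, group in groupby(pairs, key=lambda pair: pair[0])
    )


if __name__ == "__main__":
    print(run_word_count(["big data", "Big insight"]))
```

On a real cluster the framework performs the sort-and-group step between the map and reduce phases and spreads the work across many machines; the sketch only mimics that flow locally.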

Prerequisites

The following prerequisites ensure that you will gain the maximum benefit from the course.

  • Programming experience: This is a developer course. We will write Java, Hive, and Pig applications. Prior Java experience is strongly recommended.
  • Linux shell experience: Basic Linux shell (bash) commands will be used extensively. Some prior experience is recommended.
  • Experience with SQL databases: SQL experience is helpful for learning Hive and Pig, but not essential.

What You Must Bring

We will log into remote EMR instances to build, test, and run our applications. You will also be provided with all the exercise software so you can view it on your laptop, if desired.

Bring your laptop with the following software installed in advance.

  • JDK 1.6 or 1.7: The Java Development Kit (JDK), version 1.6 or newer. The Java Runtime Environment (JRE) alone is not sufficient.
  • Ant: The Java-based Ant build tool, version 1.7 or newer, if you want to build and test the Java exercises on your laptop.
  • A programmer’s source code editor: Whatever you prefer. Either Eclipse or IntelliJ IDEA is recommended for the Java exercises, and project files for both environments will be provided. You might find a separate programmer’s text editor more convenient for the Hive and Pig exercises.

*Please Note*: This course is not taught by AWS employees.

Additional Elastic MapReduce Resources

Interested in learning more? Follow the links below to learn what EMR is and how it can help you.

  • Introduction (3:10)

    The agenda for the class, and why developers should consider using Amazon Elastic MapReduce.
  • Getting Started (11:04)

    Signing up for an AWS account, generating a key-pair, and setting up an S3 bucket.
  • Running Jobs (14:47)

    Creating, monitoring, and getting results from your EMR Job Flow.
  • Clusters of Servers (10:50)

    EC2 instance types, pricing, and Hadoop cluster configuration.
  • Dealing with Data (18:54)

    S3 architectures, pricing, and access control.
  • Map-Reduce Lab (12:49)

    How to use a Hadoop Job Flow to analyze text from Wikipedia.
  • Command Line Tools (9:04)

    When and how to use the EMR and s3cmd tools.
  • Debugging Tips (19:34)

    Best practices for debugging EMR Job Flows.
  • Hive & Pig (21:41)

    Creating, monitoring, and getting results from Hive & Pig Job Flows.
  • Hive Lab (3:28)

    How to use a Hive Job Flow to analyze Wikipedia article data.
  • Advanced Elastic MapReduce (14:12)

    Bootstrap actions, spot pricing and task groups.
©2011, Amazon Web Services LLC or its affiliates. All rights reserved.