Are you having challenges with your on-premises Apache Hadoop or Apache Spark deployments? Are you frustrated with over-provisioning resources to handle workload variability? Do you spend too much time keeping up with rapidly changing open-source software innovation?

If so, you are not alone. Migrating your big data and machine learning workloads to AWS and Amazon EMR offers many advantages over on-premises deployments. These include separation of compute and storage, increased agility, resilient and persistent storage, and managed services that provide up-to-date, familiar environments to develop and operate big data applications.

Jumpstart Your Migration to Amazon EMR

AWS is here to help you migrate your big data and applications. Our Apache Hadoop and Apache Spark to Amazon EMR Migration Acceleration Program provides two ways to help you get there quickly and with confidence.

Follow step-by-step instructions, get guidance on key design decisions, and learn best practices.

Migrating big data and analytics workloads from on-premises to the cloud involves careful decision making. AWS has helped many customers successfully migrate their big data from on-premises to Amazon EMR. Based on these successes, we have put together a new, detailed, step-by-step EMR Migration Guide. In the Guide, you will learn best practices for:

  • Migrating data, applications, and catalogs
  • Using persistent and transient resources
  • Configuring security policies, access controls, and audit logs
  • Estimating and minimizing costs, while maximizing value
  • Leveraging the AWS Cloud for high availability (HA) and disaster recovery (DR)
  • Automating common administrative tasks

Create a migration plan for your organization in a free workshop given by EMR specialists at your location.

This new EMR Migration Workshop is a two-day, customizable workshop that can jumpstart your migration to the cloud. It offers a small, interactive setting where participants work directly with AWS experts, discuss strategies, and map out a way forward. The workshop explains the benefits of a cloud-native architecture and dives deep into Amazon EMR’s benefits and common usage patterns, data lake architecture on AWS, migration methodologies, and security controls. It also includes guided hands-on labs that let participants try the most common architecture patterns in the cloud.

Q: How do I know if I qualify for the workshop?

Get in touch with us and we can help determine whether you qualify. If you have an Apache Hadoop or Apache Spark workload on-premises and want to migrate it to the AWS Cloud, you are a good candidate.

Q: Who do I need from my team for the workshop to be effective?

We recommend that your Apache Hadoop and Apache Spark administrators, data engineers, and infrastructure engineers be present. You can also invite users of these systems, such as analysts, data scientists, or ML engineers.

Provectus

Provectus delivers highly efficient cloud-native data analytics solutions to accelerate enterprise transformation and enable AI, helping businesses gain a competitive edge.



Cloudwick

Cloudwick has 10 years of Global 1000 Hadoop operations expertise and has migrated more than 30 Hadoop clusters to AWS.



Customer Success

Intuit: Migrating Apache Spark and Hive (49:28)

Intuit talks about how they migrated analytics, data processing (ETL), and data science workloads, including key motivations, benefits, and details of key architectural changes and best practices.

Hadoop/Spark to Amazon EMR, Architect It for Security & Governance (55:46)

Airbnb and Guardian Life discuss why and how they migrated their Apache Hadoop and Apache Spark workloads to Amazon EMR and the benefits they experienced.

Build Data Engineering Platforms with Amazon EMR (55:21)

Salesforce.com and Vanguard discuss in detail how they use Amazon EMR to build a self-service, secure, and auditable data engineering platform.

FINRA's Managed Data Lake: Next-Gen Analytics in the Cloud (1:01:51)

FINRA shares how they created a managed data lake that enables discovery on petabytes of data with Amazon EMR, while saving time and money over traditional analytics solutions.
