Amazon EMR | AWS Machine Learning Blog

Data processing options for AI/ML

This blog post was reviewed and updated in June 2022 to include new data processing features, such as Amazon SageMaker Studio and EMR integration. Training an accurate machine learning (ML) model requires many different steps, but none are potentially more important than data processing. Examples of processing steps include converting […]

Accessing data sources from Amazon SageMaker R kernels

Amazon SageMaker notebooks now support R out-of-the-box, without needing you to manually install R kernels on the instances. Also, the notebooks come pre-installed with the reticulate library, which offers an R interface for the Amazon SageMaker Python SDK and enables you to invoke Python modules from within an R script. You can easily run machine […]

Exploring data warehouse tables with machine learning and Amazon SageMaker notebooks

Are you a data scientist with data warehouse tables that you’d like to explore in your machine learning (ML) environment? If so, read on. In this post, I show you how to perform exploratory analysis on large datasets stored in your data warehouse and cataloged in your AWS Glue Data Catalog from your Amazon SageMaker […]

Build Amazon SageMaker notebooks backed by Spark in Amazon EMR

This blog post was last reviewed in August 2022. Introduced at AWS re:Invent in 2017, Amazon SageMaker provides a fully managed service for data science and machine learning workflows. One of the important parts of Amazon SageMaker is the powerful Jupyter notebook interface, which can be used to build models. You can enhance the Amazon SageMaker […]

Distributed Inference Using Apache MXNet and Apache Spark on Amazon EMR

In this blog post, we demonstrate how to run distributed offline inference on large datasets using Apache MXNet (incubating) and Apache Spark on Amazon EMR. We explain how offline inference is useful, why it is challenging, and how you can leverage MXNet and Spark on Amazon EMR to overcome these challenges. Distributed inference on large […]

Run Deep Learning Frameworks with GPU Instance Types on Amazon EMR

Today, AWS is excited to announce support for Apache MXNet and new generation GPU instance types on Amazon EMR, which enables you to run distributed deep neural networks alongside your machine learning workflows and big data processing. Additionally, you can install and run custom deep learning libraries on your EMR clusters with GPU hardware. Through […]

Build PMML-based Applications and Generate Predictions in AWS

If you generate machine learning (ML) models, you know that the key challenge is exporting and importing them into other frameworks to separate model generation and prediction. Many applications use PMML (Predictive Model Markup Language) to move ML models from one framework to another. PMML is an XML representation of a data mining model. In […]

Category: Amazon EMR