AWS Partner Network (APN) Blog

Tag: Spark

How SnapLogic eXtreme Helps Visualize Spark ETL Pipelines on Amazon EMR

Fully managed cloud services enable global enterprises to focus on strategic differentiators rather than on maintaining infrastructure. They do this by creating data lakes and performing big data processing in the cloud. SnapLogic eXtreme allows citizen integrators (those who can’t code) and data integrators to efficiently support and augment data-integration use cases by performing complex transformations on large volumes of data. Learn how to set up SnapLogic eXtreme and use Amazon EMR to do Amazon Redshift ETL.

Training Multiple Machine Learning Models Simultaneously Using Spark and Apache Arrow

Apache Spark is a distributed computing framework that uses PyArrow to support features such as Pandas UDFs. You can leverage Spark for distributed, advanced machine learning model lifecycle capabilities to build massive-scale products with many models in production. Learn how Perion Network implemented a model lifecycle capability that distributes the training and testing stages with a few lines of PySpark code. This capability improved the performance and accuracy of Perion’s ML models.

Lower TCO and Increase Query Performance by Running Hive on Spark in Amazon EMR

Learn how Mactores helped Seagate Technology to use Apache Hive on Apache Spark for queries larger than 10TB, combined with the use of transient Amazon EMR clusters leveraging Amazon EC2 Spot Instances. It was imperative for Seagate to have systems in place to ensure the cost of collecting, storing, and processing data did not exceed their ROI. Moving to Hive on Spark enabled Seagate to continue processing petabytes of data at scale with significantly lower TCO.

How to Orchestrate a Data Pipeline on AWS with Control-M from BMC Software

In spite of the rich set of machine learning tools AWS provides, coordinating and monitoring workflows across an ML pipeline remains a complex task. Control-M by BMC Software simplifies complex application, data, and file transfer workflows, whether on-premises, on the AWS Cloud, or across a hybrid cloud model. Walk through the architecture of a predictive maintenance system we developed to simplify the complex orchestration steps in a machine learning pipeline used to reduce downtime and costs for a trucking company.

How to Use Amazon SageMaker to Improve Machine Learning Models for Data Analysis

Amazon SageMaker provides all the components needed for machine learning in a single toolset. This allows ML models to get to production faster with much less effort and at lower cost. Learn about the data modeling process used by BizCloud Experts and the results they achieved for Neiman Marcus. Amazon SageMaker was employed to help develop and train ML algorithms for recommendation, personalization, and forecasting models that Neiman Marcus uses for data analysis and customer insights.
