AWS Partner Network (APN) Blog
Tag: Spark
How SnapLogic eXtreme Helps Visualize Spark ETL Pipelines on Amazon EMR
Fully managed cloud services enable global enterprises to focus on strategic differentiators rather than on maintaining infrastructure. Enterprises do this by creating data lakes and performing big data processing in the cloud. SnapLogic eXtreme allows citizen integrators (those who can’t code) and data integrators to efficiently support and augment data-integration use cases by performing complex transformations on large volumes of data. Learn how to set up SnapLogic eXtreme and use Amazon EMR to run ETL for Amazon Redshift.
Training Multiple Machine Learning Models Simultaneously Using Spark and Apache Arrow
Apache Spark is a distributed computing framework that has added features such as Pandas UDFs by building on PyArrow. You can leverage Spark for distributed and advanced machine learning model lifecycle capabilities to build massive-scale products with many models in production. Learn how Perion Network implemented a model lifecycle capability to distribute the training and testing stages with a few lines of PySpark code. This capability improved the performance and accuracy of Perion’s ML models.
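As a rough illustration of the pattern this post describes, the sketch below trains one small model per group with a grouped Pandas function. It is not Perion's code: the column names (`model_id`, `x`, `y`), the helper names, and the simple linear fit standing in for a real model are all assumptions made for the example. It assumes Spark 3.0 or later, where `groupBy(...).applyInPandas(...)` is available.

```python
import numpy as np
import pandas as pd

# Output schema for the per-group result, as a DDL string.
# Column names here are hypothetical, chosen for this example only.
RESULT_SCHEMA = "model_id string, slope double, intercept double, n_rows long"


def train_one_model(pdf: pd.DataFrame) -> pd.DataFrame:
    """Fit y = slope * x + intercept on the rows of a single group.

    Spark hands each group to this function as a pandas DataFrame
    (transferred via Arrow). A least-squares line is a stand-in for
    whatever model would really be trained.
    """
    x = pdf["x"].to_numpy(dtype=float)
    y = pdf["y"].to_numpy(dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)
    return pd.DataFrame(
        {
            "model_id": [str(pdf["model_id"].iloc[0])],
            "slope": [float(slope)],
            "intercept": [float(intercept)],
            "n_rows": [len(pdf)],
        }
    )


def train_all_models(spark_df):
    """Train one model per model_id across the cluster.

    Expects a Spark DataFrame with columns model_id, x, and y.
    """
    return spark_df.groupBy("model_id").applyInPandas(
        train_one_model, schema=RESULT_SCHEMA
    )
```

Because `train_one_model` takes and returns plain pandas DataFrames, it can be checked locally on a small sample before it is handed to Spark through `train_all_models`.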
Lower TCO and Increase Query Performance by Running Hive on Spark in Amazon EMR
Learn how Mactores helped Seagate Technology use Apache Hive on Apache Spark for queries larger than 10 TB, combined with transient Amazon EMR clusters that leverage Amazon EC2 Spot Instances. It was imperative for Seagate to have systems in place to ensure the cost of collecting, storing, and processing data did not exceed their ROI. Moving to Hive on Spark enabled Seagate to continue processing petabytes of data at scale with significantly lower TCO.
How to Use Amazon SageMaker to Improve Machine Learning Models for Data Analysis
Amazon SageMaker provides all the components needed for machine learning in a single toolset. This allows ML models to get to production faster with much less effort and at lower cost. Learn about the data modeling process used by BizCloud Experts and the results they achieved for Neiman Marcus. Amazon SageMaker was employed to help develop and train ML algorithms for recommendation, personalization, and forecasting models that Neiman Marcus uses for data analysis and customer insights.