AWS Marketplace
Category: Amazon EMR
Quant research at scale using AWS and Refinitiv data
In this blog post, Alex, Pramod, and I show how to install and use the infrastructure we built to perform quant research at scale. We made the stack and examples available in the public repository so you can use them in your own investment research. This solution uses Apache Spark, Amazon EMR on EKS, Docker, Karpenter, EMR Studio Notebooks, and AWS Data Exchange for Amazon S3.
Accelerating Spark workloads on Amazon EMR with Windjammer’s Spark plugin
AWS customers often use Apache Spark for distributed big data processing. Spark has gained popularity due to its fast in-memory computing that enables parallel computation of tasks across multiple nodes. To aid customers with running Spark workloads, Amazon EMR provides a managed cluster platform that makes it easy to run frameworks such as Apache Hadoop, […]
Analyzing impact of regulatory reform on the stock market using AWS and Refinitiv data
In this blog post, Boris, Pramod, and Alex use Amazon Elastic MapReduce (Amazon EMR) to analyze historical market data provided by Refinitiv, part of LSEG, on AWS Data Exchange. We also show how AWS Data Exchange for Amazon S3 (Preview) and Amazon EMR enable us to eliminate undifferentiated heavy lifting and analyze terabytes of data cost-effectively by deploying Amazon EMR on Spot Instances.
Automate multi-account storage integrations in AWS using Snowflake and AWS Control Tower
This implementation guide describes how AWS Marketplace customers can integrate AWS Control Tower with Snowflake. The AWS Control Tower integration with Snowflake enables Snowflake storage integrations with Amazon S3 to be automatically available for all newly added AWS accounts in an AWS Control Tower environment. Snowflake is a data warehouse built for the cloud. It […]
Five-minute data connection and transformation from AWS to anywhere using Nexla
As data becomes ubiquitous and data environments grow more complex, my customers ask for scalable solutions for managing their data flows. Accessing data from different vendors and data sources can be complicated, and transformations and checkpoints require the expertise of data engineers. Custom in-house solutions present a cost in both development and management time. Businesses […]
Simplifying MLOps and improving model accuracy with Trifacta and Amazon SageMaker (Part 1)
This is the first blog post of a two-part series. Part 1 covers data preparation for machine learning (ML) using Trifacta, available in AWS Marketplace. Part 2, Simplifying machine learning operations with Trifacta and Amazon SageMaker, covers training the model using Amazon SageMaker Autopilot and operationalizing the workflow. In the past decade, advancements in machine […]
Accelerate Amazon EMR Spark, Presto, and Hive with the Alluxio AMI
Data analytics workloads are increasingly being migrated to the cloud. Amazon EMR is a cloud-native big data platform that makes it easy to process vast amounts of data quickly and cost-effectively at scale. Amazon EMR, along with Amazon Simple Storage Service (Amazon S3), provides a flexible storage platform. With the click of a few […]
Using AWS Marketplace for machine learning workloads
Organizations of all sizes are realizing that machine learning (ML) is more than a nice-to-have capability. It’s becoming a necessary differentiator that has the potential to impact almost every aspect of the business. From back-office optimizations to business forecasting and risk reduction, ML is critical for companies looking to innovate and remain relevant. One way for […]