Category: Amazon EMR
In this blog post, Alex, Pramod, and I will show how to install and use the infrastructure we built to perform quant research at scale. We made the stack and examples available in a public repository so you can use them in your own investment research. This solution uses Apache Spark, Amazon EMR on EKS, Docker, Karpenter, EMR Studio Notebooks, and AWS Data Exchange for Amazon S3.
AWS customers often use Apache Spark for distributed big data processing. Spark has gained popularity due to its fast in-memory computing that enables parallel computation of tasks across multiple nodes. To aid customers with running Spark workloads, Amazon EMR provides a managed cluster platform that makes it easy to run frameworks such as Apache Hadoop, […]
In this blog post, Boris, Pramod, and Alex use Amazon Elastic MapReduce (Amazon EMR) to analyze historical market data provided by Refinitiv, part of LSEG, on AWS Data Exchange. We also show how AWS Data Exchange for Amazon S3 (Preview) and Amazon EMR enable us to eliminate undifferentiated heavy lifting and analyze terabytes of data cost-effectively by deploying Amazon EMR on Spot Instances.
This implementation guide describes how AWS Marketplace customers can integrate AWS Control Tower with Snowflake. The AWS Control Tower integration with Snowflake enables Snowflake storage integrations with Amazon S3 to be automatically available for all newly added AWS accounts in an AWS Control Tower environment. Snowflake is a data warehouse built for the cloud. It […]
As data becomes ubiquitous and data environments grow more complex, my customers ask for scalable solutions for managing their data flows. Accessing data from different vendors and data sources can be complicated, and transformations and checkpoints require the expertise of data engineers. Custom in-house solutions present a cost in both development and management time. Businesses […]
This is the first blog post of a two-part series. Part 1 covers data preparation for machine learning (ML) using Trifacta, available in AWS Marketplace. Part 2, Simplifying machine learning operations with Trifacta and Amazon SageMaker, covers training the model using Amazon SageMaker Autopilot and operationalizing the workflow. In the past decade, advancements in machine […]
Data analytics workloads are increasingly being migrated to the cloud. Amazon EMR is a cloud-native big data platform that makes it easy to process vast amounts of data quickly and cost-effectively at scale. Amazon EMR, along with Amazon Simple Storage Service (Amazon S3), provides a flexible storage platform. With the click of a few […]
Organizations of all sizes are realizing that machine learning (ML) is more than a nice-to-have capability. It’s becoming a necessary differentiator that has the potential to impact almost every aspect of the business. From back-office optimizations to business forecasting and risk reduction, ML is critical for companies looking to innovate and remain relevant. One way for […]