How Launchmetrics improves fashion brands performance using Amazon EC2 Spot Instances
Launchmetrics offers its Brand Performance Cloud tools and intelligence to help fashion, luxury, and beauty retail executives optimize their global strategy. Launchmetrics initially operated their whole infrastructure on-premises; however, they wanted to scale their data ingestion while simultaneously providing improved and faster insights for their clients. These business needs led them to build their architecture in AWS cloud.
In this blog post, we explain how Launchmetrics’ uses Amazon Web Services (AWS) to crawl the web for online social and print media. Using the data gathered, Launchmetrics is able to provide prescriptive analytics and insights to their clients. As a result, clients can understand their brand’s momentum and interact with their audience, successfully launching their products.
Launchmetrics’ platform architecture is represented in Figure 1 and composed of three tiers:
- Data Persistence
The Crawl tier is composed of several Amazon Elastic Compute Cloud (Amazon EC2) Spot Instances launched via Auto Scaling groups. Spot Instances take advantage of unused Amazon EC2 capacity at a discounted rate compared with On-Demand Instances, which are compute instances that are billed per-hour or -second with no long-term commitments. Launchmetrics heavily leverages Spot Instances. The Crawl tier is responsible for retrieving, processing, and storing data from several media sources (represented in Figure 1 with the number 1).
The Data Persistence tier consists of two components: Amazon Kinesis Data Streams and Amazon Simple Queue Service (Amazon SQS). Kinesis Data Streams stores data that the Crawl tier collects, while Amazon SQS stores the metadata of the whole process. In this context, metadata helps Launchmetrics gain insight into when the data is collected and if it has started processing. This is key information if a Spot Instance is interrupted, which we will dive deeper into later.
The third tier, Processing, also makes use of Spot Instances and is responsible for pulling data from the Data Persistence tier (represented in Figure 1 with the number 2). It then applies proprietary algorithms, both analytics and machine learning models, to create consumer insights. These insights are stored in a data layer (not depicted) that consists of an Amazon Aurora cluster and an Amazon OpenSearch Service cluster.
By having this separation of tiers, Launchmetrics is able to use a decoupled architecture, where each component can scale independently and is more reliable. Both the Crawl and the Data Processing tiers use Spot Instances for up to 90% of their capacity.
Data processing using EC2 Spot Instances
When Launchmetrics decided to migrate their workloads to the AWS cloud, Spot Instances were one of the main drivers. As Spot Instances offer large discounts without commitment, Launchmetrics was able to track more than 1200 brands, translating to 1+ billion end users. Daily, this represents tracking upwards of 500k influencer profiles, 8 million documents, and around 70 million social media comments.
Aside from the cost-savings with Spot Instances, Launchmetrics incurred collateral benefits in terms of architecture design: building stateless, decoupled, elastic, and fault-tolerant applications. In turn, their stack architecture became more loosely coupled, as well.
All Launchmetrics Auto Scaling groups have the following configuration:
- Spot allocation strategy: cost-optimized
- Capacity rebalance: true
- Three availability zones
- A diversified list of instance types
By using Auto Scaling groups, Launchmetrics is able to scale worker instances depending on how many items they have in the SQS queue, increasing the instance efficiency. Data processing workloads like the ones Launchmetrics’ platform have, are an exemplary use of multiple instance types, such as M5, M5a, C5, and C5a. When adopting Spot Instances, Launchmetrics considered other instance types to have access to spare capacity. As a result, Launchmetrics found out that workload’s performance improved, as they use instances with more resources at a lower cost.
By decoupling their data processing workload using SQS queues, processes are stopped when an interruption arrives. As the Auto Scaling group launches a replacement Spot Instance, clients are not impacted and data is not lost. All processes go through a data checkpoint, where a new Spot Instance resumes processing any pending data. Spot Instances have resulted in a reduction of up to 75% of related operational costs.
To increase confidence in their ability to deal with Spot interruptions and service disruptions, Launchmetrics is exploring using AWS Fault Injection Simulator to simulate faults on their architecture, like a Spot interruption. Learn more about how this service works on the AWS Fault Injection Simulator now supports Spot Interruptions launch page.
Reporting data insights
After processing data from different media sources, AWS aided Launchmetrics in producing higher quality data insights, faster: the previous on-premises architecture had a time range of 5-6 minutes to run, whereas the AWS-driven architecture takes less than 1 minute.
This is made possible by elasticity and availability compute capacity that Amazon EC2 provides compared with an on-premises static fleet. Furthermore, offloading some management and operational tasks to AWS by using AWS managed services, such as Amazon Aurora or Amazon OpenSearch Service, Launchmetrics can focus on their core business and improve proprietary solutions rather than use that time in undifferentiated activities.
Building continuous delivery pipelines
Let’s discuss how Launchmetrics makes changes to their software with so many components.
Both of their computing tiers, Crawl and Processing, consist of standalone EC2 instances launched via Auto Scaling groups and EC2 instances that are part of an Amazon Elastic Container Service (Amazon ECS) cluster. Currently, 70% of Launchmetrics workloads are still running with Auto Scaling groups, while 30% are containerized and run on Amazon ECS. This is important because for each of these workload groups, the deployment process is different.
For workloads that run on Auto Scaling groups, they use an AWS CodePipeline to orchestrate the whole process, which includes:
I. Creating a new Amazon Machine Image (AMI) using AWS CodeBuild
II. Deploying the newly built AMI using Terraform in CodeBuild
For containerized workloads that run on Amazon ECS, Launchmetrics also uses a CodePipeline to orchestrate the process by:
III. Creating a new container image, and storing it in Amazon Elastic Container Registry
IV. Changing the container image in the task definition, and updating the Amazon ECS service using CodeBuild
In this blog post, we explored how Launchmetrics is using EC2 Spot Instances to reduce costs while producing high-quality data insights for their clients. We also demonstrated how decoupling an architecture is important for handling interruptions and why following Spot Instance best practices can grant access to more spare capacity.
Using this architecture, Launchmetrics produced faster, data-driven insights for their clients and increased their capacity to innovate. They are continuing to containerize their applications and are projected to have 100% of their workloads running on Amazon ECS with Spot Instances by the end of 2023.
To learn more about handling EC2 Spot Instance interruptions, visit the AWS Best practices for handling EC2 Spot Instance interruptions blog post. Likewise, if you are interested in learning more about AWS Fault Injection Simulator and how it can benefit your architecture, read Increase your e-commerce website reliability using chaos engineering and AWS Fault Injection Simulator.