Zalando Reduces Cloud Costs for Tracking Platform by 4x and Latency by 40x Using Amazon Managed Service for Apache Flink

Learn how online fashion and lifestyle brand Zalando updated its streaming tracking solution using Amazon Managed Service for Apache Flink.

Overview

Online fashion company Zalando wanted to use an industry-standard solution to track near real-time (NRT) customer interactions, understand customer engagement, and improve the customer experience at optimized cloud costs. So, it migrated from its legacy tracking solution and used Amazon Web Services (AWS) to build a fast, efficient, scalable, and cost-effective solution.

About Zalando

Founded in 2008, Zalando is a pan-European fashion and lifestyle ecommerce company that connects partners, brands, and customers in 25 countries. Its online stores have a wide range of products from over 6,000 brands and six Zalando-owned labels.

Opportunity | Using Amazon Managed Service for Apache Flink to Modernize Tracking for Zalando

Zalando, a fashion and lifestyle ecommerce company, sells products from more than 6,000 international brands to 50 million customers in 25 European countries. Since 2018, Zalando had been using an in-house solution to capture clickstream data from customer interactions on its website and mobile apps.

The solution ingested NRT data, called events, from frontend interactions and used backend configurations to correlate the events. Event data was enriched further before being distributed to Analytics and Personalisation functions for creating insights and enhancing the browsing experience. As the data volume increased, the backend correlation became slow, inefficient, and expensive. Although the company optimized the solution to reduce cloud expenses, further optimization required significant rearchitecting. Instead of reinvesting in the complex solution, Zalando decided to change its approach. “We wanted to use industry-standard products and do the event correlation on the client side to simplify our processing pipeline,” says Polina Belonozhka, engineering manager at Zalando.

Zalando considered adopting Apache Flink—an open-source streaming engine that can be used to build scalable, fault-tolerant, and low-latency applications—but the operational overhead of building and self-managing such a solution was a deterrent. In 2022, after discovering Amazon Managed Service for Apache Flink, which organizations use to build and run fully managed Apache Flink applications, the company decided to implement the service with support and direction from AWS. “We host all our mission-critical services on AWS,” says Sezer Akar, head of engineering at Zalando. “So, it was a natural choice to use AWS for this project.”  

In May 2022, the Zalando team participated in an AWS Workshop on Apache Flink, followed by a proof of concept in July. The team achieved significant improvements across three key performance indicators: cloud infrastructure costs, data processing latency, and data accuracy.

Zalando needed to verify that the new system could handle the transaction volume of several billion events per month and hundreds of thousands of events per second during peak sales events. So, the team established key metrics and a test regimen for load testing the new solution. The team worked alongside AWS from solution design to planning and implementation. AWS helped review technical design documents, built a large-scale demo to showcase that the future system could handle the load of peak sales events, and provided best practices. “We had deep dives with AWS solutions architects, streaming experts, and product managers to build and implement this solution,” says Samet Alemdar, senior software engineer at Zalando.
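To see why peak load testing mattered, it helps to compare the average rate implied by a monthly volume with a peak rate. The sketch below is only a back-of-the-envelope illustration: the monthly volume and peak rate are placeholder assumptions chosen to match the orders of magnitude mentioned above ("several billion" per month, "hundreds of thousands" per second), not Zalando's actual figures.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_events_per_second(events_per_month: float, days_in_month: int = 30) -> float:
    """Average rate if the monthly volume were spread evenly over the month."""
    return events_per_month / (days_in_month * SECONDS_PER_DAY)


def peak_to_average_ratio(
    peak_events_per_second: float,
    events_per_month: float,
    days_in_month: int = 30,
) -> float:
    """How many times larger the peak rate is than the evenly spread average."""
    return peak_events_per_second / average_events_per_second(events_per_month, days_in_month)


# Illustrative placeholders only, not figures reported by Zalando.
ASSUMED_EVENTS_PER_MONTH = 3_000_000_000
ASSUMED_PEAK_EVENTS_PER_SECOND = 200_000

if __name__ == "__main__":
    avg = average_events_per_second(ASSUMED_EVENTS_PER_MONTH)
    ratio = peak_to_average_ratio(ASSUMED_PEAK_EVENTS_PER_SECOND, ASSUMED_EVENTS_PER_MONTH)
    print(f"average: {avg:,.0f} events/s, peak is {ratio:,.1f}x the average")
```

Under these assumed numbers the average is on the order of a thousand events per second, so a peak of hundreds of thousands per second is more than a hundred times the average. A system sized for the average would not survive a sales event, which is why a dedicated load test is needed.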

Solution | Implementing a Complex Streaming Tracking Solution with Minimal Operational Impact

Zalando first had to identify the schemas and events that were needed for the client-side solution and change everything at the source, a major cross-departmental process that took more than a year.

Because traffic patterns change drastically over the day, the team used an automatic scaling mechanism that was developed by AWS. Additionally, the team reimplemented the Apache Flink pipeline and did several rounds of large-scale testing to verify that the system could handle peak loads. Before migrating to the new solution, Zalando implemented it in shadow mode in its production environment for several weeks to test scalability.
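The case study does not describe how the AWS-developed scaling mechanism works. Purely as an illustration of the general idea, the following sketch shows a simple threshold-based policy that doubles or halves a job's parallelism based on observed utilization. The function name, thresholds, and limits are all hypothetical and are not taken from AWS or Zalando.

```python
def next_parallelism(
    current: int,
    utilization: float,
    *,
    scale_up_at: float = 0.75,    # hypothetical threshold
    scale_down_at: float = 0.25,  # hypothetical threshold
    min_parallelism: int = 1,
    max_parallelism: int = 64,
) -> int:
    """Return the parallelism to use next, given utilization in [0.0, 1.0]."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0.0 and 1.0")

    if utilization >= scale_up_at:
        proposed = current * 2      # busy: double capacity
    elif utilization <= scale_down_at:
        proposed = current // 2     # idle: halve capacity
    else:
        proposed = current          # within the comfortable band: no change

    # Clamp to the allowed range so the job never scales to zero or without bound.
    return max(min_parallelism, min(max_parallelism, proposed))
```

A real policy would typically also require the threshold to be breached for a sustained period before acting, so that short spikes do not cause repeated rescaling.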

To collect, process, and analyze NRT data streams, the solution uses Amazon Kinesis Data Streams, which easily streams data at any scale. This made it possible for the new solution to consume the same data events as the legacy solution and operate in shadow mode initially. An API from Zalando’s website and mobile app ingests data, enriches it, and publishes it to a data stream. Amazon Managed Service for Apache Flink consumes this data, does further transformations and schema validation, and publishes the data to a data-sync service on Apache Kafka. Finally, the data is sent to Datalake for further enrichment, historical analytics, and long-term archival.
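The validate-then-transform step in the middle of this pipeline can be pictured with a small sketch. This is plain Python standing in for a stream-processing operator, not Apache Flink code, and the schema, field names, and normalization rule are hypothetical examples rather than Zalando's actual event format.

```python
from typing import Any, Dict, Iterable, List, Tuple

# Hypothetical schema: field name -> required Python type.
REQUIRED_FIELDS: Dict[str, type] = {
    "event_id": str,
    "event_type": str,
    "timestamp_ms": int,
}


def is_valid(event: Dict[str, Any]) -> bool:
    """An event is valid if every required field is present with the expected type."""
    return all(
        isinstance(event.get(name), expected)
        for name, expected in REQUIRED_FIELDS.items()
    )


def transform(event: Dict[str, Any]) -> Dict[str, Any]:
    """Example transformation: normalize the event type without mutating the input."""
    out = dict(event)
    out["event_type"] = event["event_type"].strip().lower()
    return out


def run_stage(
    events: Iterable[Dict[str, Any]],
) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Split a batch into (accepted and transformed, rejected) events."""
    accepted: List[Dict[str, Any]] = []
    rejected: List[Dict[str, Any]] = []
    for event in events:
        if is_valid(event):
            accepted.append(transform(event))
        else:
            rejected.append(event)
    return accepted, rejected
```

In a streaming job, the accepted events would flow on to the downstream sink, while rejected events would usually be routed to a separate destination for inspection instead of being dropped silently.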

The streamlined architecture generates data that caters to multiple stakeholders without significant compromises. Users requiring rapid insights through low-latency events can access NRT data via Apache Kafka or as a direct sink from Amazon Managed Service for Apache Flink. Meanwhile, historical offline analysis is facilitated through archived content on Datalake. “Unlike the legacy solution, where all stakeholders had to endure slow and incomplete enrichment, the new solution provides a more flexible approach, delivering both speed and completeness simultaneously,” says Hoa Luong Ton, principal engineer at Zalando.

In addition, the solution publishes system health metrics to Amazon CloudWatch, a service that monitors applications, responds to performance changes, and optimizes resource use.
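The case study does not say how these metrics are published. One common way to send a custom metric is the CloudWatch `PutMetricData` operation, and the sketch below builds a request in that shape without calling AWS. The namespace, metric name, and dimension values are hypothetical placeholders.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def build_metric_request(
    namespace: str,
    metric_name: str,
    value: float,
    unit: str = "Count",
    application: str = "tracking-pipeline",  # hypothetical dimension value
    timestamp: Optional[datetime] = None,
) -> Dict[str, Any]:
    """Build keyword arguments shaped like a CloudWatch PutMetricData request."""
    return {
        "Namespace": namespace,
        "MetricData": [
            {
                "MetricName": metric_name,
                "Dimensions": [{"Name": "Application", "Value": application}],
                "Timestamp": timestamp or datetime.now(timezone.utc),
                "Value": float(value),
                "Unit": unit,
            }
        ],
    }


# With boto3 installed and AWS credentials configured, the request could be sent with:
#   import boto3
#   boto3.client("cloudwatch").put_metric_data(**request)
```

Building the request separately from sending it keeps the metric logic testable without network access or credentials.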

Managing the implementation alongside other projects, Zalando successfully deployed the solution in July 2024, completing the transformation with minimal operational impact. “We used current industry standards on AWS and achieved much more with significantly less operational burden and lower costs, which is critical in the fast-paced world of ecommerce,” says Belonozhka.

Outcome | Reducing Costs by 4 Times and Latency by 40 Times

Compared with the legacy solution, the modernized architecture helped Zalando reduce costs by a factor of 4 and decrease latency by a factor of 40. The success rate of publishing data-sync events rose to 100 percent. “Things run fast for our customers and internal stakeholders,” says Ton. The faster response time helps to provide better recommendations to customers. It also simplifies and enhances the customer experience with features like search and personalization that update in NRT.
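Multiplicative reductions like these are easy to misread, so a short worked conversion may help. The sketch assumes that "4 times" and "40 times" mean the new value is the old value divided by 4 and by 40, respectively. The case study does not give the underlying absolute figures.

```python
def remaining_fraction(factor: float) -> float:
    """Fraction of the original value left after an N-fold reduction."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return 1.0 / factor


def percent_reduction(factor: float) -> float:
    """Percentage saved by an N-fold reduction, computed as 100 * (N - 1) / N."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return 100.0 * (factor - 1) / factor


if __name__ == "__main__":
    for label, factor in (("cost", 4), ("latency", 40)):
        print(f"{label}: {factor}x reduction = {percent_reduction(factor)}% lower")
```

On that reading, a 4-fold cost reduction corresponds to 75 percent lower costs, and a 40-fold latency reduction corresponds to 97.5 percent lower latency.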

The solution is future-ready because Zalando will benefit automatically from updates and improvements in the managed service. “It opened a new chapter for us and we will continue to utilize knowledge we gained from our AWS collaboration,” says Akar.

AWS Services Used

Amazon Managed Service for Apache Flink

Build and run fully managed Apache Flink applications

Amazon Kinesis Data Streams

Easily stream data at any scale

Amazon CloudWatch

Observe and monitor resources and applications on AWS, on premises, and on other clouds
