
Zappos Creates Breakthrough Customer Experiences Using AWS
2020
“Using AWS services as building blocks allows engineers to focus on improving performance and results rather than DevOps overhead.”
Ameen Kazerouni
Head of Machine Learning Research and Platforms, Zappos

Searching for the Perfect Fit
Zappos knows that accurate recommendations are key to an efficient shopping experience. The company reassures customers with a generous return policy and fast, free shipping, but these offerings are expensive and do little to set Zappos apart from other retailers.
“We are always asking ourselves: how do we differentiate further?” says Kazerouni. “How do we optimize return rates without negatively affecting the customer experience? These are the problems we set out to solve using machine learning and analytics on AWS.”
In the search phase of the customer journey, the company’s goal is to make personalized recommendations at runtime to increase search relevance. Rather than relying on a generic search algorithm, Zappos seeks to understand each customer and return a personalized set of results for a given search term. (It also prominently displays an opt-out button for customers who do not want this level of personalization.)
At the same time, it can’t afford to slow down search performance noticeably. “We needed to minimize the amount of time the extra operations take,” notes Kazerouni. “So we combine high-performance caching, strategic precalculation of certain results, and ensemble-based machine learning approaches that use multiple, simple models.”
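The quote mentions ensembles of multiple simple models without describing them. The sketch below shows one common way such an ensemble can rerank a generic candidate list: a weighted average of several cheap scorers, with the opt-out case falling back to the generic order. The `WeightedModel` class, the scorer signature, and the weighting scheme are all hypothetical illustrations, not Zappos's actual models.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical signature: each simple model scores (customer_id, product_id) in [0, 1].
Scorer = Callable[[str, str], float]


@dataclass
class WeightedModel:
    name: str
    weight: float
    score: Scorer


def ensemble_rerank(
    customer_id: str,
    candidates: Sequence[str],
    models: Sequence[WeightedModel],
    personalization_opt_out: bool = False,
) -> List[Tuple[str, float]]:
    """Rerank generic search candidates by a weighted average of simple model scores."""
    if personalization_opt_out:
        # Opted-out customers keep the generic search order; 0.0 is a placeholder score.
        return [(pid, 0.0) for pid in candidates]
    total_weight = sum(m.weight for m in models)
    if total_weight <= 0:
        raise ValueError("need at least one model with a positive weight")
    scored = [
        (pid, sum(m.weight * m.score(customer_id, pid) for m in models) / total_weight)
        for pid in candidates
    ]
    # sorted() is stable, so ties keep the generic search order.
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Because each scorer is small, the individual scores can also be precalculated and cached, leaving only the cheap weighted sum to run on the request path.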
More Than the Sum of Its Parts
The data pipeline starts with a lightweight client that sends relevant events to an ingestion API for processing. The API runs in an auto-scaling group so that it can absorb high event volumes. From the API, the data flows to Amazon Data Firehose, which stages it in Amazon Simple Storage Service (Amazon S3) before loading it into an Amazon Redshift data warehouse that provides high-performance data access for machine learning research.
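As a rough illustration of the ingestion step, the sketch below shows how an ingestion API might forward events to Firehose in batches. It assumes a boto3 Firehose client (whose `put_record_batch` call takes `DeliveryStreamName` and `Records` and returns `FailedPutCount`) and the batch limits documented at the time of writing (500 records, about 4 MiB per call, 1,000 KiB per record); check current quotas before relying on them. Newline-delimited JSON is an assumed format choice that suits a later Redshift `COPY`. The helper names are hypothetical.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List

# Assumed PutRecordBatch limits; verify against current Firehose quotas.
MAX_RECORDS_PER_BATCH = 500
MAX_BYTES_PER_BATCH = 4 * 1024 * 1024
MAX_BYTES_PER_RECORD = 1000 * 1024


def to_record(event: Dict[str, Any]) -> Dict[str, bytes]:
    # One compact JSON object per line, so staged S3 objects are newline-delimited JSON.
    data = (json.dumps(event, separators=(",", ":")) + "\n").encode("utf-8")
    if len(data) > MAX_BYTES_PER_RECORD:
        raise ValueError("event exceeds the per-record size limit")
    return {"Data": data}


def batch_records(events: Iterable[Dict[str, Any]]) -> Iterator[List[Dict[str, bytes]]]:
    """Group events into batches that respect both the record-count and byte limits."""
    batch: List[Dict[str, bytes]] = []
    batch_bytes = 0
    for event in events:
        record = to_record(event)
        size = len(record["Data"])
        if batch and (len(batch) == MAX_RECORDS_PER_BATCH or batch_bytes + size > MAX_BYTES_PER_BATCH):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(record)
        batch_bytes += size
    if batch:
        yield batch


def send_events(firehose_client, stream_name: str, events: Iterable[Dict[str, Any]]) -> int:
    """Send all events and return how many records the service reported as failed."""
    failed = 0
    for batch in batch_records(events):
        response = firehose_client.put_record_batch(DeliveryStreamName=stream_name, Records=batch)
        failed += response.get("FailedPutCount", 0)
    return failed
```

In production you would pass `boto3.client("firehose")` as `firehose_client` and retry the individual records flagged as failed in the response. That retry logic is omitted here.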
Zappos uses several technologies to train and run models. It relies on Amazon SageMaker to predict customers’ apparel sizes; these predictions are cached and then exposed at runtime through microservice APIs for use in recommendations. Zappos uses Amazon EMR to run big data analytics at a fraction of the cost of traditional on-premises clusters. It also runs models on graphics processing units (GPUs) in Amazon Elastic Compute Cloud (Amazon EC2).
The company enables ultrafast lookup of precomputed predictions using two distinct services. Amazon DynamoDB stores precomputed results for access at runtime. This fully managed key-value and document database delivers single-digit millisecond performance at almost any scale; it can handle more than 10 trillion requests a day and support peaks of more than 20 million requests per second. For even faster responses, Zappos uses Amazon ElastiCache for Redis, an in-memory data store, as a cache layer that provides sub-millisecond latency where needed.
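The two-tier lookup described above follows the familiar cache-aside pattern: read from the in-memory tier first, fall back to the durable table on a miss, then backfill the cache. The minimal sketch below uses in-process stand-ins so that it runs without AWS. `InMemoryTTLCache` plays the Redis role, and any object with a `get` method plays the DynamoDB role. The key format and TTL are assumptions for illustration. A real service would instead use redis-py's `get` and `set(key, value, ex=ttl)` and boto3's `Table.get_item`.

```python
import time
from typing import Callable, Dict, Optional, Protocol, Tuple


class KeyValueStore(Protocol):
    """Anything with a get(key) method; stands in for the DynamoDB table."""

    def get(self, key: str) -> Optional[str]: ...


class InMemoryTTLCache:
    """Stand-in for the Redis layer: entries expire ttl_seconds after being set."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._data: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._data[key]  # lazily evict expired entries
            return None
        return value

    def set(self, key: str, value: str) -> None:
        self._data[key] = (self.clock() + self.ttl, value)


def lookup_prediction(key: str, cache: InMemoryTTLCache, table: KeyValueStore) -> Optional[str]:
    """Cache-aside read: fast tier first, then the durable table, then backfill the cache."""
    value = cache.get(key)
    if value is not None:
        return value
    value = table.get(key)
    if value is not None:
        cache.set(key, value)
    return value
```

The TTL bounds how stale a cached prediction can become after the precomputed table is refreshed. Choosing it trades cache hit rate against freshness.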
The microservices that run models and consolidate results run on Amazon EC2 instances in auto-scaling groups behind location-based load balancers. Zappos uses Amazon Route 53 as its Domain Name System (DNS) service to route traffic throughout the solution.
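The article does not say how the consolidating microservices combine model outputs. One common approach for keeping latency bounded is to call every model service concurrently and keep only the responses that arrive before a deadline. The asyncio sketch below assumes that approach. The model-call signature, the deadline, and the use of `None` for a missed or failed call are all hypothetical choices.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Optional

# Hypothetical signature: a model service takes a query and returns ranked product IDs.
ModelCall = Callable[[str], Awaitable[List[str]]]


async def consolidate(
    query: str, models: Dict[str, ModelCall], deadline_s: float
) -> Dict[str, Optional[List[str]]]:
    """Fan out to all model services; late or failing services contribute None."""
    if not models:
        return {}
    tasks = {name: asyncio.create_task(call(query)) for name, call in models.items()}
    done, pending = await asyncio.wait(tasks.values(), timeout=deadline_s)
    for task in pending:
        task.cancel()
    # Let cancelled tasks finish unwinding so none are left dangling.
    await asyncio.gather(*pending, return_exceptions=True)
    results: Dict[str, Optional[List[str]]] = {}
    for name, task in tasks.items():
        if task in done and task.exception() is None:
            results[name] = task.result()
        else:
            results[name] = None
    return results
```

A caller can then blend whatever arrived in time, for example with a weighted ensemble, and fall back to the generic ranking when every personalized source is missing.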
Run, Don’t Walk
Creating and maintaining this intricate architecture with traditional development and deployment methods would be prohibitively complex. Instead, Zappos relies on infrastructure as code using AWS CloudFormation. “Every aspect of the solution is represented in AWS CloudFormation templates,” reports Kazerouni. “To make a change, we just tweak the template. If we need to fix the way the services communicate with Redis, we don’t repeat the change manually—we change the template and deploy it everywhere.”
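To make the “change the template and deploy it everywhere” workflow concrete, the sketch below generates a CloudFormation template from one Python function and renders it per environment. It uses a single DynamoDB table for precomputed predictions. The resource type and properties (`AWS::DynamoDB::Table`, `BillingMode`, `KeySchema`, `TimeToLiveSpecification`) are real CloudFormation features, but the table name, key name, TTL attribute, and environment names are illustrative assumptions. Zappos's actual templates are not described in the source.

```python
import json
from typing import Any, Dict


def prediction_table_template(env: str, ttl_attribute: str = "expires_at") -> Dict[str, Any]:
    """Build a CloudFormation template (as a dict) for a hypothetical predictions table."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": f"Precomputed predictions table ({env})",
        "Resources": {
            "PredictionsTable": {
                "Type": "AWS::DynamoDB::Table",
                "Properties": {
                    "TableName": f"predictions-{env}",
                    "BillingMode": "PAY_PER_REQUEST",
                    # Single string partition key, e.g. "cust#42:size:sneakers".
                    "AttributeDefinitions": [{"AttributeName": "pk", "AttributeType": "S"}],
                    "KeySchema": [{"AttributeName": "pk", "KeyType": "HASH"}],
                    # Lets DynamoDB expire stale predictions automatically.
                    "TimeToLiveSpecification": {"AttributeName": ttl_attribute, "Enabled": True},
                },
            }
        },
        "Outputs": {
            "PredictionsTableArn": {"Value": {"Fn::GetAtt": ["PredictionsTable", "Arn"]}}
        },
    }


def render(env: str) -> str:
    """Serialize the template for one environment as JSON."""
    return json.dumps(prediction_table_template(env), indent=2)
```

Each rendered file can be deployed with `aws cloudformation deploy --template-file <file> --stack-name <name>`. A change such as a different TTL attribute is made once in the function and then rolled out to every environment.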
He notes that it would be impossible to build the solution without the wealth of AWS services at the team’s command. “Using AWS services as building blocks allows engineers to focus on improving performance and results rather than DevOps overhead.”
Customers Feel the Love
Zappos delivered these improved search results with a nearly undetectable increase in latency: 99 percent of searches completed in less than 48 milliseconds. Using a similar architecture, the company has also significantly improved personalized sizing recommendations based on simple fit surveys and past purchases. As a result, Zappos has reduced repeated searches and product returns. It has also achieved higher search-to-product clickthrough rates and moved the products customers select higher in search results.
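A claim like “99 percent of searches completed in less than 48 milliseconds” is a 99th-percentile (p99) latency target. The sketch below shows how such a target can be checked against a sample of request latencies. It uses the nearest-rank percentile definition, which is one of several common percentile definitions and may not match whatever Zappos's monitoring uses. The function names are hypothetical.

```python
import math
from typing import Sequence


def percentile_nearest_rank(samples_ms: Sequence[float], pct: float) -> float:
    """Smallest sample such that at least pct percent of samples are <= it."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples_ms)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based nearest rank
    return ordered[rank - 1]


def meets_latency_target(samples_ms: Sequence[float], budget_ms: float = 48.0, pct: float = 99.0) -> bool:
    """True if the pct-th percentile latency is strictly below budget_ms."""
    return percentile_nearest_rank(samples_ms, pct) < budget_ms
```

With 1,000 samples, the p99 is the 990th-smallest value. That means up to 10 slow outliers can exceed the budget without breaking the target.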
As Kazerouni sums it up, “We think of ourselves as a customer service company that happens to sell shoes and apparel. Anything we can do that improves service improves our business. Using AWS makes it possible for us to innovate the experience faster.”
To learn more, visit aws.amazon.com/big-data/datalakes-and-analytics.
About Zappos
Zappos began 20 years ago as a small, online shoe retailer. Since then, it has grown to sell clothing, handbags, accessories, and more while providing renowned customer service and innovative employee experiences. The company has been a subsidiary of Amazon since 2009.
Benefits of AWS
Keeps search latency below 48 milliseconds for 99% of searches
Personalizes searches for better customer experience
Achieves higher search-to-product clickthrough rates
Reduces returns through improved sizing recommendations