Seventy-two-hour flash sales. Constantly shifting inventory, where products are added and removed at least three times a day. Enormous savings on everything from coffee grinders to couture. Let’s just say that Rue La La isn’t your typical online shopping site. And that’s the point. The company provides its members with a dynamic online shopping experience that’s different each time they visit the site.
To continually differentiate the company from competitors and bring more value to members, the Rue La La team turned to personalization. “Our goal was addressing our core customers. We wanted them to know that we understand them as a customer and we can provide them with suggestions that we believe they’re really going to like,” says Stephen Harrison, data science architect at Rue La La.
In early 2017, the company held a full-day meeting to answer one question: How can we provide a more personalized experience for Rue La La members? “At the end of that meeting, we had wireframe diagrams drawn on a board of our vision for our future recommendation engine, MyRue,” says Ben Wilson, data science architect at Rue La La. “We didn’t simply want to build a recommendation engine that would give users a broad list of suggested products. We wanted to leverage machine learning techniques and create a recommendation engine to provide customized feeds for each active user, which organizes brands and types of products to user affinity predictions, as well as categories of items in brands that we think that they’re going to enjoy. We chose to build this engine in a consumable fashion that’s not too divergent from how our members typically interact with the site.”
To build Rue’s Collaborative Filtering (CF) recommendation model based on calculations from users who have similar affinities, the data science team had to overcome challenges triggered by what differentiates Rue La La in the first place: its ever-changing inventory and flash-sale structure. The team prioritized developing a unique machine-learning algorithm that is well adapted to the company’s flash business model: dynamically displaying new deals in product groups having strong user affinities.
“If we were a static inventory site with very low percent inventory changeover every day, this project would have taken a couple of weeks,” says Wilson. “But we couldn’t approach our model in a traditional way because on any given day, sometimes 20 to 30 percent of our items are brand new, and there’s no data on them.” For example, the Rue La La team could only run predictions of affinity to the products scheduled to sell on a given day and could not use SKU-based recommendations. “Furthermore, we have many new members who come to the site at different times of the day, causing a massive cold-start problem impeding us from using traditional collaborative filtering models,” says Wilson.
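To make the cold-start constraint concrete, here is a minimal sketch of why scoring product groups rather than SKUs lets brand-new items be ranked at all. It is an illustration only, not Rue La La's algorithm: the `(brand, category)` grouping, the event weights, and every name in the code are assumptions made for the example.

```python
from collections import defaultdict

def group_affinities(events, catalog):
    """Sum each member's interaction weights per (brand, category) group.

    events:  iterable of (member_id, sku, weight) tuples from past sessions
    catalog: dict mapping sku -> (brand, category)
    """
    scores = defaultdict(float)
    for member_id, sku, weight in events:
        group = catalog.get(sku)
        if group is not None:  # skip SKUs missing from the catalog
            scores[(member_id, group)] += weight
    return dict(scores)

def rank_todays_items(member_id, todays_skus, catalog, scores):
    """Order today's SKUs by the member's affinity for each SKU's group."""
    def affinity(sku):
        return scores.get((member_id, catalog[sku]), 0.0)
    return sorted(todays_skus, key=affinity, reverse=True)

# Hypothetical data: the two SKUs on sale today have no interaction history,
# but they belong to groups the member has interacted with before.
catalog = {
    "old-1": ("BrandA", "shoes"),
    "old-2": ("BrandB", "kitchen"),
    "new-1": ("BrandB", "kitchen"),
    "new-2": ("BrandA", "shoes"),
}
events = [("m1", "old-1", 3.0), ("m1", "old-2", 1.0), ("m1", "old-1", 2.0)]

scores = group_affinities(events, catalog)
ranked = rank_todays_items("m1", ["new-1", "new-2"], catalog, scores)
```

The new SKUs inherit a score from their group, so they can be ordered even though nobody has interacted with them yet. A sketch like this does nothing for the second problem Wilson describes, new members with no history, which needs its own handling.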
Example suggestions from the MyRue recommendation engine
After careful evaluation of its distinctive data requirements and model specifications, Rue La La leveraged the alternating least squares (ALS)-powered CF implementation within the Apache Spark MLlib package to build and run its recommendation engine algorithm. The team needed to identify a service that could handle massive data sets and complex in-memory processing, scale flexibly, and be easy for Rue’s developers to use.
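For readers unfamiliar with ALS, the idea can be sketched in a few lines of plain Python. This is a toy version with a single latent factor per user and item, written for illustration; it is not Spark MLlib's implementation and not Rue La La's model, and the function name, regularization value, and sample ratings are all assumptions.

```python
def als_rank1(ratings, n_users, n_items, reg=0.01, iterations=30):
    """Fit one latent factor per user and per item by alternating least squares.

    ratings: dict mapping (user_index, item_index) -> observed value
    Returns (u, v) such that u[i] * v[j] approximates the observed ratings.
    """
    u = [1.0] * n_users
    v = [1.0] * n_items
    for _ in range(iterations):
        # Hold item factors fixed: each user factor has a closed-form
        # regularized least-squares solution over that user's observed items.
        for i in range(n_users):
            num = sum(r * v[j] for (ui, j), r in ratings.items() if ui == i)
            den = reg + sum(v[j] ** 2 for (ui, j) in ratings if ui == i)
            u[i] = num / den
        # Hold user factors fixed: solve for each item factor the same way.
        for j in range(n_items):
            num = sum(r * u[i] for (i, ij), r in ratings.items() if ij == j)
            den = reg + sum(u[i] ** 2 for (i, ij) in ratings if ij == j)
            v[j] = num / den
    return u, v

# Hypothetical ratings generated from user factors [1, 2, 3] and item
# factors [1, 2, 1]; the entry for user 2, item 1 is deliberately left out.
ratings = {
    (0, 0): 1.0, (0, 1): 2.0, (0, 2): 1.0,
    (1, 0): 2.0, (1, 1): 4.0, (1, 2): 2.0,
    (2, 0): 3.0,              (2, 2): 3.0,
}
u, v = als_rank1(ratings, n_users=3, n_items=3)
predicted_missing = u[2] * v[1]  # estimate for the unobserved pair
```

Each half-step is an ordinary regularized least-squares fit, which is what makes the method straightforward to parallelize across users and items. A production model uses many latent factors per user and item rather than one.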
The team at Databricks understands how to run effective, fast analytics engines using Apache Spark because of its founders’ background: they created the original Apache Spark project. Databricks’ cloud-native, unified analytics platform helps customers harness the power of Apache Spark to derive value from their data. “Our goal is to create a collaborative space for data analysts, data scientists, and data engineers to work within a common platform and framework to gain insights from data tailored to their individual needs,” says Databricks Alliances Lead Justin Fenton.
“When I started at Rue La La, I evaluated a number of options for where we were going to run code. We met with many companies and did a comprehensive analysis of the options. Candidly, it was a quick decision to work with Databricks,” says Wilson. While the Rue La La team was impressed with Databricks’ technological capabilities—especially with Apache Spark—getting to know the Databricks team was key.
“While meeting with the Databricks team, I’m continuously blown away by their professionalism, their friendliness, their intelligence, and their deep interest in helping us build inventive solutions,” says Wilson. “And we get enterprise-level support that is best-in-class.” With an existing relationship in place, the team at Rue La La worked with Databricks, which has AWS Big Data and AWS Machine Learning Competency designations, to support the recommendation engine.
The team chose to run its recommendation engine on Databricks for many reasons, including its cost effectiveness, flexibility, speed, and interoperability with AWS services. “Databricks offers us all of the functionality we need in a fully managed, auto-scaling service that allows us to execute the job quickly in a single code base on a resilient service,” says Wilson.
“Rue La La’s use case centers around a high volume of data, a lot of different users with different shopping behaviors, and a short-term engagement with the application being used,” says Caryl Yuhas, Databricks solutions architect. “We needed to help them build an engine on Databricks that could handle their volume, speed, and analysis requirements. Apache Spark and the Spark ML library were a natural fit.”
With the recommendation engine running on Databricks, the team at Rue La La needed a way to deliver its personalized output to individual members.
Diagram 1: How does Rue La La get personalization from Databricks to Rue members?
The answer rests in native, managed AWS services.
Diagram 2: Rue La La uses managed AWS services to drive personalization from Databricks to Rue La La Members without having to compromise: It’s fast, scalable, repeatable, low latency, and highly available.
Rue La La uses AWS services such as AWS Batch, Amazon DynamoDB, AWS Lambda, and Amazon API Gateway in their architecture. “One piece we didn’t know how to architect at first was the last mile: How do we make large volumes of data available via an API to our storefront and mobile applications so they can render views and pages that are personalized for our members?” says Harrison. “If we could leverage fully managed native AWS services to avoid bringing up any servers of our own for whatever aspect of the data science APIs, then we were going to do it. And where we ended up is pretty interesting.”
Rue La La’s data usage pattern is 100 percent read-dominated for more than 23 hours a day and write-dominated for short windows of about 20 minutes. These write windows are triggered by Databricks Apache Spark jobs writing data to Amazon Simple Storage Service (Amazon S3).
Diagram 3: Importing Data to AWS from Databricks
The company stores more than 100 million items of frequently changing data in DynamoDB in a sparse matrix containing more than 20 billion elements. “We wanted to make it fast to load the data and to deliver the data. We load the API data every day because that’s the cadence in which products and people are coming and going,” says Harrison. Rue La La uses AWS Batch and a small-footprint custom importer the team wrote to load data into DynamoDB at scale.
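The source does not describe how the custom importer works, so the following is only a sketch of one constraint any bulk loader has to respect: DynamoDB's BatchWriteItem operation accepts at most 25 put or delete requests per call. The code builds request payloads without calling AWS. The table name `member_recommendations` and the attribute names `member_id` and `recs` are hypothetical.

```python
from itertools import islice

MAX_BATCH = 25  # BatchWriteItem limit on put/delete requests per call

def chunked(iterable, size):
    """Yield successive lists of at most `size` elements."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

def batch_write_payloads(table_name, rows):
    """Build BatchWriteItem request bodies from (member_id, recs_json) rows.

    Attribute names are hypothetical; both values are stored as
    DynamoDB string ("S") attributes.
    """
    for chunk in chunked(rows, MAX_BATCH):
        yield {
            "RequestItems": {
                table_name: [
                    {"PutRequest": {"Item": {
                        "member_id": {"S": member_id},
                        "recs": {"S": recs_json},
                    }}}
                    for member_id, recs_json in chunk
                ]
            }
        }

# Hypothetical rows: 60 members, each with an empty recommendation list.
rows = [(f"member-{n}", "[]") for n in range(60)]
payloads = list(batch_write_payloads("member_recommendations", rows))
batch_sizes = [len(p["RequestItems"]["member_recommendations"]) for p in payloads]
```

A real importer would also need to resend any unprocessed items the service reports and run many such batches in parallel, which is the kind of work a job scheduler such as AWS Batch can fan out.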
Diagram 4: Using AWS Batch to Import Data to Amazon DynamoDB
Rue La La calls DynamoDB directly from API Gateway to support its data science APIs. “We get between 26 and 28 milliseconds API response time for our most complex member recommendation data day or night, even at peak times,” says Harrison. “We have capacity that’s prone to bursts because our flash sales typically start at 11 a.m., 3 p.m., and 8 p.m. We’ll see site traffic increase significantly around 11 a.m., and it’s still 26 to 28 milliseconds every single time. It’s very consistent and well within our latency budget for APIs.”
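When API Gateway calls DynamoDB directly, there is no application server in between, so the gateway itself has to turn the incoming request into a DynamoDB request and flatten the typed response. The sketch below shows the shape of that translation for a GetItem lookup, in Python for readability. It is an assumption about what such a mapping might look like, not Rue La La's configuration; the table name, key name, and sample response are hypothetical.

```python
import json

def get_item_request(table_name, member_id):
    """Build a DynamoDB GetItem request body keyed on a string attribute."""
    return json.dumps({
        "TableName": table_name,
        "Key": {"member_id": {"S": member_id}},
    })

def unwrap_item(response_body):
    """Flatten a GetItem response from {"name": {"S": "x"}} to {"name": "x"}.

    Simplification: only scalar attribute types are handled; nested maps
    and lists would need recursive unwrapping.
    """
    item = json.loads(response_body).get("Item", {})
    return {name: next(iter(typed.values())) for name, typed in item.items()}

body = get_item_request("member_recommendations", "member-42")

# Hypothetical response in DynamoDB's typed JSON format.
sample_response = json.dumps({
    "Item": {
        "member_id": {"S": "member-42"},
        "recs": {"S": "[\"brand-a\", \"brand-b\"]"},
    }
})
flat = unwrap_item(sample_response)
```

Because each request is a single key lookup, response time depends mainly on the table's provisioned capacity rather than on any server the team has to operate.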
“The kinds of things we’re doing and at the scale we’re doing them, unless you want to have a multi-million-dollar investment in owning hardware, then you only have a couple of viable options. And we feel AWS is far ahead of the competition. I look at competitors from time-to-time to see where they’re at in their capabilities, and it’s measured in years behind AWS in my opinion,” says Harrison.
For the Rue La La team, the ease with which they’ve been able to run their recommendation engine has been crucial. “Our architecture just works,” says Harrison. “We haven’t had a single failure in the eight months the service has been up.” The team has been particularly impressed by the sheer amount of data that AWS Batch can process and the scalability of DynamoDB.
“I don’t know how to talk about DynamoDB’s scalability limits because I don’t know where the boundaries are,” says Harrison. “It works beautifully for us. And another big surprise for me has been what we’ve been able to achieve using AWS Batch. It’s central to everything we do and exactly what we needed to do our periodic imports of data. Without AWS Batch, I’m not exactly sure how we would’ve done it at the time.”
Another big advantage for Rue La La is the cost savings achieved with Databricks and AWS. “We built this complicated engine with two people,” says Wilson. “And we can run it in a cost-efficient and agile fashion and rapidly iterate using Databricks and AWS. Having the flexibility to spin up dynamic resources whenever we want, and have it just work, allows us to rapidly test and prototype items in a way that you just can’t do on any other service. Databricks and AWS are complete game changers for us.”
“Deciding that you want to double your capacity and simply being able to do it--that never gets old,” says Harrison. “And we’re using it as a competitive advantage. Leveraging the managed services on AWS has allowed us to be agile. I feel like they’re the strongest tool in our tool belt. The ease with which we’re able to be agile is huge for us in our efforts to deliver a consistent and unique user experience.”
Founded by the team that created Apache Spark, Databricks provides a unified analytics platform for data science teams to collaborate with data engineering and lines of business to build data products. Users achieve faster time-to-value with Databricks by creating analytic workflows that go from ETL and interactive exploration to production. The company also makes it easier for its users to focus on their data by providing a fully managed, scalable, and secure cloud infrastructure that reduces operational complexity and total cost of ownership. Databricks is an Advanced APN Technology Partner and holds the AWS Big Data and AWS Machine Learning Competencies.
Learn more about Machine Learning at AWS.