Migrating E-Commerce Sites to SaaS Using A/B Testing and AWS Edge Services

By Dr. Javier Navarro Machuca, CEO at IO Connect Services

Modern e-commerce websites face challenges in website versioning, system uptime, continuous deployments, and A/B testing for content and products.

Although some e-commerce solutions began as a revenue side channel, they had to evolve so they could continue to support more traffic, products, different checkout mechanisms, and multiple integrations with other applications and platforms.

This evolution usually leads to the cloud, entailing a partial or full migration to a more robust, agile, and scalable software-as-a-service (SaaS) platform. It’s common to migrate larger e-commerce sites in phases. However, migrating in phases means you wind up with a legacy site coexisting alongside a SaaS site in your production environment. This poses multiple challenges.

IO Connect Services is an AWS Advanced Consulting Partner with AWS Service Delivery designations for AWS Lambda and Amazon CloudFront, among others, meaning we have been validated by AWS Partner Solutions Architects as experts in delivering these AWS services for customers.

In this post, I will discuss the pros and cons of different approaches to maintaining two live versions of an e-commerce site in a production environment. I will draw on our recent experience with a customer migration.

About the Customer’s Legacy E-Commerce Website

As in many startups, our customer’s website was built with a friendly and straightforward open-source e-commerce platform, SpreeCommerce.

This platform is popular in startups and small companies, but the need for using a more sophisticated platform arises when the business grows and requires more complex integrations and product catalog management.

To begin the migration, our development team selected AWS to host the legacy solution shown in Figure 1.

Figure 1 – Legacy e-commerce website architecture.

The architecture and virtual private cloud (VPC) are straightforward: a web application hosted on web servers that consumes the Spree platform via APIs (this approach is commonly known as “headless”). The web servers and the Spree instances belong to Auto Scaling Groups for high availability, and an Application Load Balancer distributes the traffic between the web servers.

Migration Requirements

The migration requirements fell into two categories: logistical and business.

As a logistical requirement, the new e-commerce platform would be delivered by SaaS, and migration would happen in incremental phases because of the scope of work. However, the main objective was to maximize the return on investment (ROI) as soon as possible.

As the SaaS site begins to operate, the legacy website must continue to deliver any content that hasn’t been migrated yet to the SaaS site. Both versions need to coexist until the migration is complete.

These were the business requirements (see Figure 2):

Stakeholders must be able to evaluate the performance of the migrated content with A/B testing.
A/B testing weight percentages must be adjusted on demand.
Users who have visited the new site must be able to access it for 30 days (this is the period a visitor takes, on average, to make a purchase).
The period of the version stickiness must be configurable and capable of being changed on demand.
Users must access the website via the same canonical domain (www), regardless of the site version.
Customer service representatives must be able to access both the legacy and the SaaS versions on demand because they need to see what the user is doing.

Figure 2 – Business requirements for A/B testing.

The development team brainstormed how to address both the business and logistical requirements while working on the migration. While doing so, they identified additional logistical requirements that were essential to facilitate the development and deployment of future phases without the need for rework and code refactoring:

Traffic routing for the A/B testing must occur outside of the application logic, so that changing the weight values does not require an application deployment.
Deployments of new changes in the websites must not cause downtimes or affect response times.

Our development team considered three different approaches:

Sticky sessions
Allowlist/Denylist reverse proxies
Edge services

Sticky Sessions

A sticky session routes requests for a particular web session to the same physical machine that handled the first request for that session, as shown in Figure 3.

This solution makes sense given the “stickiness” requirement that users stay on the new website once they have performed any action on it. It’s a common load-balancing approach in large web farms.

Figure 3 – Approach of using a sticky session in the load balancer.

Sticky sessions address some of the business and logistical requirements, but they also have serious limitations:

Application Load Balancer only recently began supporting weighted routing.
A sticky session is optimized for short web sessions. For long-running sessions, the application must persist the session out of the process so when a new web session starts, a different web server can resume the session. This requires a drastic change to the application code so the different websites can share and understand the same session structure.
Increasing the time duration of the sticky session makes the solution more prone to error. If the web server handling the requests becomes unresponsive, the user experiences erroneous website behavior or receives error messages.

In November 2019, AWS introduced a new feature in its Application Load Balancer that supports A/B testing. Since then, developers have been able to configure weighted routing policies to distribute traffic to different target groups. The policy can specify a stickiness configuration via the TargetGroupStickinessConfig attribute with the DurationSeconds value.

Although this is an excellent feature that benefits A/B testing deployments and migrations, it presents one limitation and one unfulfilled requirement:

The limitation is the maximum limit of the DurationSeconds value, which is set to 604800 seconds (7 days). Therefore, the 30-day requirement for our e-commerce application is not met.
With this approach, customer service representatives don’t have a mechanism to access one site or the other on demand.
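
To make the duration limitation concrete, here is a minimal sketch of a weighted forward action in the shape the Elastic Load Balancing API accepts for listener actions. The target group ARNs, the 80/20 split, and the helper name are placeholders chosen for illustration, not values from the customer project. The helper only builds the configuration and checks the requested stickiness period against the 604800-second cap.

```python
# Hypothetical sketch: the ARNs and the 80/20 split are placeholders.
MAX_STICKINESS_SECONDS = 604800  # 7-day cap on DurationSeconds cited above


def weighted_forward_action(legacy_arn, saas_arn, saas_percent, sticky_days):
    """Build a weighted forward action, rejecting durations over the cap."""
    duration = sticky_days * 24 * 60 * 60
    if duration > MAX_STICKINESS_SECONDS:
        raise ValueError(
            f"{sticky_days} days is {duration} s, above the "
            f"{MAX_STICKINESS_SECONDS} s limit"
        )
    return {
        "Type": "forward",
        "ForwardConfig": {
            "TargetGroups": [
                {"TargetGroupArn": legacy_arn, "Weight": 100 - saas_percent},
                {"TargetGroupArn": saas_arn, "Weight": saas_percent},
            ],
            "TargetGroupStickinessConfig": {
                "Enabled": True,
                "DurationSeconds": duration,
            },
        },
    }
```

Requesting 7 days succeeds, while requesting 30 days (2,592,000 seconds) raises an error, which is why this feature alone could not satisfy the stickiness requirement.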

Allowlist/Denylist Reverse Proxies

One suggestion our development team reviewed in more detail was the use of two reverse proxies to manage the traffic routing based on a list of users who have accessed the new website. One proxy had an Allowlist configuration and one had a Denylist configuration.

A reverse proxy is a server placed in front of web servers that intercepts user requests to perform tasks such as load balancing, caching, SSL offloading, or security checks. The idea is straightforward: intercept each request and route it to the correct website version based on whether the user has accessed the new version.

Figure 4 – Approach of using Allowlist and Denylist reverse proxies.

As shown in Figure 4, Amazon Route 53 maintains weight configurations for the A/B testing. At their default setting, only 20 percent of users are allowed to access the new SaaS website. Over time, we increased this number to distribute more traffic to the new site.

Access requests are intercepted by the Allowlist proxy server, which registers each user. This server stores the user and notifies the Denylist proxy server that the user will be allowed to stay on this version for 30 days.

For now, the remaining 80 percent of traffic is routed to the Denylist proxy server. Here, the proxy validates whether the user is in the Denylist (visitor of the new version). If the user is, it forwards the requests to the Application Load Balancer of the new website. Otherwise, it passes the traffic to the legacy website.
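
The decision made by the Denylist proxy can be sketched as follows. This is an illustration only: how users are identified (the user_id value) and how the list is stored are assumptions, and the in-memory class stands in for whatever shared store the two proxies would keep in sync.

```python
import time

STICKY_SECONDS = 30 * 24 * 60 * 60  # 30-day stickiness from the requirements


class VisitorList:
    """In-memory stand-in for the list the two proxies keep in sync."""

    def __init__(self):
        self._expires_at = {}

    def register(self, user_id, now=None):
        """Record a visit to the new site (the Allowlist proxy's job)."""
        now = time.time() if now is None else now
        self._expires_at[user_id] = now + STICKY_SECONDS

    def contains(self, user_id, now=None):
        """True while the user's 30-day window is still open."""
        now = time.time() if now is None else now
        return self._expires_at.get(user_id, 0) > now


def denylist_route(visitors, user_id, now=None):
    """Denylist proxy decision: known visitors go to the new site."""
    return "saas" if visitors.contains(user_id, now) else "legacy"
```

Every call to register stands for a notification between the proxies, so a lost notification sends a returning visitor back to the legacy site.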

Although this approach addresses all the requirements for A/B testing, it has a few considerable limitations:

The reverse proxy instances add operational overhead—more infrastructure and processes to maintain.
The solution required us to thoroughly monitor the notification/sync process between the two proxies. If this process fails, the 30-day stickiness requirement is not met.
The reverse proxy must be placed in a cluster behind a load balancer for high availability purposes, and it requires having the Allowlist and Denylist shared among the cluster nodes.

Edge Services Solution

While we were looking for better ways to implement the SaaS solution, one of our developers learned about a mechanism that could intercept traffic at edge locations and perform validations on the web request with the Lambda@Edge feature of Amazon CloudFront. And it required zero server administration.

We decided to work on a rapid proof of concept (POC) to evaluate the feasibility of intercepting the request and forwarding it to a specific destination based on HTTP headers or cookie validations. In less than a day, we were able to validate that the idea worked.

So, we focused on implementing the SaaS service with a serverless approach using these AWS Edge Services.

Amazon Route 53	Facilitates the A/B testing setup via weighted routing configurations. Customer service representatives can access each website directly via DNS record configurations.
Amazon CloudFront	Edge locations improve content delivery performance and provide a privileged place to intercept web requests.
Lambda@Edge	Executes at the edge location. Ideal for implementing the routing logic. Programmers can use the web request context to validate conditions or to alter the request.

Figure 5 – The AWS Edge Services we selected.
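
As a rough illustration of the weighted routing mentioned for Amazon Route 53, the sketch below computes the share of DNS responses each version receives and builds a change batch in the shape used by the Route 53 ChangeResourceRecordSets API. The record name, targets, TTL, and record type are assumptions for illustration; the post does not state which record types the project used. Route 53 weights are integers from 0 to 255, and each record's share is its weight divided by the sum of all weights.

```python
def traffic_share(weights):
    """Map each record's weight to its expected fraction of DNS responses."""
    for name, weight in weights.items():
        if not 0 <= weight <= 255:
            raise ValueError(f"weight for {name} must be between 0 and 255")
    total = sum(weights.values())
    if total == 0:
        raise ValueError("at least one weight must be greater than zero")
    return {name: weight / total for name, weight in weights.items()}


def weighted_change_batch(record_name, targets, weights, ttl=60):
    """Build an UPSERT change batch with one weighted CNAME per version."""
    return {
        "Changes": [
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": record_name,
                    "Type": "CNAME",  # assumption: alias records would differ
                    "SetIdentifier": version,
                    "Weight": weights[version],
                    "TTL": ttl,
                    "ResourceRecords": [{"Value": targets[version]}],
                },
            }
            for version in weights
        ]
    }
```

Weights of 80 and 20 give the initial 80/20 split, and moving to 0 and 100 sends all new lookups to the SaaS site. Because resolvers cache DNS answers, the observed split only approximates these fractions.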

Our new approach required the new SaaS-based website to place an HTTP cookie on visitors. The web cookie enabled us to manage the duration of the long-term session. It was initially 30 days, but we were able to adjust it at any time if the business required it.
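
A cookie of this kind could be issued roughly as follows. The cookie name, value, and attributes are hypothetical; the post does not describe the actual cookie. The sketch uses Python's standard http.cookies module to build the Set-Cookie header value, with the stickiness period passed as a parameter so it can be changed without touching the routing logic.

```python
from http.cookies import SimpleCookie


def sticky_cookie_header(days=30, name="site_version", value="saas"):
    """Return a Set-Cookie header value that expires after `days` days."""
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["max-age"] = days * 24 * 60 * 60  # lifetime in seconds
    cookie[name]["path"] = "/"      # send the cookie for every page
    cookie[name]["secure"] = True   # HTTPS only
    cookie[name]["httponly"] = True # not readable from page scripts
    return cookie[name].OutputString()
```

With the default of 30 days, the header carries Max-Age=2592000. Once the browser drops the expired cookie, the visitor is no longer pinned to the new site.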

Figure 6 – Architecture of edge services solution.

As shown in Figure 6, which depicts the architecture of the selected solution, A/B testing and long-term stickiness are handled outside of the web application boundaries. Hence, we could decommission the implementation once the full migration was complete with zero impact on users.

Another benefit of this approach is we could modify A/B testing values and routing logic at any time without application deployments. The solution is straightforward, yet flexible and powerful.

Figure 7 shows how the AWS Edge Services interact in the solution. The routing logic happens inside Lambda@Edge. It verifies that the HTTP cookie of the new website exists in the request and, if still valid, routes the user to the correct website version.

Figure 7 – Sequence diagram showing how the edge solution handles requests.

Figure 8 shows the code of the routing logic at Lambda@Edge. The implementation was very straightforward, and required no more than 20 lines of code.

Figure 8 – Routing logic implemented in Lambda@Edge.
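
For readers without access to the figure, the following is a sketch of what such a function can look like. It is not the code from Figure 8. It assumes an origin-request trigger, a hypothetical cookie named site_version, a placeholder SaaS hostname, and a distribution whose default origin is the legacy site and which forwards cookies so the function can see them. The event and origin structures follow the documented Lambda@Edge request format.

```python
# Hypothetical values: the cookie and the SaaS hostname are placeholders.
COOKIE_NAME = "site_version"
COOKIE_VALUE = "saas"
SAAS_DOMAIN = "saas.example.com"


def has_saas_cookie(headers):
    """Return True when a Cookie header carries the SaaS marker cookie."""
    for header in headers.get("cookie", []):
        for pair in header["value"].split(";"):
            name, _, value = pair.strip().partition("=")
            if name == COOKIE_NAME and value == COOKIE_VALUE:
                return True
    return False


def handler(event, context):
    """Origin-request trigger: send returning SaaS visitors to the SaaS origin."""
    request = event["Records"][0]["cf"]["request"]
    if has_saas_cookie(request["headers"]):
        request["origin"] = {
            "custom": {
                "domainName": SAAS_DOMAIN,
                "port": 443,
                "protocol": "https",
                "path": "",
                "sslProtocols": ["TLSv1.2"],
                "readTimeout": 30,
                "keepaliveTimeout": 5,
                "customHeaders": {},
            }
        }
        # The Host header must match the new origin.
        request["headers"]["host"] = [{"key": "Host", "value": SAAS_DOMAIN}]
    # Without the cookie, the request continues to the default (legacy) origin.
    return request
```

Browsers stop sending a cookie once it expires, so checking for its presence is enough to tell whether the visitor's stickiness period is still running.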

Results

We put together the solution in a couple of days. It took us more time to brainstorm the approach than it did to build the Lambda@Edge POC and implement the code. Testing the implementation was very simple, as well, and the team released the changeset to the production environment successfully.

The product owners validated that A/B testing worked as expected, and that user groups accessed the right website. The customer service team was able to access the same content as the users, and access each version directly via subdomains. The business owners obtained the defined KPI metrics from the A/B testing, and the development team was able to update the content and weighted routing percentages as needed.

Once the migration was complete, our development team set the weight value of the routing for the new SaaS website to 100 percent to retire the legacy site. Using this approach for the project migration, A/B testing, and constant code releases, the team achieved zero downtime deployments.

Conclusion

AWS continuously releases new features that make it possible to address complex requirements with simple and effective solutions. Keeping track of these new features can be challenging for developers and solution architects.

Amazon CloudFront takes edge delivery beyond traditional content delivery network services. With Lambda@Edge, you can implement powerful routines to handle routing conditions and validations based on network protocol and the payload of the request.

Amazon Route 53 also goes well beyond basic DNS resolution, providing routing policies and other mechanisms that help achieve zero-downtime deployments.

Migrating a full e-commerce solution is not a trivial project, and doing so while maintaining two different versions of the website concurrently in production makes it more challenging on many levels. Leveraging AWS edge services, you can increase business agility by not only improving the migration experience, but also by employing A/B testing to validate business decisions faster.

To find out more about our experience with AWS services, please contact IO Connect Services.

The content and opinions in this blog are those of the third party author and AWS is not responsible for the content or accuracy of this post.
