AWS Partner Network (APN) Blog

From Low-Code iPaaS to Serverless Architecture: Stance’s Transformation with AWS and Trek10

By David Avram, Solutions Architect – Trek10
By Drew Shepherd, Cloud Architect – Trek10
By Michael Barney, Cloud Architect – Trek10
By William Lorenz, Solutions Architect – AWS

Customers of all shapes and sizes are finding that an event-driven, serverless architecture helps them to accelerate business outcomes.

In this post, Trek10 will share a customer success story and lessons learned from its engagement with Stance, a renowned lifestyle brand that’s revolutionized the sock industry with its innovative designs and high-quality products.

Founded in 2009, Stance has gained a reputation for its unique and stylish socks that combine comfort, functionality, and artistic expression. The brand offers a wide range of sock styles for men, women, and kids, including casual, athletic, and performance socks, and even licensed collaborations with popular brands, artists, and athletes.

We will explore Stance’s transformation journey from a low-code integration platform-as-a-service (iPaaS) solution to an architecture built on AWS that overcomes its business challenges. We’ll discuss the problems faced, both business and technical, and the impact of those problems on both customers and Stance’s internal teams.

Additionally, we will detail how the implementation of an event-driven, serverless architecture addressed Stance’s business challenges and led to outcomes, including increased customer lifetime value, cost savings, increased visibility, scalability, and reduced reliance on third-party vendors.

Trek10 is an AWS Premier Tier Services Partner and Managed Service Provider (MSP) that is wholly-focused on Amazon Web Services (AWS) and has deep serverless and event-driven architecture expertise.

Business Challenges

Stance faced notable business challenges with its previous low-code integration platform, which had far-reaching repercussions for both customers and internal teams.

The concept of a low-code iPaaS solution revolves around providing a streamlined development ecosystem that empowers users to create and manage integrations among diverse applications and systems, all while requiring minimal manual coding by using intuitive visual interfaces and connectors.

Given Stance’s reliance on multiple third-party software-as-a-service (SaaS) vendors for its retail operations and logistics management, the company initially chose a low-code iPaaS solution for its ecommerce store. This decision was intended to simplify the coordination between various SaaS services, but instead led to several issues including lack of visibility, manual intervention, and vendor lock-in.

One key issue was the lack of visibility into the underlying workflows due to the nature of low-code, iPaaS solutions. More specifically, Stance struggled to locate error logs, troubleshoot failures, and understand where issues were occurring within the workflows.

Without proper visibility and observability, problems were often detected through manual efforts or customer complaints. The inability to troubleshoot errors or view underlying workflows hampered problem resolution and led to repetitive issues, such as orders not being received or processed. To provide top-notch customer support, Stance’s customer service team was frequently giving away free apparel to salvage positive customer experiences.

An additional problem Stance encountered was the manual intervention necessary to reconcile orders from third-party systems. This was a significant burden on the customer service team, because with no automation in place many processes were manual and time-consuming.

A third challenge was vendor lock-in. Maintenance of the low-code, iPaaS solution was heavily reliant on a single contractor, which led to a bottleneck when work needed to be completed. There were also gaps in the offerings of other vendors, which limited Stance’s ability to develop new features, slowing down the development process and hindering customization.

Technical Solution and Implementation

To address these significant business challenges, Stance embarked on an AWS greenfield build that leveraged a fully serverless, event-driven architecture providing scalability, flexibility, and cost efficiency.

Major processes were broken down into separate workflows, utilizing AWS services such as AWS Step Functions, AWS Lambda, Amazon API Gateway, Amazon Simple Storage Service (Amazon S3), Amazon Simple Queue Service (Amazon SQS), and the AWS Transfer Family.

The preference for an event-driven architecture ensured efficiency and event-triggered processing wherever possible. This solution allowed the system to scale horizontally to meet demand, as opposed to the previous system’s reliance on vertical scaling. Furthermore, when traffic subsides, the system’s ability to scale down results in significant cost savings—this wasn’t a characteristic that was actively sought after, but it turned out to be a welcome outcome.

The deployment process involved a greenfield approach, starting with the implementation of a multi-account structure using AWS Organizations. Serverless applications were built using the AWS Serverless Application Model (SAM) and AWS CloudFormation with deployments automated through GitHub Actions. Structured logging was implemented, with logs forwarded to Datadog for monitoring and alerting purposes.
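As an illustration of what structured logging can look like inside a Lambda function, here is a minimal sketch using only the Python standard library. The field names (`workflow`, `order_id`) and the logger name are hypothetical and not taken from Stance's codebase; the idea is simply that each log line is a single JSON object that a log forwarder can parse without custom rules.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    # Hypothetical context fields that callers may pass via `extra=`.
    CONTEXT_FIELDS = ("workflow", "order_id")

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        for key in self.CONTEXT_FIELDS:
            if hasattr(record, key):
                payload[key] = getattr(record, key)
        return json.dumps(payload)


def build_logger(name: str = "orders") -> logging.Logger:
    """Return a logger that writes JSON lines to the default stream."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid duplicate handlers on warm invocations
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    log = build_logger()
    log.info("order accepted", extra={"workflow": "order", "order_id": "A-1001"})
```

Because every line shares the same keys, alerts can be written against fields such as `level` or `workflow` rather than against free-form message text.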

Serverless and Event-Driven Architecture

Through the use of AWS Step Functions, a powerful orchestration service, Stance gained the ability to manage workflows. This provided visibility into the underlying processes and wasn’t possible with the previous low-code iPaaS solution.

By structuring workflows as state machines, AWS Step Functions enables real-time monitoring of each step’s execution, making it easier for the Stance team to pinpoint errors and exceptions. This heightened transparency facilitates prompt troubleshooting and permits seamless retries of failed steps, enhancing the reliability and robustness of workflows.
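To make the retry behavior concrete, the sketch below builds a small state machine definition in Amazon States Language as a Python dictionary. The state names, the Lambda ARN, and the retry numbers are placeholders chosen for illustration, not values from Stance's workflows. The `Retry` and `Catch` fields are standard Amazon States Language; the helper only approximates the wait schedule and ignores optional settings such as `MaxDelaySeconds` and jitter.

```python
import json

# Hypothetical definition: names, ARN, and numbers are placeholders.
STATE_MACHINE = {
    "StartAt": "SyncInventory",
    "States": {
        "SyncInventory": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:sync-inventory",
            "Retry": [
                {
                    "ErrorEquals": ["States.TaskFailed"],
                    "IntervalSeconds": 2,
                    "MaxAttempts": 3,
                    "BackoffRate": 2.0,
                }
            ],
            "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "NotifyFailure"}],
            "Next": "Done",
        },
        "NotifyFailure": {"Type": "Fail", "Error": "InventorySyncFailed"},
        "Done": {"Type": "Succeed"},
    },
}


def retry_delays(retrier: dict) -> list:
    """Approximate seconds waited before each retry under exponential backoff."""
    interval = retrier.get("IntervalSeconds", 1)
    rate = retrier.get("BackoffRate", 2.0)
    attempts = retrier.get("MaxAttempts", 3)
    return [interval * rate ** n for n in range(attempts)]


if __name__ == "__main__":
    retrier = STATE_MACHINE["States"]["SyncInventory"]["Retry"][0]
    print(retry_delays(retrier))
    print(json.dumps(STATE_MACHINE, indent=2))
```

With a definition like this, a failed task is retried as a whole state, and an execution that exhausts its retries lands in a named failure state that is visible in the execution history.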

With state machines, Stance was empowered to streamline workflow operations, ensuring efficient error resolution and smoother data processing. Additionally, all workflows share subsystems and allow for streamlined development and reuse of components with AWS Systems Manager Parameter Store and Lambda functions.

Figure 1 – Serverless and event-driven architecture.

The diagram below shows an example of the state machine for the inventory workflow, which interacts with third-party integrations. It illustrates how a state machine orchestrates intricate workflows across different services, including external APIs or custom applications, and demonstrates the power of AWS Step Functions in creating comprehensive, end-to-end workflows.

Figure 2 – State machine for the inventory workflow.

AWS Serverless Application Model

The AWS Serverless Application Model (SAM) is an open-source framework for building serverless applications. It helps with the creation of cloud-based applications using serverless architectures.

In a serverless architecture, developers don’t have to worry about provisioning and managing servers, and that allows them to focus on writing code and creating applications while AWS takes care of the infrastructure and undifferentiated heavy lifting.

Trek10’s selection of the SAM for this project was driven by its alignment with serverless principles, enabling the team to efficiently architect and deploy a greenfield serverless build in AWS. SAM’s seamless integration with AWS services further enhances Stance’s ability to create resilient and cost-effective solutions.

Overcoming Implementation Challenges

As part of the integration, Stance was using a third-party SaaS provider that maintains a hard cap of 25 connections. These connections usually last between 2 and 300 seconds, depending on the type of request.

Notably, Stance contends with a substantial number of concurrent executions in its order workflow, each involving 10 or more distinct requests to the provider at any given point in time. Furthermore, other workflows (products, inventory workflows) generate batches of requests, ranging from hundreds to thousands, scheduled at specific intervals. The inevitable outcome was congestion issues with that SaaS provider.

Balancing design for resiliency and cost control can sometimes pose a challenge. The decision between investing in increased computing power, networking, storage, and similar resources vs. allocating more development time for optimization often calls for careful consideration.

Addressing this congestion by increasing connection concurrency with the third-party SaaS provider was not possible due to the associated 12-month contract commitment and substantial costs involved. Therefore, the pragmatic approach for Stance involved optimization efforts.

To manage the situation, the application segregates requests into two kinds: synchronous requests, which need an immediate response within the workflow, and asynchronous requests. The latter are handed over to Amazon SQS and handled by a separate process.

Architecturally, a combination of retry mechanisms and self-throttling techniques was employed throughout the workflow design. Idempotency remains a recurring concern and often steers the decision between synchronous and asynchronous handling for a given request.
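The split between synchronous and asynchronous requests can be sketched roughly as follows. This is an illustrative model, not Stance's code: `ProviderRequest`, `Router`, and the in-memory deque standing in for an SQS queue are all hypothetical, and de-duplicating on an idempotency key is one assumed way of keeping retried asynchronous requests from being sent twice.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Optional, Set


@dataclass(frozen=True)
class ProviderRequest:
    idempotency_key: str   # lets a retried request be recognized as a duplicate
    path: str
    needs_response: bool   # True when the workflow cannot continue without the result


class Router:
    """Send blocking requests directly; queue everything else for later."""

    def __init__(self, call_provider: Callable[[ProviderRequest], dict]) -> None:
        self._call_provider = call_provider
        self.queue: Deque[ProviderRequest] = deque()  # stand-in for an SQS queue
        self._seen: Set[str] = set()

    def submit(self, request: ProviderRequest) -> Optional[dict]:
        if request.needs_response:
            # Synchronous path: the caller waits on the provider.
            return self._call_provider(request)
        # Asynchronous path: enqueue once per idempotency key.
        if request.idempotency_key not in self._seen:
            self._seen.add(request.idempotency_key)
            self.queue.append(request)
        return None


if __name__ == "__main__":
    router = Router(lambda r: {"path": r.path, "status": 200})
    print(router.submit(ProviderRequest("k1", "/orders/1", True)))
    router.submit(ProviderRequest("k2", "/inventory/batch", False))
    print(len(router.queue))
```

Only the synchronous path consumes a provider connection while the workflow waits; queued requests can be drained at a controlled rate by a separate consumer.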

Figure 3 – Synchronous and asynchronous requests by type.

Some of the key characteristics of this architecture and design pattern include:

  • HTTP requests are retried until an explicit retry limit is reached or the AWS Lambda function times out.
  • State machine states are retried in whole upon failure.
  • Self-throttling for asynchronous requests; a Lambda function designed to “fire and forget” these requests uses SQS as its event source. A combination of BatchSize and ReservedConcurrentExecutions effectively throttles the function from opening more than BatchSize x ReservedConcurrentExecutions connections at once.
    • Note that this was prior to AWS introducing maximum concurrency of Lambda functions when using SQS as an event source, which may serve as a simpler solution.
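The characteristics listed above can be sketched in a few lines of Python. The numbers passed in are assumptions for illustration; only the cap of 25 connections comes from the post. `retry_until_limit` and its parameters are hypothetical names. In a real Lambda handler, `remaining_ms` could be the context object's `get_remaining_time_in_millis` method.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

PROVIDER_CONNECTION_CAP = 25  # hard cap stated in the post


def max_async_connections(batch_size: int, reserved_concurrency: int) -> int:
    """Upper bound on connections the SQS-driven function can hold open at once."""
    return batch_size * reserved_concurrency


def retry_until_limit(
    send: Callable[[], T],
    max_attempts: int,
    remaining_ms: Callable[[], int],
    safety_margin_ms: int = 1_000,
    base_delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `send` until it succeeds, attempts run out, or time is nearly up."""
    last_error: Optional[Exception] = None
    for attempt in range(max_attempts):
        if remaining_ms() <= safety_margin_ms:
            break  # too close to the function timeout to start another attempt
        try:
            return send()
        except ConnectionError as exc:
            last_error = exc
            if attempt + 1 < max_attempts:
                sleep(base_delay_s * 2 ** attempt)  # simple exponential backoff
    raise RuntimeError("gave up before a successful response") from last_error


if __name__ == "__main__":
    # Assumed settings: 5 messages per batch, 3 concurrent function instances.
    used = max_async_connections(batch_size=5, reserved_concurrency=3)
    print(used, "of", PROVIDER_CONNECTION_CAP, "connections reserved for async work")
```

Under these assumed settings the asynchronous consumer can open at most 15 connections, which leaves the rest of the provider's cap available to synchronous requests in the order workflow.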

The result is a system which minimizes HTTP 4xx errors and maximizes concurrency.

Outcomes

The move to AWS brought significant improvements and positive impacts to Stance. One immediate benefit was the reduction of sales order failures, which saved each customer service manager roughly 5-7 hours per week and allowed them to focus on more strategic tasks.

The reliability of the system dramatically improved as well, with the failure rate of sales orders dropping to an impressive 0.003% on AWS, compared to the previous platform’s failure rate of 1.5%. The number of stuck orders was reduced from 5-15 per day to 0-1 failure per week. Uptime also increased as AWS provided a more stable environment.

When data cleanliness errors do arise, now even non-technical stakeholders can resolve and redrive state machine executions. Stakeholders are informed when Amazon CloudWatch and Datadog forward well-structured, informative messages to various channels. Leveraging AWS Organizations and AWS IAM Identity Center, stakeholders log into the console with relevant permissions; they use the AWS Step Functions console to edit and reattempt executions.

On Stance’s previous low-code iPaaS platform, orders failed to bill anywhere from once to 50 times per week. Since transitioning to AWS, Stance has not encountered a single instance of failed billing.

The prior system involved batch processing, where single failures might prevent any subsequent jobs from processing. During one recent holiday, 17,000 orders were stuck in a batch nightmare. The orders required manual processing which took two weeks, causing delays in shipping during a critical sales period. With AWS, orders are architected to process individually.
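The difference between the two processing models can be shown with a toy example. This is a simplified illustration, not the previous platform's logic or Stance's code; the function names and the sample orders are hypothetical.

```python
from typing import Callable, Iterable, List, Tuple


def process_as_batch(orders: Iterable[str], handle: Callable[[str], None]) -> List[str]:
    """Batch model: the first failure leaves every later order stuck."""
    done: List[str] = []
    for order in orders:
        try:
            handle(order)
        except ValueError:
            break  # nothing after the bad order is processed
        done.append(order)
    return done


def process_individually(
    orders: Iterable[str], handle: Callable[[str], None]
) -> Tuple[List[str], List[str]]:
    """Per-order model: a failure is recorded and the remaining orders continue."""
    done: List[str] = []
    failed: List[str] = []
    for order in orders:
        try:
            handle(order)
            done.append(order)
        except ValueError:
            failed.append(order)  # can be inspected and retried on its own
    return done, failed


def handle_order(order: str) -> None:
    """Hypothetical handler that rejects one malformed order."""
    if order == "B":
        raise ValueError("malformed order")


if __name__ == "__main__":
    orders = ["A", "B", "C", "D"]
    print(process_as_batch(orders, handle_order))
    print(process_individually(orders, handle_order))
```

In the per-order model only the malformed order needs attention, while the rest complete normally.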

The cost efficiency of running the system on AWS was also evident, with a roughly 50% reduction in costs compared to the previous platform. Vertical scaling challenges were overcome, eliminating the scaling issues previously faced.

Conclusion

Whether you’re new to serverless or looking to scale, Trek10 allows you to focus on building applications, not managing servers. With over seven years as an AWS MSP, Trek10 offers deep serverless expertise to help you quickly operationalize your applications with best practices.

Trek10’s experience enables the team to complete any project in-house and support you through managed services after the initial engagement.

Learn more about Trek10 Professional Services in AWS Marketplace, and contact Trek10 to explore how serverless, event-driven architecture can benefit your business.


Trek10 – AWS Partner Spotlight

Trek10 is an AWS Premier Tier Services Partner and MSP that helps companies ranging from startups to Fortune 100 enterprises. Trek10 is wholly focused on AWS and has deep serverless and event-driven architecture expertise.

Contact Trek10 | Partner Overview | AWS Marketplace