
How Smarsh reduced costs and increased scale migrating from Pivotal Cloud Foundry to Amazon EKS

This post is co-written with Evan Doyle and Blake Sherwood from Smarsh

Customer Overview

As a global leader in digital communications compliance and intelligence, Smarsh enables companies to transform oversight into foresight by surfacing business-critical signals across all digital communications. The company serves a global client base that spans 18 of the top 20 banks, leading brokerage firms, and government agencies.

On Amazon Web Services (AWS), Smarsh initially deployed its microservice-based applications on the Cloud Foundry Platform as a Service (PaaS). In 2023, Smarsh chose Amazon Elastic Kubernetes Service (Amazon EKS) to modernize its application container infrastructure because of Cloud Foundry’s feature gaps and costs. Amazon EKS is a managed Kubernetes service that handles the availability and scalability of the Kubernetes control plane. By deploying applications to Amazon EKS, Smarsh could reduce operational costs while retaining the ability to scale with varying customer traffic. Starting in October 2023, Smarsh, working with AWS Professional Services (ProServe), migrated more than 250 applications from Cloud Foundry to Amazon EKS in less than 10 months.

This post explores their migration journey, highlighting key decisions, challenges, and outcomes.

Challenges with existing PaaS solution

Before choosing Amazon EKS, Smarsh identified several key challenges with the Cloud Foundry platform:

Operating Costs: Running Cloud Foundry distributions required both a minimum infrastructure footprint and substantial licensing fees in the six- to seven-figure range, which directly impacted operational budgets.

Technology Lag: Legacy PaaS solutions lagged in adopting the latest advancements in containerization and cloud-native technologies. This gap limited access to the broader ecosystem of cutting-edge tools and integrations that modern container platforms like Kubernetes offer.

AWS Feature Utilization: The inability to directly use the newest AWS services and features, such as AWS Graviton processors, hampered innovation. Instead of tapping into new features as they became available, Smarsh was reliant on the PaaS provider to incorporate these advancements.

Scalability and Resource Efficiency Limitations: While Cloud Foundry manages scaling, its granular control over scaling mechanisms, resource allocation, and bin-packing efficiency fell short of what is achievable with Kubernetes, leading to suboptimal resource utilization and less flexible scaling compared to the fine-grained control offered by Amazon EKS.

Talent Acquisition and Skillset Modernization: As the industry gravitates towards Kubernetes as a de facto standard, sourcing talent with deep expertise in proprietary or older PaaS solutions becomes increasingly difficult. Conversely, the talent pool for Kubernetes is already extensive.

Existing Architecture

Before migrating to Amazon EKS, teams operated a diverse technology stack within the Cloud Foundry environment. The microservices architecture encompassed applications built with Java Spring Boot, Python, Go, and Node.js frameworks. The infrastructure relied on Apache ZooKeeper for managing microservice configurations, while CredHub handled the secure storage of application secrets. Service discovery capabilities were provided through a mix of Cloud Foundry GoRouter, Cloud Controller, and Diego.

NoSQL and relational databases were implemented to meet various data persistence requirements. Messaging infrastructure utilized Apache Kafka and RabbitMQ. Application deployment and continuous integration were managed through Concourse pipelines. Smarsh deployed a custom Elasticsearch, Logstash, and Kibana (ELK) stack for logging capabilities. OpenTelemetry and Honeycomb were implemented for distributed tracing, while Datadog provided monitoring and alerting functions across environments.

Figure 1: Cloud Foundry architecture, showing platform runtime components, the application framework, and other associated functions

Target Architecture

Post migration, Smarsh built a robust architecture using Amazon EKS as the foundation. The new architecture deployed multiple EKS clusters to host both Kubernetes plugins and application workloads. Service discovery was replaced with Istio, providing enhanced service discovery and mesh capabilities. Elastic Load Balancing (ELB) managed ingress traffic for customer-facing applications, ensuring reliable access and scalability. Configuration management evolved from Apache ZooKeeper to Kubernetes ConfigMaps, while secrets management transitioned from CredHub to HashiCorp Vault.

Smarsh embraced AWS managed services to replace its Cloud Foundry marketplace services. Database needs were met with Amazon DocumentDB (with MongoDB compatibility) for NoSQL workloads, Amazon Relational Database Service (Amazon RDS) for relational databases, and Amazon ElastiCache for Redis caching. For messaging, Amazon MQ was implemented for RabbitMQ workloads and Amazon Managed Streaming for Apache Kafka (Amazon MSK) for Kafka workloads. These managed services reduced operational overhead while maintaining high availability and scalability.

The CI/CD pipeline underwent enhancement with additional containerization steps, now building and pushing containers to both Harbor and Amazon Elastic Container Registry (Amazon ECR). Fluent Bit was implemented for log collection, while OpenTelemetry, Honeycomb, and Datadog moved to native Amazon EKS plugins.

Figure 2: Amazon EKS architecture, with management and workload clusters, inbound connections through Elastic Load Balancing, database and messaging connectivity through AWS Transit Gateway, and supporting AWS and third-party services

Migration Process

Migrating applications from Cloud Foundry to Amazon EKS requires careful planning and execution. The journey began with a comprehensive discovery process to understand the application landscape, infrastructure requirements, networking configurations, and security dependencies. This assessment helped create a detailed migration roadmap with clear phases, timelines, and resource requirements. During planning, the team collaborated with AWS ProServe to identify networking and security as areas where ProServe would accelerate the migration process. The migration process was structured into the following well-defined phases.

Pre-migration Phase: Application preparation and dependency updates

The pre-migration phase focused on preparing workloads for Kubernetes adoption. Development teams upgraded application runtime dependencies and implemented the health check endpoints needed for Kubernetes liveness, readiness, and startup probes, so that Kubernetes could manage pod health and safely scale workloads. Cloud Foundry-specific dependencies, such as Eureka or CredHub, were removed because they wouldn’t be needed in the new environment. All of these changes were stored in Git branches, ready for the migration phase.
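
The following is a minimal sketch of the probe configuration such health checks enable, added to a container spec in a Kubernetes Deployment. The /actuator/health/* paths assume a Spring Boot application exposing Actuator health groups; other frameworks would expose their own endpoints.

```yaml
# Sketch: probes added to a Deployment's container spec so Kubernetes can
# restart unhealthy pods and send traffic only to ready ones.
# The paths assume Spring Boot Actuator health groups; adjust per framework.
livenessProbe:
  httpGet:
    path: /actuator/health/liveness
    port: 8080
  periodSeconds: 10
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /actuator/health/readiness
    port: 8080
  periodSeconds: 5
  failureThreshold: 3
startupProbe:
  httpGet:
    path: /actuator/health/liveness
    port: 8080
  periodSeconds: 10
  failureThreshold: 30   # allows up to ~5 minutes for slow-starting services
```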

Network Build Phase: Establishing the foundational network architecture

Before migrating to Amazon EKS, Smarsh prioritized building a scalable, future-proof network architecture. Working with AWS ProServe, they designed and implemented a comprehensive networking foundation that supported the immediate migration needs as well as long-term expansion goals. During this phase, a standardized Amazon Virtual Private Cloud (Amazon VPC) architecture was implemented that provided consistent networking patterns across environments using Infrastructure as Code (IaC). The team established comprehensive network monitoring and observability using Amazon VPC Flow Logs for detailed traffic analysis. To improve reliability, the DNS infrastructure and IP address management (IPAM) solution were externalized. Comprehensive traffic inspection capabilities were also incorporated to provide deeper network observability.
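
As a flavor of the kind of IaC used here, the snippet below is a minimal CloudFormation-style sketch that enables VPC Flow Logs to CloudWatch Logs; the resource names (WorkloadVpc, FlowLogRole) are placeholders, and Smarsh’s actual tooling and templates are not shown in this post.

```yaml
# Minimal sketch: enable VPC Flow Logs for a VPC, delivered to CloudWatch Logs.
Resources:
  VpcFlowLog:
    Type: AWS::EC2::FlowLog
    Properties:
      ResourceId: !Ref WorkloadVpc                        # placeholder VPC resource
      ResourceType: VPC
      TrafficType: ALL                                    # capture accepted and rejected traffic
      LogDestinationType: cloud-watch-logs
      LogGroupName: /network/vpc-flow-logs
      DeliverLogsPermissionArn: !GetAtt FlowLogRole.Arn   # placeholder IAM role
```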

Infrastructure Build Phase: Deploying and configuring the Amazon EKS platform

Building a stable Kubernetes platform was crucial for application teams. Smarsh developed its Amazon EKS platform with the controllers and plugins necessary to support seamless application migration. The team first deployed the control plane using the latest Kubernetes version, then set up the data plane using managed node groups with Bottlerocket AMIs. Custom networking with additional CIDR ranges for workloads was implemented to allow for scale. Amazon Elastic Block Store (Amazon EBS) was configured to provide persistent storage, and AWS Identity and Access Management (IAM) was integrated with OIDC providers to provide seamless authentication.
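
A minimal, eksctl-style sketch of such a cluster definition is shown below for illustration; Smarsh built its clusters through its own IaC modules and pipelines, so the names, region, version, and sizes here are placeholder assumptions.

```yaml
# Sketch: EKS control plane plus a Bottlerocket managed node group with IRSA enabled.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: workload-cluster-1     # placeholder name
  region: us-west-2            # placeholder region
  version: "1.29"              # placeholder Kubernetes version
iam:
  withOIDC: true               # IAM OIDC provider for service account authentication
managedNodeGroups:
  - name: general-purpose
    amiFamily: Bottlerocket    # Bottlerocket AMIs for the data plane
    instanceType: m6i.xlarge   # placeholder instance type
    minSize: 3
    maxSize: 12
    desiredCapacity: 3
```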

The teams used IaC to develop all infrastructure and integrated it with pipelines to deploy production-grade Amazon EKS clusters consistently across environments. For configuration management, GitOps practices were adopted using ArgoCD Autopilot and Kustomize. This enabled consistent base configurations across environments with environment-specific customizations.
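
As a simplified illustration of this pattern, the overlay below layers environment-specific settings on top of a shared base; the directory layout, names, and values are hypothetical.

```yaml
# overlays/prod/kustomization.yaml -- hypothetical production overlay on a shared base
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: orders-prod              # environment-specific namespace
resources:
  - ../../base                      # shared Deployment, Service, and defaults
patches:
  - path: replicas-patch.yaml       # raises replica counts for production traffic
configMapGenerator:
  - name: app-config
    literals:
      - LOG_LEVEL=info              # environment-specific configuration value
```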

The platform included essential controllers such as ArgoCD, AWS Load Balancer Controller, Cert-Manager with AWS Private Certificate Authority integration, and the External-DNS controller for Amazon Route 53 integration. Istio was used for service discovery and service mesh capabilities, KubeCost for cost management, and Vault Secrets Operator for secrets management. As mentioned in the target architecture, the platform used Fluent Bit, Datadog, and Honeycomb for observability and monitoring.
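
To show how two of these controllers fit together, the hypothetical Ingress below lets the AWS Load Balancer Controller provision a load balancer while External-DNS creates the matching Route 53 record; the hostname and backend service are placeholders, and actual customer traffic at Smarsh flows through Elastic Load Balancing and Istio as described earlier.

```yaml
# Sketch: Ingress handled by the AWS Load Balancer Controller, with the DNS
# record managed by External-DNS. Hostname and backend are placeholders.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: orders-ingress
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
    external-dns.alpha.kubernetes.io/hostname: orders.example.com
spec:
  ingressClassName: alb
  rules:
    - host: orders.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: orders-service
                port:
                  number: 80
```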

To replace Cloud Foundry marketplace services, IaC modules were created for AWS managed services, including Amazon DocumentDB, Amazon RDS, Amazon MQ, Amazon MSK, and Amazon ElastiCache. Using established pipelines, the infrastructure was replicated across multiple environments.
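
As a flavor of what one of these modules provisions, here is a minimal CloudFormation-style sketch of an Amazon DocumentDB cluster; the parameter names and instance class are placeholders, and the real modules are not published in this post.

```yaml
# Sketch: a small Amazon DocumentDB cluster with one instance.
Parameters:
  DocDbUser:
    Type: String
  DocDbPassword:
    Type: String
    NoEcho: true                       # keep the credential out of console output
Resources:
  DocDbCluster:
    Type: AWS::DocDB::DBCluster
    Properties:
      DBClusterIdentifier: app-docdb   # placeholder identifier
      MasterUsername: !Ref DocDbUser
      MasterUserPassword: !Ref DocDbPassword
      StorageEncrypted: true
  DocDbInstance:
    Type: AWS::DocDB::DBInstance
    Properties:
      DBClusterIdentifier: !Ref DocDbCluster
      DBInstanceClass: db.r6g.large    # placeholder instance class
```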

Security Build Phase: Implementing comprehensive security controls

As a trusted partner to leading financial institutions worldwide, Smarsh understands that security isn’t just a technical requirement; it’s a fundamental business imperative. The team prioritized enhancing their security posture to create a more robust and secure Kubernetes platform. With help from AWS Professional Services, a thorough assessment was completed to document the full set of security requirements. Identity and access management controls were implemented by integrating the existing identity provider with IAM and Amazon EKS. This integration enabled fine-grained access controls and maintained the principle of least privilege. Understanding that container security requires multiple layers of protection, automated vulnerability scanning was implemented. For runtime protection, pod security policies in Amazon EKS clusters were used to prevent potential exploits. Open Policy Agent (OPA) was deployed to enforce strict controls on the platform and monitor violations.

AWS Network Firewall and AWS WAF were implemented for traffic filtering. Using Istio, mutual TLS authentication was enabled to ensure secure communication between services. AWS Key Management Service (AWS KMS) provided encryption at rest, while HashiCorp Vault secured application secrets. Detailed response procedures, combined with automated detection and response capabilities, supported swift and effective responses to security incidents.
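
With Istio in place, mesh-wide strict mTLS can be expressed as a single policy like the sketch below, which rejects plaintext traffic between workloads in the mesh (assuming istio-system is the mesh root namespace).

```yaml
# Sketch: mesh-wide PeerAuthentication policy enforcing strict mutual TLS.
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system   # the Istio root namespace applies this mesh-wide
spec:
  mtls:
    mode: STRICT            # plaintext service-to-service traffic is rejected
```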

Pilot Phase: Testing migration patterns with selected applications

The pilot phase marked a critical milestone in the journey from Cloud Foundry to Amazon EKS. This phase focused on validating migration patterns and establishing best practices for the broader migration effort. To begin, the team selected three existing applications from their Cloud Foundry environment. These applications served as ideal candidates to test and refine the migration process. A key difference between Cloud Foundry and Amazon EKS environments lies in container management responsibility. While Cloud Foundry handles containerization automatically, Amazon EKS transfers this responsibility to the application teams. To address this shift, the platform team first established a foundation of standardized base container images. These images provided consistent, secure starting points for all applications moving to Amazon EKS. The team then enhanced their existing CI/CD pipelines to incorporate container building capabilities using these base container images. These enhanced pipelines now automatically build container images and distribute them to both Harbor and Amazon ECR, where they are deployed using ArgoCD.
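
For illustration, an ArgoCD Application pointing at one of those deployment configurations might look like the sketch below; the repository URL, paths, and namespaces are hypothetical.

```yaml
# Sketch: ArgoCD Application that continuously syncs a Kustomize overlay from Git.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: orders-service
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://git.example.com/platform/deployments.git   # placeholder repo
    targetRevision: main
    path: apps/orders-service/overlays/prod                     # Kustomize overlay path
  destination:
    server: https://kubernetes.default.svc
    namespace: orders
  syncPolicy:
    automated:
      prune: true      # remove resources deleted from Git
      selfHeal: true   # revert out-of-band changes
    syncOptions:
      - CreateNamespace=true
```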

The pilot phase also focused on creating reusable deployment configurations. Standardized Kustomize manifests and Helm charts were developed that included the necessary configurations. Application secrets were migrated from Cloud Foundry’s CredHub to HashiCorp Vault. Container configurations transitioned to Kustomize and Helm manifests, providing version-controlled, environment-specific settings.
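
As a sketch of the Vault side of this, the Vault Secrets Operator (listed among the platform controllers above) can project a Vault key-value secret into a native Kubernetes Secret with a resource like the following; the mount, path, and names are placeholders.

```yaml
# Sketch: Vault Secrets Operator syncing a KV-v2 secret into a Kubernetes Secret.
apiVersion: secrets.hashicorp.com/v1beta1
kind: VaultStaticSecret
metadata:
  name: orders-db-credentials
spec:
  vaultAuthRef: vault-auth        # placeholder VaultAuth resource
  mount: kv                       # placeholder KV-v2 mount
  type: kv-v2
  path: orders/database           # placeholder secret path in Vault
  refreshAfter: 60s               # re-sync interval
  destination:
    create: true                  # create the Kubernetes Secret if missing
    name: orders-db-credentials
```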

Migration Factory Phase: Large-scale migration of all applications

After successfully completing the pilot phase, the organization began the large-scale migration of the more than 250 applications in two stages. The mass migration phase leveraged the reusable artifacts, lessons learned, and proven patterns established during the pilot to accelerate the migration across all application teams. Standardized Helm charts and Kustomize templates streamlined Kubernetes deployments. Each application team enhanced their CI/CD pipelines using established patterns. Application configurations were methodically migrated from Cloud Foundry to Kubernetes ConfigMaps, and secrets were migrated to HashiCorp Vault, as sketched below.
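
As a simplified example of that configuration shift, settings that once lived in Cloud Foundry environment variables or ZooKeeper become a ConfigMap consumed by the workload; the keys and values below are placeholders.

```yaml
# Sketch: application settings as a ConfigMap, injected via envFrom in the Deployment.
apiVersion: v1
kind: ConfigMap
metadata:
  name: orders-config
data:
  KAFKA_BOOTSTRAP_SERVERS: "b-1.example.kafka.us-west-2.amazonaws.com:9092"  # placeholder
  FEATURE_FLAGS: "archive-v2"                                                # placeholder
---
# In the Deployment's pod template, the container then references:
#   envFrom:
#     - configMapRef:
#         name: orders-config
```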

Testing remained a priority throughout the migration. Application teams conducted thorough testing of their migrated applications, validating functionality, performance, and security requirements.

Operations and Day 2 activities: Ensuring ongoing operational stability

While teams were performing the migrations, the platform team focused on day-to-day operational activities. Their goal was to ensure that once all workloads were migrated to Amazon EKS, both the platform and application teams had a robust strategy to maintain production environments effectively.

To achieve this, the project team, in collaboration with AWS Professional Services, conducted a thorough assessment of the existing infrastructure. This evaluation helped identify and address several critical areas: cluster stability was enhanced, comprehensive debugging mechanisms for issues and failures were implemented, and advanced monitoring solutions were deployed to promptly report infrastructure concerns. Additionally, systems were established to effectively trace application logs and metrics across the entire environment.

This proactive approach significantly improved the overall reliability and observability of the cloud infrastructure, setting a strong foundation for future scalability and performance optimization.

Migration Factory Cutover: Executing the final transition in stages

The cutover unfolded as a meticulously planned, multi-stage process designed to minimize disruption to core services while ensuring a controlled transition for all components. The strategy involved an initial, largely seamless backend migration, followed by iterative frontend cutovers for the remaining workloads, which included customer participation.

  • Preliminary: Practice makes perfect. Over nine weeks, the team tested and retested advanced workflows covering worst-case migration scenarios from Cloud Foundry to the Amazon EKS infrastructure, including restorations and fallbacks. Once comfortable with the process, they proceeded to the customer-facing stages.
  • Stage 1 – Backend: A key aspect of this phase was the transparent migration of backend workloads. Focusing on the Apache Kafka interactions and consumer groups, data streams and processing tasks were redirected to the new Amazon EKS environment. This redirection occurred without interrupting or altering the primary application workflows, allowing a significant portion of the backend to be shifted seamlessly. Customers were informed during this phase, which was less intrusive to day-to-day operations.
  • Stage 2 – Frontend: To complete the migration, the team scheduled planned maintenance windows in which direct customer involvement was a critical element. The schedule was communicated to affected customers, allowing them to participate in pre-migration checks and post-migration validations. This alignment between Smarsh and each customer enabled a smooth transition of their specific services and data. This stage involved cutting over ingress traffic from the legacy infrastructure to the new endpoints.

Outcomes and Lessons Learned

The migration to Amazon EKS has delivered significant benefits for Smarsh. Here are some of the key advantages and lessons learned.

Cost savings: By adopting Amazon EKS, Smarsh has optimized resource utilization, leveraging the pay-as-you-go model offered by AWS. This flexible approach eliminated the need for fixed vendor licensing costs, directly improving the company’s bottom line. The migration to Amazon EKS delivered high seven-figure savings in infrastructure costs and eliminated low eight-figure license renewal fees.

Increased scale: The new platform gives the business the capability to onboard multiple large customers simultaneously while maintaining optimal performance levels. Prior to Amazon EKS, onboarding new customers often required complex capacity planning and potential licensing adjustments. Now, Smarsh seamlessly accommodates rapid customer growth without infrastructure constraints, having grown from approximately 2,500 deployed services on the existing Pivotal Cloud Foundry (PCF) foundations to more than 11,500 services on more than 60 Amazon EKS clusters.

Enhanced Deployment Velocity: By adopting modern cloud-native container technologies, the platform team significantly improved its deployment processes and developer productivity. This resulted in a shorter time to market for key features on the new platform.

Networking: Late in the migration process, the team discovered services and features that could have been useful earlier. For example, AWS Cloud WAN offered a promising direction that would have changed Smarsh’s approach to migrating customer ingress controls.

Upfront training: The newer technologies required significantly more training as part of the migration process. While the selected tactics worked, not enough time was allocated to detailed operational needs.

Conclusion

Smarsh’s successful migration from Cloud Foundry to Amazon EKS demonstrates how organizations can modernize their container platforms while achieving significant business outcomes. Through close collaboration with AWS, they transformed their application infrastructure, optimized operational efficiency, and realized substantial cost savings.

The journey outlined in this post—from initial assessment through architecture decisions to final implementation—serves as a practical blueprint for organizations considering similar modernization initiatives. The migration strategy balanced technical requirements with business priorities, resulting in enhanced scalability, improved deployment velocity, and optimized resource utilization.

Organizations running legacy PaaS solutions can use Smarsh’s experience as a reference architecture for their own modernization journey to Amazon EKS. The proven success demonstrates that similar transformations can be implemented quickly while creating significant value for expansion and innovation initiatives.


About the authors

Evan Doyle

Evan Doyle is VP of Technical Product Management and Strategy at Smarsh, with over a decade of experience building and scaling SaaS products on AWS across multiple industries and geographies. Currently, he is focused on enabling Smarsh’s exabyte vision for their AI-enabled intelligent communications capture, archive, and surveillance products to support regulated industries.

Blake Sherwood

Blake Sherwood is the Distinguished Technical Product Manager of Platform & AI at Smarsh. With two decades of experience in software engineering and architecture, he sets the technical vision for the engineering platform. His work is centered on fusing AI with core enablement to accelerate innovation and power Smarsh’s next-generation intelligent communication products.