Networking & Content Delivery

How Silverflow modernized network operations by combining AWS Cloud WAN and DevOps

In this post, we dive into how Silverflow adopted AWS Cloud WAN and applied standard DevOps practices to manage our global network in a compliant and secure way.

At Silverflow, our mission to bring payments into the modern era meant rethinking our network from the ground up. Every transaction we process must be fast, secure, and compliant. Our customers, including banks, payment service providers, and large merchants, expect low latency, high availability, and strict data residency. Meeting these demands across multiple continents requires infrastructure that is global by design and auditable by default, with Payment Card Industry Data Security Standard (PCI DSS) and 3-D Secure (3DS) compliance as a baseline requirement.

The challenge: a growing and complex network

Our original focus was on Europe, operating out of the Ireland AWS Region (eu-west-1). In 2021, we deployed a standard multi-account setup, structured under an AWS Organizations organization, and split through Organizational Units (OUs). We handled card network connectivity in a single AWS account, with both AWS Site-to-Site VPN and AWS Direct Connect connections landing in a single AWS Transit Gateway (TGW). We handled communication with other accounts/services through many AWS PrivateLink endpoints.

In our initial single-Region setup, the use of PrivateLink kept our traffic private while making connectivity reasonable to manage. Furthermore, we used security group references to control which applications could consume specific PrivateLink endpoints, removing the need for complex routing rules or IP allow lists. However, when we explored adding other AWS Regions in 2022, we began to notice some challenges. We found ourselves dealing with several questions: should we peer Amazon Virtual Private Clouds (Amazon VPCs) or TGWs? Should routing be managed statically or dynamically? And what kind of complexity would either approach add to our existing setup?

In Figure 1 you can see a target architecture that we considered.

Multi-Region architecture based on AWS Transit Gateway peering. There are three AWS Regions, each with a Transit Gateway and 1 or 2 workload VPCs connected. In addition, one of the Regions hosts an automation solution (Amazon EventBridge & AWS Lambda) to handle routing configuration.

Figure 1. Transit Gateway peering architecture for multi-Region communication.

Automation is one of our core principles, so we leaned toward a solution that would allow some form of dynamic routing; managing static routes manually invites human error and was not an option for us. However, we found the automated routing solution shown in Figure 1 too error-prone as well: it was hard to understand or validate the current network state, and it was challenging to enforce standard networking concepts, such as segmentation, which is essential for our compliance.

The operational risks involved in both static TGW implementations and automated dynamic routing solutions, particularly around configuration errors, fell outside our risk tolerance. However, we still needed to address our multi-Region connectivity needs.

The solution: AWS Cloud WAN and DevOps

We first explored AWS Cloud WAN at re:Invent 2022, and after a deep dive into our architecture with AWS engineers at the event, it was clear that it aligned with our networking goals. AWS Cloud WAN allowed us to think about our network the way we wanted to: as network engineers. Specifically, AWS Cloud WAN:

  • Provides a policy-based model and native segmentation, aligning with common network concepts such as isolation and routing domains.
  • Uses dynamic routing to advertise routes between Core Network Edges (CNEs), to and from on-premises.
  • Integrates with external SDN devices, such as our FortiGates, through Connect attachments.
  • Supports multi-Region and multi-account, making global scaling manageable.

AWS Cloud WAN also enabled us to use standard Infrastructure as Code (IaC) and Continuous Integration and Continuous Delivery (CI/CD) pipelines to manage our network through its policy-as-code principle. The result was a version-controlled, peer-reviewed, and auditable network.

This encouraged us to use AWS Cloud WAN as the core of our network and allowed us to elevate this network to be one of our core product features, significantly changing our approach to workload and network connectivity, as shown in Figure 2. We reworked our platform, landing on the following concepts, which lay the foundation for our platform to this day:

  • Platform Partitions: self-contained environments, located close to our customers for latency and data residency, where data is processed and stored.
  • Connectivity Hubs: dual-Region connectivity to the card networks.

New partitions can be spun up in a day using our IaC setup. A partition can use any existing Connectivity Hub, allowing it to process transactions as soon as it is ready. Routing updates no longer require manual changes: when a new AWS Region and its VPCs are attached to AWS Cloud WAN, they are ready. All of this happens while remaining compliant with PCI DSS and 3DS, with every core network change being traceable, auditable, and subject to review.
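As an illustration, attaching a workload VPC only requires tagging the attachment so that the tag-based attachment policy places it in the right segment. The following is a hypothetical sketch; the resource name and the VPC/subnet parameters are ours for illustration, not Silverflow's actual template:

```yaml
# Hypothetical example: attaching a workload VPC to the core network.
# The "segment" tag drives the tag-based attachment policy, which
# associates the attachment with the matching segment (here, prd).
PrdVpcAttachment:
  Type: AWS::NetworkManager::VpcAttachment
  Properties:
    CoreNetworkId: !GetAtt CoreNetwork.CoreNetworkId
    VpcArn: !Sub "arn:aws:ec2:${AWS::Region}:${AWS::AccountId}:vpc/${WorkloadVpcId}"
    SubnetArns:
      - !Sub "arn:aws:ec2:${AWS::Region}:${AWS::AccountId}:subnet/${AttachmentSubnetIdA}"
      - !Sub "arn:aws:ec2:${AWS::Region}:${AWS::AccountId}:subnet/${AttachmentSubnetIdB}"
    Tags:
      - Key: segment
        Value: prd
```

Because association is tag-driven, the same template works for any segment by changing a single tag value, which is what makes spinning up a new partition largely mechanical.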

AWS Cloud WAN high-level architecture for Silverflow. The Core Network has 4 segments (PRD, DEV, EGR, CON). There are 4 AWS Regions, and in all of them there are “Platform partition” blocks that connect to either the PRD or DEV segments. In two Regions there are “Connectivity Hub” blocks connected to either the EGR or CON segments.

Figure 2. AWS Cloud WAN high-level design

IaC with AWS CloudFormation

The central piece of Silverflow’s DevOps strategy was the early adoption of IaC. All AWS Cloud WAN configurations, including core network policies, attachments, and routing, were defined using AWS CloudFormation templates. This gave us version control, repeatability, and consistency.

Repository structure

Our entire connectivity infrastructure is managed from a single repository. This includes AWS Cloud WAN, VPN, Direct Connect, FortiGate integrations, Amazon VPC IPAM, and firewall policies. The repository is logically organized into several folders as follows:

  • Global definitions: contain shared resources such as AWS Cloud WAN policies, VPC IPAM CIDR blocks, and other non-Regional (global) resources.
  • Connectivity/Regional-specific configs: organized with Regional subfolders, covering Direct Connect gateways, VPN definitions, SDN/FortiGate configurations, Amazon Route 53 hosted zones, and alerting rules.
  • Guards and policy checks: a custom TypeScript-based policy checker that we wrote to validate AWS Cloud WAN definitions before deployment.

This structure keeps all networking isolated, versioned, and auditable. Feature teams do not touch any of the resources directly. Only the Platform team can approve and deploy changes through GitLab pipelines, enforced through CODEOWNERS.

Example CloudFormation snippet of Silverflow’s Core Network policy document

In the following code snippet you can find a modified version of our AWS Cloud WAN policy. Although most of it is standard configuration, there are some items we want to highlight:

  • Within CloudFormation, setting the DeletionPolicy and UpdateReplacePolicy attributes to Retain makes sure that, if a code change would cause CloudFormation to delete or replace the core network, the underlying resource is retained. Think of this as a double protection mechanism.
  • We use tag-based attachment policies to streamline management.
  • VPC IPAM is used to manage all CIDRs, including inside-cidr-blocks for the CNEs. VPC IPAM is a key part of our automation together with AWS Cloud WAN, because it allows us to control IP addressing globally. This makes sure that new AWS Regions can integrate with the rest of the network without overlapping CIDR blocks. We took the three RFC 1918 address ranges and used them for:
    • Workloads—10/8, the largest block.
    • Connectivity—172.16/12.
    • “Partners”—192.168/16.
CoreNetwork:
    Type: AWS::NetworkManager::CoreNetwork
    DeletionPolicy: Retain
    UpdateReplacePolicy: Retain
    Properties:
      Description: Silverflow Core Network
      GlobalNetworkId: !Ref GlobalNetwork
      PolicyDocument:
        version: "2021.12"
        core-network-configuration:
          vpn-ecmp-support: true
          asn-ranges:
            - 64600-64700
          edge-locations:
            - location: eu-west-1
              inside-cidr-blocks:
                - !Select [2, !Split ["|", !Ref CoreNetworkIpamAllocEuWest1]]
            - location: eu-central-1
              inside-cidr-blocks:
                - !Select [2, !Split ["|", !Ref CoreNetworkIpamAllocEuCentral1]]
            - location: us-east-2
              inside-cidr-blocks:
                - !Select [2, !Split ["|", !Ref CoreNetworkIpamAllocUsEast2]]
            - location: us-west-2
              inside-cidr-blocks:
                - !Select [2, !Split ["|", !Ref CoreNetworkIpamAllocUsWest2]]
            - location: us-east-1
              inside-cidr-blocks:
                - !Select [2, !Split ["|", !Ref CoreNetworkIpamAllocUsEast1]]
        segments:
          - name: prd
            description: Workloads in PRD that do not require partner network access
            edge-locations:
              - eu-west-1
              - eu-central-1
              - us-east-2
              - us-west-2
              - us-east-1
            require-attachment-acceptance: !Ref requireAttachAcceptancePrd
            deny-filter:
              - con
              - condev
              - cnwdev
              - dev
          - name: dev
            description: Workloads in DEV that do not require partner network access
            edge-locations:
              - eu-west-1
              - eu-central-1
              - us-east-2
              - us-east-1
            require-attachment-acceptance: !Ref requireAttachAcceptanceDev
            deny-filter:
              - con
              - conprd
              - cnwprd
              - prd
          - name: egr
            description: VPCs that allow filtered traffic to the internet
            edge-locations:
              - eu-central-1
              - us-east-2
            require-attachment-acceptance: !Ref requireAttachAcceptancePrd
        attachment-policies:
          - rule-number: 100
            conditions:
              - type: tag-exists
                key: segment
            action:
              association-method: tag
              tag-value-of-key: segment
      Tags:
        - Key: Name
          Value: !Sub "${domainAlias}-core-net"
        - Key: team
          Value: Platform
        - Key: env
          Value: !Ref env
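The RFC 1918 carve-up described above can be modeled in VPC IPAM as top-level pools, from which Regional and per-VPC allocations (including the CNE inside CIDR blocks referenced in the policy) are drawn. The following is a hypothetical sketch; the pool names and the Ipam resource reference are illustrative, not our actual template:

```yaml
# Hypothetical example: top-level VPC IPAM pools mirroring the RFC 1918 split.
# Regional and per-VPC pools would be provisioned underneath these.
WorkloadsPool:
  Type: AWS::EC2::IPAMPool
  Properties:
    AddressFamily: ipv4
    IpamScopeId: !GetAtt Ipam.PrivateDefaultScopeId
    ProvisionedCidrs:
      - Cidr: 10.0.0.0/8        # Workloads: the largest block
ConnectivityPool:
  Type: AWS::EC2::IPAMPool
  Properties:
    AddressFamily: ipv4
    IpamScopeId: !GetAtt Ipam.PrivateDefaultScopeId
    ProvisionedCidrs:
      - Cidr: 172.16.0.0/12     # Connectivity, including CNE inside CIDR blocks
PartnersPool:
  Type: AWS::EC2::IPAMPool
  Properties:
    AddressFamily: ipv4
    IpamScopeId: !GetAtt Ipam.PrivateDefaultScopeId
    ProvisionedCidrs:
      - Cidr: 192.168.0.0/16    # Partner networks
```

Allocating from pools rather than hard-coding CIDRs is what lets a new AWS Region integrate without any risk of overlapping address space.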

CI/CD pipeline

We use our standard GitLab IaC pipelines, largely the same as those used by our developers to deploy our production application code, to deploy the IaC for our network. Network-related IaC is kept in a secure repository, with merge approval restricted to the Platform team through CODEOWNERS.

Screenshot of customer’s GitLab, showing an example merged request. The screenshot shows that change requests can only be approved by allowed Platform team members.

Figure 3. Example merge request

Figure 3 shows a standard merge request, linked to a Jira story and approved by a member of the Platform team. The merge request triggers a pipeline run, which consists of several jobs, shown in Figure 4.

Screenshot of customer’s GitLab, showing the automations that run once a change request is created. There are three stages (pre, build, and test), each running several jobs (cfn-guard, get aws-credentials, print-stacks, cfn-lint, and check-policy-compliance).

Figure 4. Build and test pipeline

There are several default jobs in this pipeline, and specifically for this repository, there is the check_policy_compliance job. Because there are no CloudFormation Guard rules for AWS Cloud WAN, we wrote our own checks. These custom rules catch logic errors, and we’re working on expanding their capabilities.
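A simplified sketch of what such a pipeline could look like in .gitlab-ci.yml follows. The job names and commands are illustrative assumptions based on the stages shown in Figure 4 (the assume-deploy-role.sh helper and the npm script name are hypothetical), not Silverflow's actual pipeline definition:

```yaml
# Hypothetical sketch of the build-and-test pipeline from Figure 4.
stages:
  - pre
  - build
  - test

get-aws-credentials:
  stage: pre
  script:
    - ./scripts/assume-deploy-role.sh   # illustrative credential helper

cfn-lint:
  stage: build
  script:
    - cfn-lint templates/**/*.yaml      # template syntax and best-practice checks

cfn-guard:
  stage: build
  script:
    - cfn-guard validate --data templates/ --rules guard-rules/

check-policy-compliance:
  stage: test
  script:
    # Custom TypeScript checker validating the Cloud WAN policy document
    - npm run check-policy-compliance
```

Running the custom policy checker as an ordinary pipeline job means a logic error in the Cloud WAN policy blocks the merge the same way a lint failure would.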

Finally, when this pipeline succeeds, the branch can be merged to main, which triggers a final pipeline that runs the IaC in our AWS environment, as shown in Figure 5. We require human intervention at every step to maintain control and validation. When the pipeline succeeds, the network change takes effect, with no manual intervention in the AWS Management Console.

Screenshot of customer’s GitLab, showing the deployment pipeline that runs once a change request has been approved. This pipeline has only two tasks: notify_failure and perform-tasks (deployment).

Figure 5. Deployment pipeline

Summary

The current automation flow makes sure that network changes are planned, controlled, compliant, and automated. No single person can make changes to our network (or production accounts/code), as shown in Figure 6.

Diagram showing the different stages in their automation management when updating their global network: Jira Story - MR approval - Pipeline testing - Deployment trigger

Figure 6. Automated deployment flow

  1. Jira story: planned and agreed upon work.
  2. Merge request (MR) approval: Platform team member approval (not the same person).
  3. Pipeline-based testing: making sure of quality.
  4. Deployment at the engineer’s discretion: making sure that they keep control.

Conclusion

AWS Cloud WAN gave us a policy-based model, native segmentation, and proper routing controls. It speaks BGP. We can use it to think about networking like network engineers. It integrates with third-party SDN devices so that we can extend its functionality.

Combined with VPC IPAM and IaC, it streamlined rolling out new AWS Regions. No spreadsheets, no manual peering, just code. CI/CD provides us the control and visibility we need. Every change is reviewed, tracked, and deployed through automation. Nothing happens outside Git.

Table 1. Summary of solutions and capabilities achieved by Silverflow

Our original goal was to add support for a second Region. But AWS Cloud WAN, combined with our IaC and CI/CD practices, enabled a full network reimplementation. We used it to take the network to the level we imagined: a fully automated, compliant, controlled, and auditable network, with minimal operational overhead. Typically, any enterprise with a global network has a dedicated networking team. We manage our global network, together with our base AWS platform, with our Platform team. The level of automation and stability, together with our own DevOps practices, makes this possible.

To learn more about implementing AWS Cloud WAN for your organization, visit the AWS Cloud WAN documentation.

About the authors

Kris Gillespie (Guest)

Kris is the Head of Platform and Security at Silverflow, where he leads both the Platform and Security teams. With over three decades in systems engineering and operations, Kris focuses on building pragmatic platforms that scale. Outside of work, he travels all over the world, enjoys a good drive in the countryside and visits as many festivals as he can every year.

Pablo Sánchez Carmona

Pablo is a Senior Network Specialist Solutions Architect at AWS, where he helps customers to design secure, resilient and cost-effective networks. When not talking about networking, Pablo can be found playing basketball or video-games. He holds an MSc in Electrical Engineering from the Royal Institute of Technology (KTH), and a Master’s degree in Telecommunications Engineering from the Polytechnic University of Catalonia (UPC).