AWS Storage Blog
How Common Securitization Solutions built resiliency while optimizing costs with AWS PrivateLink and Amazon S3 Replication
Common Securitization Solutions (CSS), a joint Fannie Mae and Freddie Mac venture launched in 2019, supports a cornerstone of the American economy: home ownership. CSS built and now operates the largest and most advanced mortgage securitization platform in the US. The platform supports the roughly 70% market share held by Fannie Mae and Freddie Mac, with flexibility, scalability, and security at its core.
CSS uses AWS to run its solution, the Common Securitization Platform (CSP), in the cloud. In 2021, CSS looked for efficiency and optimization opportunities in how it moved market pricing data, a workload of 2.9M files per month. When reviewing the cost of co-located circuits, CSS found that it was paying up to $1M annually to maintain private cross connects (private network connections between co-located data centers).
In this post, we cover how CSS built user-centric solutions for Fannie Mae and Freddie Mac, using a range of AWS services to address each organization's specific requirements. CSS worked with Fannie Mae, Freddie Mac, and AWS account teams to design and build cost-optimized solutions using Amazon S3 Same-Region Replication (S3 SRR) with S3 Replication Time Control (S3 RTC) and AWS PrivateLink. By implementing these AWS storage and networking services, CSS cut the cost of this workload by over 99%, to less than $6,000 annually. Fannie Mae and Freddie Mac gained additional cost savings totaling $1.5M annually. In addition, CSS achieved zero data loss and guaranteed delivery of data to subscribers.
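As a quick check on the "over 99%" figure, take the stated upper bound of $1M per year as the baseline and the stated $6,000 ceiling as the new annual cost:

```latex
\text{reduction} = \frac{\$1{,}000{,}000 - \$6{,}000}{\$1{,}000{,}000} = \frac{994{,}000}{1{,}000{,}000} = 99.4\%
```

Because the new cost is less than $6,000, the reduction against a full $1M baseline is at least 99.4%.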
The Challenge: Guaranteed delivery, zero data loss
CSS delivers an industry-critical function with the CSP. That function requires time-sensitive, guaranteed delivery of the market pricing data needed to price mortgage securities in the US housing market. Inaccuracy or unavailability of any part of this data would cause significant reputational and financial impact for Fannie Mae, Freddie Mac, and CSS. Their regulator, the Federal Housing Finance Agency (FHFA), requires strict adherence to data immutability and zero data loss. The CSP architecture therefore had to meet both resiliency and efficiency standards. However, Fannie Mae and Freddie Mac have different compliance and governance requirements, so CSS could not use the same architecture for both. CSS used AWS PrivateLink interface VPC endpoints to connect with Freddie Mac, and S3 SRR with S3 RTC to share data with Fannie Mae.
In 2021, CSS, Freddie Mac, and Fannie Mae spent up to $1.5M annually on the platform's transfer and connectivity solution. That spend covered the private cross connects and co-located circuits. These links provided the availability and security needed to guarantee delivery of the market pricing data required by the CSP and by regulatory authorities. CSS's original architecture used two private co-located data centers, which connected to Fannie Mae and Freddie Mac through private cross connects. Fannie Mae and Freddie Mac each had multiple leased lines dedicated to CSS connectivity terminating in these data centers.
Solution: AWS native connectivity with cost savings
In 2022, CSS implemented AWS PrivateLink to exchange data with Freddie Mac, and S3 SRR to exchange data with Fannie Mae. PrivateLink and S3 SRR replaced the co-location hosting services, which improved cost, operational efficiency, and resiliency. Moving to cloud-native transmission with AWS services reduced points of failure, network hops, and operational support overhead.
The following year, CSS focused on improving the resiliency of the CSP by creating an Active/Active configuration between CSS and Fannie Mae. The CSP application logic commits objects to CSS-owned Amazon Simple Storage Service (Amazon S3) buckets in the US East (N. Virginia) and US East (Ohio) AWS Regions simultaneously. This gives both Regions strongly consistent copies of the data, so they can process workloads in parallel. For increased durability, CSS used S3 SRR to replicate this data to Fannie Mae-owned buckets in the same AWS Region. CSS enabled S3 Replication metrics to track replication progress and to identify any failures in data transfer.
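One way to get alerted on replication failures is sketched below. The function builds keyword arguments for boto3's `cloudwatch.put_metric_alarm`, alarming on the `OperationsFailedReplication` S3 Replication metric. The bucket names, rule ID, and SNS topic ARN are hypothetical placeholders. The threshold and periods are illustrative choices, not CSS's actual settings.

```python
"""Sketch: CloudWatch alarm on S3 Replication failures (hypothetical names)."""


def failed_replication_alarm(source_bucket: str, dest_bucket: str,
                             rule_id: str, sns_topic_arn: str) -> dict:
    """Return kwargs for boto3 cloudwatch.put_metric_alarm(**kwargs)."""
    return {
        "AlarmName": f"s3-replication-failed-{source_bucket}-{rule_id}",
        "Namespace": "AWS/S3",
        "MetricName": "OperationsFailedReplication",
        # S3 Replication metrics are reported per source bucket,
        # destination bucket, and replication rule.
        "Dimensions": [
            {"Name": "SourceBucket", "Value": source_bucket},
            {"Name": "DestinationBucket", "Value": dest_bucket},
            {"Name": "RuleId", "Value": rule_id},
        ],
        "Statistic": "Sum",
        "Period": 60,              # evaluate one-minute data points
        "EvaluationPeriods": 1,
        "Threshold": 0,            # any failed operation breaches the alarm
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",  # no data: no failures reported
        "AlarmActions": [sns_topic_arn],     # hypothetical SNS topic
    }
```

The alarm could then be created with `boto3.client("cloudwatch").put_metric_alarm(**failed_replication_alarm(...))`. A zero threshold reflects a zero-data-loss posture: any single failed operation is worth investigating.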
Together with Freddie Mac, Fannie Mae, and their AWS account teams, CSS built an alternative cloud-native solution using AWS PrivateLink and S3 SRR with S3 RTC. The AWS account team engaged the Amazon S3 service team for architectural support. That team helped CSS identify opportunities to improve operational efficiency and fine-tune replication performance. These solutions removed the need for the data centers entirely. This eliminated the data center lease, the dedicated leased lines, and the hardware and support costs specific to this connectivity. The result was a 99% cost reduction for CSS and additional cost savings for Fannie Mae and Freddie Mac.
Freddie Mac connectivity through AWS PrivateLink Interface VPC Endpoints
CSS selected AWS PrivateLink interface VPC endpoints as the best-suited solution for CSS-Freddie Mac connectivity because it required the fewest changes for both Freddie Mac and CSS. The interface VPC endpoints replaced the dedicated hardline connectivity. The existing gateway services at both companies were unaware of the connectivity change. AWS PrivateLink interface VPC endpoints provide private connectivity between Amazon Virtual Private Clouds (Amazon VPCs) owned by different organizations, so network traffic does not traverse the public internet. This solution also provides a highly available, high-speed connection.
As shown in Figure 1, separate AWS PrivateLink connections were deployed in each AWS Region. CSS's deployment used four Network Load Balancers for flexibility and operational efficiency. The CSS Network Team managed the AWS PrivateLink interface VPC endpoints, which targeted load balancers managed by the CSS Gateway Team. This let the Gateway Team manage all aspects of its services independently of the AWS PrivateLink connectivity.
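The two halves of a PrivateLink connection can be sketched as parameter sets. The service provider publishes an endpoint service in front of its Network Load Balancers. The consumer then creates an interface VPC endpoint to that service in its own VPC. This is a minimal sketch with hypothetical ARNs and IDs; the post does not spell out which organization owns which side in CSS's deployment. The dicts match the keyword arguments of boto3's `ec2.create_vpc_endpoint_service_configuration` and `ec2.create_vpc_endpoint`.

```python
from typing import List


def endpoint_service_params(nlb_arns: List[str]) -> dict:
    """Provider side: kwargs for ec2.create_vpc_endpoint_service_configuration."""
    if not nlb_arns:
        raise ValueError("an endpoint service needs at least one Network Load Balancer")
    return {
        "NetworkLoadBalancerArns": list(nlb_arns),
        # The provider explicitly accepts each consumer's connection request.
        "AcceptanceRequired": True,
    }


def interface_endpoint_params(vpc_id: str, service_name: str,
                              subnet_ids: List[str],
                              security_group_ids: List[str]) -> dict:
    """Consumer side: kwargs for ec2.create_vpc_endpoint."""
    return {
        "VpcEndpointType": "Interface",
        "VpcId": vpc_id,
        # Endpoint service names look like com.amazonaws.vpce.<region>.vpce-svc-...
        "ServiceName": service_name,
        # One subnet per Availability Zone places an endpoint ENI in each AZ.
        "SubnetIds": list(subnet_ids),
        "SecurityGroupIds": list(security_group_ids),
        # Assumption: clients resolve the endpoint through records the
        # consumer manages (for example in Route 53), not private DNS.
        "PrivateDnsEnabled": False,
    }
```

Deploying one such pair per Region, as in Figure 1, keeps each Regional path independent. The gateway services behind the load balancers do not need to know about the endpoints.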
To provide reliable Regional failover, CSS used Amazon Route 53 with automatic health checks. This way, network traffic always traverses a healthy Regional AWS PrivateLink interface VPC endpoint, which can reach the target services in either AWS Region. When a health check target stops responding, Route 53 marks it as unhealthy and answers DNS queries with the failover target's DNS name instead. Clients are therefore seamlessly routed to the next available healthy target. Once the unhealthy interface VPC endpoint recovers, it is added back to the pool of healthy targets without manual failback decisions or intervention.
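A minimal sketch of the failover records follows. It builds a change batch for boto3's `route53.change_resource_record_sets(HostedZoneId=..., ChangeBatch=...)`, with PRIMARY and SECONDARY CNAME records pointing at each Region's endpoint DNS name. The record name, endpoint DNS names, and health check IDs are hypothetical. Route 53's health checkers cannot reach the private IP addresses of interface endpoints, so the health checks are assumed to be ones based on CloudWatch alarms. The small `answer` function is a simplified model of failover behavior for illustration, not a Route 53 API.

```python
from typing import Dict


def failover_change_batch(record_name: str, primary_dns: str, secondary_dns: str,
                          primary_health_check_id: str,
                          secondary_health_check_id: str, ttl: int = 60) -> dict:
    """Build a Route 53 ChangeBatch with PRIMARY/SECONDARY failover CNAMEs."""

    def record(role: str, target: str, health_check_id: str) -> dict:
        return {
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": record_name,
                "Type": "CNAME",
                "SetIdentifier": f"{role.lower()}-regional-endpoint",
                "Failover": role,           # "PRIMARY" or "SECONDARY"
                "TTL": ttl,                 # short TTL so clients re-resolve quickly
                "ResourceRecords": [{"Value": target}],
                "HealthCheckId": health_check_id,
            },
        }

    return {
        "Comment": "Failover between Regional PrivateLink endpoints",
        "Changes": [
            record("PRIMARY", primary_dns, primary_health_check_id),
            record("SECONDARY", secondary_dns, secondary_health_check_id),
        ],
    }


def answer(change_batch: dict, primary_healthy: bool) -> str:
    """Simplified model of failover routing (not a Route 53 API): serve the
    PRIMARY target while it is healthy, otherwise the SECONDARY target."""
    targets: Dict[str, str] = {
        c["ResourceRecordSet"]["Failover"]: c["ResourceRecordSet"]["ResourceRecords"][0]["Value"]
        for c in change_batch["Changes"]
    }
    return targets["PRIMARY"] if primary_healthy else targets["SECONDARY"]
```

Because Route 53 evaluates the primary's health on every query, traffic returns to the primary automatically once its health check passes again. This matches the no-manual-failback behavior described above.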
Figure 1: PrivateLink interface endpoint architecture between Freddie Mac and CSS
Fannie Mae data transfer using S3 Same-Region Replication and S3 Replication Time Control
Fannie Mae has a regulatory requirement to stage all externally sourced data for vulnerability scanning before it is introduced to the production environment. To meet this requirement, Fannie Mae used S3 SRR to receive files from CSS, which let it scan external data for vulnerabilities before further analysis. CSS used S3 SRR to reliably replicate nearly 500K critical market pricing data files to Fannie Mae.

To get predictable replication times, CSS enabled S3 RTC in its replication configuration, which helped it meet the narrow SLA during business cycle processing. S3 RTC replicates most objects in seconds and is backed by an SLA to replicate 99.9 percent of objects within 15 minutes. Enabling S3 RTC automatically enables S3 Replication metrics and events. These report minute-by-minute replication progress: bytes pending replication, operations pending replication, operations that failed replication, and replication latency in seconds. S3 RTC also emits OperationMissedThreshold and OperationReplicatedAfterThreshold events. These notify the bucket owner when an object has not replicated within the 15-minute threshold, and when it replicates after that threshold. CSS closely monitored S3 RTC events and metrics and used Amazon CloudWatch alarms to get notified if any thresholds were missed.
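The replication setup described above can be sketched in two parts: a replication configuration with S3 RTC enabled, and a notification configuration that routes replication events to an SQS queue. This is a minimal sketch, not CSS's actual configuration. The role ARN, bucket ARN, account ID, rule ID, and queue ARN are hypothetical, and both buckets are assumed to already have versioning enabled, which S3 Replication requires. The dicts are shaped for boto3's `s3.put_bucket_replication(Bucket=..., ReplicationConfiguration=...)` and `s3.put_bucket_notification_configuration(Bucket=..., NotificationConfiguration=...)`. The latter replaces a bucket's entire notification configuration.

```python
from typing import List

# Replication event types; all three are available when S3 RTC is enabled.
RTC_EVENT_TYPES: List[str] = [
    "s3:Replication:OperationMissedThreshold",
    "s3:Replication:OperationReplicatedAfterThreshold",
    "s3:Replication:OperationFailedReplication",
]


def rtc_replication_config(role_arn: str, dest_bucket_arn: str,
                           dest_account_id: str,
                           rule_id: str = "css-to-fnm-srr") -> dict:
    """ReplicationConfiguration for s3.put_bucket_replication with S3 RTC."""
    return {
        "Role": role_arn,  # IAM role S3 assumes to replicate objects
        "Rules": [{
            "ID": rule_id,
            "Priority": 1,
            "Status": "Enabled",
            "Filter": {"Prefix": ""},  # empty prefix: every new object
            "DeleteMarkerReplication": {"Status": "Disabled"},
            "Destination": {
                "Bucket": dest_bucket_arn,
                "Account": dest_account_id,
                # Replicas are owned by the destination account.
                "AccessControlTranslation": {"Owner": "Destination"},
                # S3 RTC; 15 minutes is the only supported value.
                "ReplicationTime": {"Status": "Enabled", "Time": {"Minutes": 15}},
                # RTC requires replication metrics with a 15-minute threshold.
                "Metrics": {"Status": "Enabled", "EventThreshold": {"Minutes": 15}},
            },
        }],
    }


def replication_event_notifications(queue_arn: str) -> dict:
    """NotificationConfiguration routing replication events to an SQS queue."""
    return {
        "QueueConfigurations": [{
            "Id": "replication-events",
            "QueueArn": queue_arn,
            "Events": list(RTC_EVENT_TYPES),
        }]
    }
```

Both configurations are applied to the source (CSS) bucket, since that bucket's owner receives the replication events.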
As part of this solution, Fannie Mae and CSS used AWS account boundaries to isolate the security blast radius with separate accounts for Amazon S3 data exchange and middleware compute components.
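The source and destination buckets sit in different accounts. The destination account must therefore grant the source account's replication role permission to write replicas. A minimal sketch of such a destination bucket policy follows. The bucket name and role ARN are hypothetical, and the action list follows the cross-account pattern in the Amazon S3 User Guide rather than Fannie Mae's actual policy. The resulting JSON string can be applied with boto3's `s3.put_bucket_policy(Bucket=..., Policy=...)`.

```python
import json


def destination_bucket_policy(dest_bucket: str, source_role_arn: str) -> str:
    """Bucket policy (JSON) letting another account's replication role write replicas."""
    bucket_arn = f"arn:aws:s3:::{dest_bucket}"
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowReplicaWrites",
                "Effect": "Allow",
                "Principal": {"AWS": source_role_arn},
                "Action": [
                    "s3:ReplicateObject",
                    "s3:ReplicateDelete",
                    # Needed when the rule hands replica ownership to this account.
                    "s3:ObjectOwnerOverrideToBucketOwner",
                ],
                "Resource": f"{bucket_arn}/*",
            },
            {
                "Sid": "AllowVersioningChecks",
                "Effect": "Allow",
                "Principal": {"AWS": source_role_arn},
                "Action": ["s3:GetBucketVersioning", "s3:PutBucketVersioning"],
                "Resource": bucket_arn,
            },
        ],
    }
    return json.dumps(policy, indent=2)
```

Scoping the grant to a single role ARN, rather than the whole source account, keeps the blast radius of the data-exchange account small.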
Figure 2: CSP architecture integration with S3 SRR between Fannie Mae and CSS
Workload observability and resiliency with S3 SRR and Amazon SQS
With the switch to S3 SRR in a multi-account strategy, S3 buckets in CSS AWS accounts integrated with Fannie Mae's S3 buckets to exchange all payloads between the two organizations. The CSP needs strong consistency for S3 objects in both the US East (N. Virginia) and US East (Ohio) Regions. To achieve this, the CSP application logic commits S3 objects to CSS buckets in both Regions at the same time. Once the CSP commits the new objects, S3 SRR replicates each object in each Regional bucket to Fannie Mae's S3 bucket in the same Region. S3 SRR with S3 RTC enabled gives both organizations a low-latency, eventually consistent replication path. This approach provides Active/Hot standby parallel processing in both AWS Regions, with minimal manual intervention required if a single Region fails. Beyond same-Region replication, both CSS and Fannie Mae use S3 Cross-Region Replication (S3 CRR) to back up critical data across AWS Regions and to copy database snapshots.
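The dual-Region commit can be sketched as writing the same object to both Regional buckets concurrently, and treating the commit as failed unless every write succeeds. This illustrates the idea and is not the CSP's actual application logic. The `targets` mapping and the put-callable signature are hypothetical; in practice each callable might wrap a Region-specific boto3 client's `put_object`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Tuple

# Hypothetical signature: put(bucket, key, body) writes one object or raises.
PutFn = Callable[[str, str, bytes], None]


def dual_write(targets: Dict[str, Tuple[PutFn, str]], key: str, body: bytes) -> None:
    """Write the same object to every Regional bucket concurrently.

    Returns only if every write succeeded; otherwise raises RuntimeError
    naming the Regions that failed, so the caller can retry the commit.
    """
    if not targets:
        raise ValueError("at least one target Region is required")
    with ThreadPoolExecutor(max_workers=len(targets)) as pool:
        futures = {
            region: pool.submit(put, bucket, key, body)
            for region, (put, bucket) in targets.items()
        }
    # Leaving the with-block waits for all writes to finish.
    failures = {region: f.exception() for region, f in futures.items()
                if f.exception() is not None}
    if failures:
        raise RuntimeError(f"commit incomplete, failed Regions: {sorted(failures)}")
```

With boto3, a target could be `(lambda b, k, data: s3_use1.put_object(Bucket=b, Key=k, Body=data), "css-bucket-use1")`, where `s3_use1` is a client created for `us-east-1`. Replication requires versioning, so retrying a failed commit writes a new version to any bucket that had already succeeded. Downstream consumers should tolerate that.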
Figure 3: Multi-region CSP architecture design with S3 SRR in each regional bucket
With S3 Replication in use, replication events are generated independently in the US East (N. Virginia) and US East (Ohio) Regions. Each event is delivered to an Amazon Simple Queue Service (Amazon SQS) queue in the corresponding Region. Amazon S3 is designed to deliver event notifications with a high degree of reliability using built-in retry mechanisms. Because of these retries, S3 might on rare occasions deliver duplicate notifications for the same object event. CSS addressed this by making the solution idempotent, using a unique transaction ID to make sure each event is processed only once. CSS used CloudWatch Logs, CloudWatch alarms, and Splunk for end-to-end observability of business transactions, including tracking their idempotency. For example, CSS built detailed monitoring of CloudWatch Logs for Amazon S3 event notifications and Amazon SQS message handling. Any error it detects triggers an incident management process for timely remediation.
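Idempotent processing of duplicate notifications can be sketched as follows. The post does not say how CSS builds its transaction ID. Here the ID is a hypothetical one derived from the event name, bucket, object key, and version ID in each S3 event record. An in-memory set stands in for the durable deduplication store that a production consumer would need.

```python
import json
from typing import Callable, Set
from urllib.parse import unquote_plus


def transaction_id(record: dict) -> str:
    """Hypothetical transaction ID built from fields of an S3 event record."""
    s3 = record["s3"]
    obj = s3["object"]
    return "|".join([
        record["eventName"],
        s3["bucket"]["name"],
        unquote_plus(obj["key"]),   # object keys arrive URL-encoded
        obj.get("versionId", ""),
    ])


class IdempotentProcessor:
    """Skips S3 event records whose transaction ID was already processed."""

    def __init__(self, handler: Callable[[dict], None]) -> None:
        self._handler = handler
        self._done: Set[str] = set()  # stand-in for a durable dedup store

    def process_message(self, body: str) -> int:
        """Handle one SQS message body; return the number of new records processed."""
        event = json.loads(body)
        processed = 0
        # S3 test events (s3:TestEvent) carry no "Records" list.
        for record in event.get("Records", []):
            tx_id = transaction_id(record)
            if tx_id in self._done:
                continue                 # duplicate delivery: skip
            self._handler(record)
            self._done.add(tx_id)        # record only after the handler succeeds
            processed += 1
        return processed
```

The ID is recorded only after the handler succeeds. A crash between the two steps therefore leads to reprocessing rather than loss, which suits a zero-data-loss requirement as long as the handler tolerates the occasional repeat.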
Figure 4: Multi-Region S3 Replication Time Control Metric observability architecture
Conclusion
Freddie Mac and Fannie Mae, the government-sponsored enterprises (GSEs), together with CSS administer about 70% of mortgage-backed securities in the US housing market. They need stability, security, and efficiency to provide liquidity for millions of new homeowners. The CSP, including its connectivity and data transmission, is built with this criticality in mind, as it supports market pricing data for $6.1 trillion in assets.
CSS, Freddie Mac, and Fannie Mae used Amazon S3, S3 Replication Time Control, S3 Same-Region Replication, and AWS PrivateLink to save a collective $1.5M annually. At the same time, they reduced points of failure, network hops, and operational support overhead. CSS alone realized a 99% cost reduction. Use this real-world reference as a blueprint for exchanging data securely and efficiently with other organizations without co-located private connectivity.
Additional information and getting started
To expose API endpoints across organizations without public internet access, co-located private circuits, or site-to-site VPN, get started with the architecture patterns for consuming private APIs cross-account. You can also use the native integrations with Amazon API Gateway described in “Building Private Cross-account APIs using Amazon API Gateway and AWS PrivateLink.” For hands-on experience with VPC endpoints, see the whitepaper “Building a Scalable and Secure Multi-VPC AWS Network Infrastructure using AWS PrivateLink.”
To get started with S3 SRR for a cross-organization workload, see the replicating objects section of the Amazon S3 User Guide. For a step-by-step tutorial on setting up S3 Replication, see “Replicate data within and between AWS Regions using Amazon S3 Replication.” To replicate existing objects, use S3 Batch Replication. It can backfill a newly created bucket with existing objects, retry objects that previously failed to replicate, migrate data across accounts, or add new buckets to your data lake. For step-by-step guidance, see “Replicate Existing Objects in your Amazon S3 Buckets with Amazon S3 Batch Replication.”
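As a starting point for the Batch Replication option above, the following sketch builds keyword arguments for boto3's `s3control.create_job`. It uses the `S3ReplicateObject` operation and a generated manifest that selects objects by replication status. The account ID, bucket ARNs, role ARN, and report prefix are hypothetical. `NONE` selects objects that were never replicated (such as objects that predate the rule), and `FAILED` selects objects whose earlier replication failed.

```python
import uuid


def batch_replication_job(account_id: str, source_bucket_arn: str,
                          report_bucket_arn: str, role_arn: str,
                          retry_failed_only: bool = False) -> dict:
    """Keyword arguments for s3control.create_job running Batch Replication."""
    # NONE: never replicated (e.g. pre-existing objects); FAILED: earlier attempt failed.
    statuses = ["FAILED"] if retry_failed_only else ["NONE", "FAILED"]
    return {
        "AccountId": account_id,
        "ConfirmationRequired": False,
        "Operation": {"S3ReplicateObject": {}},
        "Priority": 10,
        "RoleArn": role_arn,
        # Idempotency token; reuse the same value when retrying one submission.
        "ClientRequestToken": str(uuid.uuid4()),
        "Report": {
            "Bucket": report_bucket_arn,
            "Format": "Report_CSV_20180820",
            "Enabled": True,
            "Prefix": "batch-replication-reports",
            "ReportScope": "FailedTasksOnly",
        },
        # Let S3 generate the manifest from the source bucket's objects.
        "ManifestGenerator": {
            "S3JobManifestGenerator": {
                "SourceBucket": source_bucket_arn,
                "EnableManifestOutput": False,
                "Filter": {
                    "EligibleForReplication": True,
                    "ObjectReplicationStatuses": statuses,
                },
            }
        },
    }
```

The job would be submitted with `boto3.client("s3control").create_job(**batch_replication_job(...))`. Setting `retry_failed_only=True` limits a rerun to objects that failed previously.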