AWS Storage Blog

How Goldman Sachs leverages AWS PrivateLink for Amazon S3

As a multinational investment bank and financial services company, Goldman Sachs (GS) stores diverse datasets at scale that must always be accessible while remaining secure and compliant with regulations and requirements. As part of its process, Goldman Sachs uses Amazon Virtual Private Cloud (Amazon VPC) to provide secure environments for deploying resources within AWS, and secure connectivity to its services. With Amazon Simple Storage Service (Amazon S3) hosting multiple petabytes of business data, it’s vital that access to this critical storage service is both secure and highly performant.

Today, Goldman Sachs owns and operates several thousand accounts for production and non-production workloads and deployments. A company-wide Core Engineering team is responsible for enabling cloud adoption based on best practices and reusable patterns. To provide secure access to AWS services, Goldman Sachs uses VPC endpoints for both hybrid and cloud-native (VPC based) workloads. Hybrid workloads with traffic flows from on-premises environments usually interact with cloud native services over PrivateLink. This 2020 re:Invent presentation showcases some examples of how Goldman Sachs uses PrivateLink at scale to do this.

In this blog, we walk through Goldman Sachs’ S3 VPC endpoint adoption journey and architectural evolution, outlining the advantages and disadvantages of each approach and sharing key learnings for success at scale. We start with architectures based on gateway VPC endpoints for S3, launched in 2015, which enable private access (requiring no internet or NAT gateways) from resources hosted directly within the associated VPC. Next, we introduce architectures based on interface VPC endpoints for S3 (AWS PrivateLink), launched in 2021, which enable private access both from within the associated VPC and from remote systems such as on-premises servers connected via AWS Direct Connect. We finish with a brief performance comparison of the different approaches, and some recommendations on where you might want to consider adopting PrivateLink for S3.

Evolution of Amazon S3 access at Goldman Sachs

Throughout this section we will take you on the journey of Goldman Sachs’ solutions for hybrid connectivity to S3. We will progress from the initial Amazon Elastic Compute Cloud (EC2) proxy fleet, through the subsequent Amazon Elastic Container Service (ECS) proxy fleet, to the latest PrivateLink for S3 solution.

Initial approach and challenges: Amazon Elastic Compute Cloud (EC2) proxy fleet

Goldman Sachs’ first version of hybrid access to Amazon S3 via AWS Direct Connect was deployed using gateway endpoints for S3, before the launch of the PrivateLink feature. Since gateway VPC endpoints are only accessible from VPC-hosted entities, hybrid requests to S3 had to be routed via proxies. Goldman Sachs originally achieved this using a fleet of EC2 instances running ‘Squid’ HTTP proxy software, managed by the Core Engineering team and hosted within a Direct Connect attached VPC.

Figure 1: Amazon EC2 proxy fleet solution

Aligned to Figure 1:

  1. Traffic from on-premises systems routes via firm-wide Direct Connect connections.
  2. To a core-infrastructure owned VPC.
  3. Which hosts a fleet of core-infrastructure managed EC2 instances operating in an Auto Scaling group running Squid proxy software.
  4. These proxies initiate connections onwards via a gateway VPC endpoint.
  5. To S3 buckets which may be owned by core or line of business accounts.
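For context, the kind of Squid configuration such a proxy fleet might run can be sketched as follows. The listener port and domain allow-list here are illustrative assumptions, not Goldman Sachs’ actual configuration:

```text
# Illustrative squid.conf fragment: permit only HTTPS traffic towards
# S3 endpoints, and deny everything else.
http_port 3128
acl s3_endpoints dstdomain .s3.amazonaws.com .s3.us-east-1.amazonaws.com
acl SSL_ports port 443
http_access allow s3_endpoints SSL_ports
http_access deny all
```

On-premises clients would then be pointed at the proxy fleet (for example via HTTPS_PROXY settings), with the proxies forwarding requests onward through the gateway endpoint.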

To enforce controls on cloud native endpoints such as S3, Goldman Sachs uses VPC endpoint policies to restrict access to specific consumers (AWS Identity and Access Management (IAM) principals) based on the access each requires to the service. Initially, all hybrid workflows used a common endpoint, with a single policy that restricted access to Goldman Sachs principals and S3 buckets.

This approach had a few drawbacks that resulted in issues when operating at significant scale:

  • Reduced availability and performance: It required a fleet of proxies dedicated to routing requests to Amazon S3. This proxy fleet was subject to unpredictable workload spikes that could not be scaled out in time, and its management carried operational overhead for the Core Engineering team. The additional components in the path effectively reduced the overall availability of connectivity to S3 for hybrid users. In some cases, jobs running on compute farms would exhaust the resources of the proxy fleet, significantly increasing observed service latency for all users.
  • Increased complexity: To mitigate these occurrences, teams with high-bandwidth requirements were encouraged to manage their own proxy deployment, reducing blast radius but increasing cost and complexity for business units.
  • Scalability challenges: Goldman Sachs in 2018 had only a handful of AWS accounts. By January 2021, this number had grown significantly and continues to increase. As the list of permitted accounts within the VPC endpoint policy grew, Goldman Sachs reached the 20-KB VPC endpoint policy character limit. Although the aws:PrincipalOrgID IAM condition key can be used in many cases to simplify this, it is sometimes necessary to restrict specific buckets/endpoints to specific accounts. GS was initially successful in tuning S3 bucket policies and handling cross-team bucket allow-listing management challenges, but eventually determined that this approach was simply not manageable at the required scale.
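The policy-size ceiling above is easy to demonstrate. The following stdlib-only sketch builds a hypothetical account allow-list policy (the condition key and account numbering are illustrative, not Goldman Sachs’ actual policy) and shows how a few hundred to a couple of thousand accounts overruns a 20-KB (20,480-character) limit:

```python
import json

ENDPOINT_POLICY_MAX_CHARS = 20480  # 20-KB VPC endpoint policy size limit


def build_allowlist_policy(account_ids):
    """Build a hypothetical endpoint policy allow-listing specific accounts."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": "*",
            "Condition": {
                "StringEquals": {
                    # 12-digit placeholder account IDs for illustration
                    "aws:PrincipalAccount": [f"{i:012d}" for i in account_ids]
                }
            },
        }],
    }


# A handful of accounts fits comfortably under the limit...
small = json.dumps(build_allowlist_policy(range(10)))
# ...but an allow-list of ~1,500 accounts blows straight past it.
large = json.dumps(build_allowlist_policy(range(1500)))
print(len(small) <= ENDPOINT_POLICY_MAX_CHARS)  # True
print(len(large) <= ENDPOINT_POLICY_MAX_CHARS)  # False
```

Each 12-digit account ID costs roughly 16 characters of JSON, so the limit is reached far sooner than a multi-thousand-account estate requires.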

Next iteration and challenges: Amazon Elastic Container Service (ECS) proxy fleet

The next iteration of the architecture, outlined in the following diagram, solved for some of these challenges:

Figure 2: Amazon ECS proxy fleet solution

Aligned to Figure 2:

  1. Traffic from on-premises systems would route via the same firm-wide Direct Connect connections.
  2. To a core-infrastructure owned Amazon S3 proxy VPC.
  3. Which hosts a Network Load Balancer (NLB) providing static IPs for on-premises connectivity.
  4. This NLB distributes traffic to core managed ECS proxy tasks operating in an Auto Scaling group.
  5. These proxies initiate connections onwards via a gateway VPC endpoint.
  6. To S3 buckets which may be owned by core or line-of-business accounts.

This approach helped significantly in increasing availability and scalability, and in working around the 20-KB policy limit by sharding across multiple ECS deployments in different GS-routable S3 proxy VPCs. However, it still required deployment and operation of multiple customer-managed proxy platforms, each of which remained individually constrained by the 20-KB policy limit and by the limit of 55,000 connections per minute per NLB target.

Latest evolution: Elimination of hosted proxy solution

With the announcement of AWS PrivateLink for Amazon S3 in February 2021, the architecture was significantly simplified to take advantage of this latest product offering.

Figure 3: AWS PrivateLink for Amazon S3 solution

Aligned to Figure 3:

  1. Traffic from on-premises systems would route via the same firm-wide Direct Connect connections.
  2. To a core infrastructure owned S3 endpoint VPC.
  3. This VPC hosts S3 interface VPC endpoints (AWS PrivateLink).
  4. Which provide direct access to S3 buckets which may be owned by core or line-of-business accounts.

This update delivered the following benefits:

  • Improved operational efficiency: Eliminated the need to manage a proxy fleet for hybrid access to cloud native services, delivering great benefit to the Core Engineering team.
  • Reduced blast radius: Each Business Unit (BU) or Line of Business (LoB) has dedicated Amazon S3 connectivity for production workloads, with a shared general-purpose endpoint used for cost-effective development and testing. The dedicated endpoints reduce the potential impact of BU cross-talk (the ‘noisy neighbor’ scenario) while providing simple, clear boundaries for BUs to operate within. This is particularly helpful for LoBs with a high number of accounts, and for use cases with high and unpredictable throughput that, on a common endpoint, could lead to contention and swamp other LoBs.
  • Security guardrails enforcement: Multiple interface endpoints can be configured in a single VPC, each with unique VPC endpoint policies. Each endpoint can be allocated to a single line of business or use case, improving security posture and adhering to the principles of least privilege.
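Provisioning a dedicated per-LoB interface endpoint of this kind can be sketched with the AWS SDK. All identifiers and the sample policy below are hypothetical placeholders, and the actual API call is shown commented out since it requires credentials and a real VPC:

```python
import json

# Hypothetical per-LoB endpoint policy (here, a Machine Learning OU).
lob_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": "*",
        "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
        "Resource": "*",
        "Condition": {
            "ForAllValues:StringLike": {
                "aws:PrincipalOrgPaths": ["o-example/*/ou-machinelearn/*"]
            }
        },
    }],
}

# Parameters for EC2's create_vpc_endpoint API; all IDs are placeholders.
params = {
    "VpcEndpointType": "Interface",
    "VpcId": "vpc-0123456789abcdef0",
    "ServiceName": "com.amazonaws.us-east-1.s3",
    "SubnetIds": ["subnet-0aaaaaaaaaaaaaaaa", "subnet-0bbbbbbbbbbbbbbbb"],
    "SecurityGroupIds": ["sg-0cccccccccccccccc"],
    "PolicyDocument": json.dumps(lob_policy),
}

# With credentials configured, the endpoint would be created via boto3:
#   import boto3
#   ec2 = boto3.client("ec2", region_name="us-east-1")
#   endpoint = ec2.create_vpc_endpoint(**params)
print(params["ServiceName"])
```

Repeating this per line of business yields multiple endpoints in one VPC, each carrying its own policy and security group.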

To dive a little deeper on the final point above, it’s most effective to bring this to life with sample VPC endpoint policies. For the most restricted use cases, policies lock down access to specific line-of-business Principals, and constrain resource access to specific line-of-business Accounts using the s3:ResourceAccount condition key announced in 2020. From 2022 onwards, for more general-purpose or broader use cases, the aws:ResourceOrgID, aws:ResourceOrgPaths, aws:PrincipalOrgID, and aws:PrincipalOrgPaths condition keys are combined to create strict organizational perimeters for VPC endpoints that prevent data exfiltration in a scalable, flexible, and easy-to-manage way.

Here is a sample minimalistic endpoint policy that combines these possibilities:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:Get<List of Get S3 Permissions defined by enterprise or required by line of business / use case>",
        "s3:List<List of List S3 Permissions defined by enterprise or required by line of business / use case>",
        "s3:Put<List of Put S3 Permissions defined by enterprise or required by line of business / use case>",
        // Other S3 actions
      ],
      "Resource": "*",
      "Principal": "*", // A specific Principal or list of Principals can be provided for highly restricted use cases
      "Condition": {
// Note that some combination of the following conditions would be used, not necessarily all of them.
// Each condition operator may appear only once, so related condition keys are grouped under it.
        "StringEquals": {
// PrincipalOrgID provides a boundary mandating the use of enterprise credentials
          "aws:PrincipalOrgID": ${Your enterprise Org IDs},
// ResourceOrgID provides a boundary only permitting access to enterprise owned S3 buckets
          "aws:ResourceOrgID": ${Your enterprise Org IDs},
// A specific ResourceAccount or list of ResourceAccounts can be provided for highly restricted use cases
          "aws:ResourceAccount": ${Target resource account IDs}
        },
        "ForAllValues:StringLike": {
// PrincipalOrgPaths provides access to Principals within a specific line of business, in this example Machine Learning
          "aws:PrincipalOrgPaths": ["o-myorganizatio-1/*/ou-machinelearn/*"],
// ResourceOrgPaths provides access to resources owned by a specific line of business
          "aws:ResourceOrgPaths": ["o-myorganization/r-org-path-1/*","o-myorganization-2/r-org-path-2/*"]
        }
      }
    }
  ]
}
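Note that the //-style annotations in the sample above are not valid JSON, so they must be removed (along with the ${...} placeholders being substituted) before a policy like this can be deployed. A small stdlib sketch of that step, using a shortened hypothetical policy:

```python
import json

# A shortened, hypothetical annotated policy for illustration.
ANNOTATED_POLICY = """\
{
  "Version": "2012-10-17",
  // Annotation lines like this must be stripped before deployment
  "Statement": [{
    "Effect": "Allow",
    "Action": ["s3:GetObject"],
    "Resource": "*",
    "Principal": "*",
    "Condition": {
      "StringEquals": {"aws:PrincipalOrgID": "o-exampleorgid"}
    }
  }]
}
"""


def strip_line_comments(text):
    """Drop //-prefixed annotation lines so the document parses as strict JSON."""
    return "\n".join(
        line for line in text.splitlines()
        if not line.lstrip().startswith("//")
    )


# Validate that the cleaned policy is well-formed JSON before deploying it.
policy = json.loads(strip_line_comments(ANNOTATED_POLICY))
print(policy["Statement"][0]["Condition"]["StringEquals"]["aws:PrincipalOrgID"])
# prints "o-exampleorgid"
```

Running such a check in a deployment pipeline catches malformed policies before they ever reach an endpoint.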

You can use similar methods to create curated endpoints for third party vendor services, such as Snowflake. Keeping these vendor flows independent helps improve security posture, supports dedicated tenancy, and enables more targeted performance monitoring.

Here is a sample minimalistic endpoint policy focused on a third party vendor service use case:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:Get<List of Get S3 Permissions required by the third party vendor>",
        "s3:List<List of List S3 Permissions required by the third party vendor>",
        "s3:Put<List of Put S3 Permissions required by the third party vendor>",
        // Other S3 actions
      ],
      "Resource": "*",
      "Principal": "*",
      "Condition": {
        "ForAllValues:StringLike": {
// PrincipalOrgPaths provides access to Principals within a specific line of business, in this example Machine Learning
          "aws:PrincipalOrgPaths": ["o-myorganizatio-1/*/ou-machinelearn/*"],
// Provide either ResourceAccount or ResourceOrgPaths based on the level of isolation provided and guaranteed by the vendor
          "aws:ResourceOrgPaths": ["o-vendororg-1/r-org-path-1/*"]
        },
        "StringEquals": {
          "aws:ResourceAccount": ${Target resource account IDs}
        }
      }
    }
  ]
}

The key ways in which the vendor policy above differs from the enterprise policy are as follows:

  • Vendors often provide a dedicated resource account per tenant/customer, so using the aws:ResourceAccount condition key provides a reasonably constrained approach. The Principal in this case can be a unique set of predefined principals, or can be constrained to a given org path according to business needs.
  • For vendors who offer several dedicated resource accounts per tenant, multiple accounts can be listed, or replaced with aws:ResourceOrgPaths to support more dynamic growth, subject to suitable architecture and isolation boundaries being in use by the vendor. GS seeks to use this approach when onboarding with large vendors such as Snowflake.

Please note that the VPC Endpoint policies outlined above are examples included to bring the implementation to life, and in practice are deployed in conjunction with suitably aligned S3 Bucket Policies and IAM policies. For more detailed information and guidance, we recommend reviewing the data perimeters on AWS microsite and associated whitepapers.

How has this evolution helped improve adoption?

Before adoption of PrivateLink for S3, Goldman Sachs’ connectivity to AWS averaged several hundred Mbps of traffic to S3 due to scalability and operational challenges across Direct Connect, EC2/ECS, and the self-managed proxy software. Shortly following introduction of S3 PrivateLink, this ramped up to a peak of 12 Gbps with steady state now around 5-6 Gbps, driven by enhanced ease of adoption and ease of use, performance, and stability. The following diagrams show network throughput (egress and ingress) for the period December 2020 to February 2022, with key increases visible in early March 2021 when PrivateLink was enabled.

Graph showing egress traffic to S3 increasing from 2020 to 2022

Graph showing ingress traffic from S3 increasing from 2020 to 2022

Adoption also enabled onboarding of applications that do not support HTTP proxies but can use the Amazon S3 interface endpoint via AWS Software Development Kit (SDK) integration, such as TensorFlow.

As a result of adopting PrivateLink for S3, Goldman Sachs has also been successful in enabling batch processing jobs to complete tasks including heavy read and write operations, which previously overwhelmed the proxies. Long-term monitoring has shown that Application Programming Interface (API) calls over PrivateLink are more efficient and more stable compared to the previous proxy-based solution. Latencies have also improved by almost half in some cases, averaging 10ms for us-east-1 across nearly all request types, with only large multipart upload tests operating slightly slower.

Where can you get the most value from using AWS PrivateLink for Amazon S3?

The Core Engineering team at Goldman Sachs has delivered increased business value through the ease of deployment and adoption of PrivateLink. The benefits include improved security posture, improved workload isolation between lines of business, and an improvement in performance, all while reducing operational overhead associated with deploying and managing proxy fleets.

It is worth noting that this pattern is most impactful for use cases involving hybrid workloads, where producers or consumers of S3 data are on premises. For purely cloud-based use cases, such as where workloads are in a private VPC, S3 gateway VPC endpoints may still be preferable due to their lower cost in high-throughput environments.

Key advantages of PrivateLink for S3 include the ability to use security groups to control which traffic flows and sources are permitted to use the endpoints, and simpler VPC route tables. Conversely, gateway VPC endpoints do not charge for data transfer. With both approaches, VPC endpoints offer significant advantages in improving security and simplifying control of access to S3.
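To make the cost trade-off concrete, here is a rough back-of-the-envelope comparison at a sustained 5 Gbps (the steady-state figure cited earlier). The per-GB and per-hour rates below are illustrative placeholders, not actual AWS pricing; consult the AWS PrivateLink pricing page for your region:

```python
# Monthly data volume at a sustained 5 Gbps.
GBPS = 5
HOURS_PER_MONTH = 730
GB_PER_MONTH = GBPS / 8 * 3600 * HOURS_PER_MONTH  # gigabits/s -> GB per month

# HYPOTHETICAL rates for illustration only.
INTERFACE_DATA_RATE = 0.01  # $/GB processed by an interface endpoint
INTERFACE_HOURLY = 0.01     # $/endpoint-AZ-hour
AZS = 3                     # endpoint deployed across three Availability Zones

interface_cost = (GB_PER_MONTH * INTERFACE_DATA_RATE
                  + AZS * HOURS_PER_MONTH * INTERFACE_HOURLY)
gateway_cost = 0.0  # gateway endpoints do not charge for data transfer

print(round(GB_PER_MONTH))        # 1642500 GB/month
print(round(interface_cost, 2))   # 16446.9
```

At this throughput the per-GB data processing charge dominates, which is why gateway endpoints remain attractive for high-volume, purely in-VPC workloads, while the hybrid reachability of PrivateLink justifies its cost for on-premises flows.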

Conclusion

In this blog post, we outlined Goldman Sachs’ evolution from Amazon S3 gateway endpoints with EC2 based proxies, through ECS based proxies, to AWS PrivateLink based S3 interface endpoints.

We’ve shared key advantages, and outlined solutions to secure management and adoption of S3 VPC endpoints at scale. We’ve also outlined the most advantageous use cases for each endpoint solution, such as gateway endpoints for high-volume on-cloud deployments, and interface endpoints for hybrid architectures. PrivateLink for Amazon S3 has delivered business benefits such as reduced operational overhead and improved availability, and technical benefits such as improving the security of third-party vendor integrations. These benefits have enabled Goldman Sachs to innovate with greater agility, and resulted in a continued increase in S3 adoption. At the time of writing, Goldman Sachs has transferred tens of petabytes of data via PrivateLink for Amazon S3.

For further information, review the data perimeters on AWS microsite to learn foundational knowledge and best practices for implementing robust access security strategies. Take a look at the AWS Security Blog: Establishing a data perimeter on AWS. Finally, dive deep in to the documentation on AWS PrivateLink for Amazon S3 to start your own journey today.

Sujoy Saha

Sujoy Saha is a Vice President on the Cloud Enablement team at Goldman Sachs. Sujoy supports Goldman Sachs Engineering with cloud network deployments and automation, and builds solutions for securing connectivity between the firm and the public cloud. Outside of work, Sujoy enjoys hiking, photography, and traveling.

Gerrard Cowburn

Gerrard Cowburn is a Solutions Architect with AWS based in the UK. Gerrard supports Global Financial Services customers in greenfield and migration based architectural deep dives and prototyping activities. In his free time, Gerrard enjoys exploring the world through food and drink, road trips, and track days.