AWS Startups Blog
Serverless Tenant Isolation in SaaS Applications with SigTech
As a provider of innovative technology solutions to some of the world’s leading investors, SigTech puts security at the heart of everything it does. SigTech offers future-proof quant technologies to global investors. Cloud-hosted and Python-based, the platform integrates a next-gen backtest engine and analytics with curated datasets covering equities, rates, FX, commodities, and volatility. SigTech eliminates the expensive upfront costs of infrastructure build-out, giving clients an edge in alpha generation from day one.
Combining defense-in-depth with automation allows SigTech to meet its security obligations while developing at pace. By using the breadth and depth of features provided by AWS services, SigTech has built quant technologies that its engineering team can operate, that can be developed and iterated in an agile way, and that meet the security requirements of its customers. SigTech’s aim is to make the research process as efficient as possible by solving common challenges encountered by systematic traders and researchers.
SigTech’s Solution
Our platform allows systematic investors, such as hedge funds and asset managers, to research, implement, and deploy cross-asset strategies on a single end-to-end platform. Once on-boarded, customers are able to provision environments, known as workspaces, to research and develop their trading strategies. Each workspace allows customers to rapidly provision preconfigured Amazon SageMaker Jupyter notebook research environments with standard and customized datasets immediately available. Quant researchers and traders can then develop and test their strategies within these research workspaces before deploying their models.
When building the SigTech SaaS quant platform, we had several key challenges to consider. The first was to align our platform development methodology with the business focus on rapid time to market. This meant ensuring that the efforts of our team were focused on the development of new, high-quality features, rather than operating manual processes. We therefore took the decision to automate everything possible, including key customer activities such as the on-boarding and provisioning processes.
Because we operate within the highly regulated financial services industry, the security of our data and of our customers’ data and intellectual property is a top priority. We have therefore implemented layered security controls throughout the platform to meet our customers’ expectations around the protection of their data and trading algorithms.
To operate these controls while still enabling high-quality, agile feature delivery, we had to ensure that our Security Operations (SecOps) capability had full visibility of our platform and could respond quickly to any risk or incident identified. Gathering observability data at scale and monitoring it in depth therefore became a key priority, and automating data collection, monitoring, and alerting for SecOps became an essential requirement.
The need to use the cloud to build a highly scalable, secure platform that could be continually developed to a high standard in a fast-paced environment was self-evident. We chose AWS for its breadth and depth of services, its API- and automation-first strategy, and its strong security controls.
Product Design
One of the primary considerations for SigTech as a SaaS platform is data security and ensuring that customer data is completely segregated. Each user is given a Research Environment which is based on JupyterLab and powered by Amazon SageMaker. This environment comes preconfigured with the SigTech Quant Framework and access to our data lake containing more than 600,000 instruments and datasets, covering every major asset class and alternative dataset from the world’s leading data providers. Within the Research Environment, users can share workspaces with colleagues on the same tenant and develop and test their strategies before deploying their models to a Production Environment.
We operate our platform in a multi-tenant model, meaning all customer resources exist within the same AWS account and network. Tenant isolation is therefore a fundamental security principle for us, ensuring our customers’ intellectual property is protected. We achieve isolation by providing each customer with unique, auto-generated AWS Identity and Access Management (IAM) roles, and with private and unique encryption keys, Customer-Managed Keys (CMKs) in the AWS Key Management Service (KMS), which are used to encrypt and decrypt their data. Each tenant’s data is stored in its own Amazon Simple Storage Service (Amazon S3) bucket, with S3 bucket policies restricting access to that tenant’s resources only.
Each tenant’s Jupyter notebook environment is provided by Amazon SageMaker Notebook Instances. Inbound and outbound network controls are configured as Security Groups, which isolate the instances from other tenants and prevent data leakage outside the environment by blocking all external communication.
All platform logs are monitored closely from a centralized logging location: Amazon S3 and Amazon CloudWatch store the AWS services’ logs, and AWS CloudTrail records all user activity on the AWS APIs. This is combined with Amazon CloudWatch alarms and Amazon QuickSight dashboards to provide proactive monitoring.
Solution Walkthrough
Three elements made the SigTech platform possible and allowed us to address the key design challenges:
- Automated customer tenant on-boarding
- Multi-layered security controls, and
- Monitoring and observability at scale.
Figure 1. SigTech’s overall solution design
Automated Tenant Onboarding
Given the requirement to create specific AWS resources for each customer, develop at pace, and ensure security through all aspects of our platform, it was necessary that the process of on-boarding new customers or tenants was repeatable, auditable, and automated. The capacity to scale efficiently, irrespective of the number of customers, was also necessary.
The process requires several steps and multiple API calls to update tenant resources, because the process itself generates new resources. For example, once the tenant IAM roles, CMKs in KMS, and S3 buckets have been created, their policies must be updated to restrict tenant access to only these resources. Coordinating these steps automatically was therefore critical. We identified the stateful workflow management of AWS Step Functions as the right fit for these processes, and it proved invaluable in automating our tenant on-boarding process end-to-end without any manual steps.
Figure 2. Business workflow when onboarding a new tenant
We developed AWS Lambda functions that execute various steps in the Step Functions workflow. This allows the steps to be run multiple times until the final desired state is achieved, with the current state of workflow execution and any required retries automatically managed by the Step Functions execution logic.
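As a minimal sketch of how such a workflow might be wired together, the Python below builds an Amazon States Language (ASL) definition that chains one Lambda-backed Task per onboarding step. The state names, Lambda function names, region, and account ID are hypothetical. Each Task writes its output under its own `ResultPath`, so the original input is kept and later steps can read ARNs produced by earlier ones.

```python
import json

# Hypothetical Lambda ARN pattern; the real function names are not published.
LAMBDA_ARN = "arn:aws:lambda:eu-west-2:123456789012:function:{}"

# Ordered onboarding steps: (state name, Lambda function name, ResultPath).
STEPS = [
    ("CreateTenantRole", "create-tenant-role", "$.role"),
    ("CreateTenantKey", "create-tenant-key", "$.key"),
    ("CreateTenantBucket", "create-tenant-bucket", "$.bucket"),
    ("RestrictTenantPolicies", "restrict-tenant-policies", "$.policies"),
    ("PersistTenant", "persist-tenant-record", "$.record"),
]


def build_definition() -> dict:
    """Chain the steps into a sequential ASL state machine."""
    states = {}
    for i, (name, function, result_path) in enumerate(STEPS):
        state = {
            "Type": "Task",
            "Resource": LAMBDA_ARN.format(function),
            # Keep the workflow input and add this step's output under result_path,
            # so later steps can read ARNs created earlier.
            "ResultPath": result_path,
        }
        if i + 1 < len(STEPS):
            state["Next"] = STEPS[i + 1][0]
        else:
            state["End"] = True
        states[name] = state
    return {"Comment": "Tenant onboarding (sketch)", "StartAt": STEPS[0][0], "States": states}


definition = build_definition()
print(json.dumps(definition, indent=2))
```

Serialized with `json.dumps`, a definition like this could be passed as the `definition` string to the Step Functions `CreateStateMachine` API (for example, `create_state_machine` in boto3).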
Purpose of Workflow
To achieve a secure multi-tenant environment, SigTech uses a range of AWS services to provision resources for a specific customer, and then uses IAM roles and permissions to ensure that this customer is the only tenant who can read that data.
As part of the new tenant workflow, each customer is given their own IAM role and KMS key, as well as an S3 bucket to store all of their custom data. Whether the customer starts a research environment (powered by SageMaker) or executes a strategy (using AWS Batch and AWS Fargate), the workload runs under the customer’s IAM role, so the same security boundaries are maintained.
Figure 3. Step Function workflow and associated AWS services used to configure a new tenant
When a new tenant is configured in the SigTech Platform administration portal, a new workflow is triggered to set up the various resources required for that tenant. During the workflow, we persist any required state (e.g., the ARNs of newly created resources) so that subsequent workflow steps can use it. Once everything has been configured successfully, the final step in the workflow persists the resource ARNs that we need to reference in the future to our tenant Amazon DynamoDB table.
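The final persistence step can be sketched as a pure function that turns the accumulated workflow state into a DynamoDB item in the low-level attribute-value format. The state layout (a `tenant_id` plus `role`, `key`, and `bucket` objects each holding an `arn`), the attribute names, and the table name are all assumptions for illustration.

```python
from typing import Any


def build_tenant_item(state: dict[str, Any]) -> dict[str, dict[str, str]]:
    """Turn the accumulated workflow state into a DynamoDB item.

    The state layout used here is hypothetical; it stands in for whatever
    earlier workflow steps actually record.
    """
    return {
        "tenant_id": {"S": state["tenant_id"]},
        "role_arn": {"S": state["role"]["arn"]},
        "key_arn": {"S": state["key"]["arn"]},
        "bucket_arn": {"S": state["bucket"]["arn"]},
    }


# Example state as it might look after the earlier steps have run.
example_state = {
    "tenant_id": "tenant-a",
    "role": {"arn": "arn:aws:iam::123456789012:role/tenant-a-role"},
    "key": {"arn": "arn:aws:kms:eu-west-2:123456789012:key/1111-2222"},
    "bucket": {"arn": "arn:aws:s3:::tenant-a-data"},
}

item = build_tenant_item(example_state)
# The final Lambda step could then write it with boto3, e.g. (table name hypothetical):
# boto3.client("dynamodb").put_item(TableName="tenants", Item=item)
```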
Retrying Failed Workflows
One issue we encountered when running the tenant on-boarding Step Functions workflow was the eventual consistency of IAM across AWS services. Attempts to reference an IAM role or policy created in the previous step would fail, because the new role or policy had not yet propagated to the other services that needed it. The solution was simply to retry the failed step a few seconds later, with a backoff rate that lengthens the wait between attempts so the change has time to propagate. This required no change to our existing Lambda logic, only a change to the Step Functions configuration.
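A hedged sketch of such a configuration change: a `Retry` block on a Lambda Task state, plus a helper that computes the resulting wait schedule. The interval, attempt count, and backoff rate are illustrative values, not SigTech's. `States.TaskFailed` is used as the error matcher because, in a retrier, it matches any task failure except timeouts.

```python
def retry_policy(interval_seconds: int = 3, max_attempts: int = 4,
                 backoff_rate: float = 2.0) -> list[dict]:
    """Retrier for a Lambda Task state (values are illustrative)."""
    return [{
        # Matches any failed task except States.Timeout.
        "ErrorEquals": ["States.TaskFailed"],
        "IntervalSeconds": interval_seconds,
        "MaxAttempts": max_attempts,
        "BackoffRate": backoff_rate,
    }]


def retry_delays(interval_seconds: int, max_attempts: int, backoff_rate: float) -> list[float]:
    """Step Functions waits IntervalSeconds before the first retry and
    multiplies the wait by BackoffRate for each subsequent retry."""
    return [interval_seconds * backoff_rate ** n for n in range(max_attempts)]


task_state = {
    "Type": "Task",
    # Hypothetical function name.
    "Resource": "arn:aws:lambda:eu-west-2:123456789012:function:create-tenant-bucket",
    "Retry": retry_policy(),
    "End": True,
}

print(retry_delays(3, 4, 2.0))  # [3.0, 6.0, 12.0, 24.0]
```

With these values the step is retried after 3, 6, 12, and 24 seconds, so IAM has 45 seconds in total to propagate before the failure is surfaced.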
If a step continues to fail after its retries, this suggests a more complicated underlying issue. Step Functions suits this case well too: the workflow can alert the SigTech team, who can then review the logs and either fix the underlying data to resolve the issue or escalate it to another team.
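One way to implement that alerting, sketched here as an assumption, is a `Catch` that routes any error remaining after retries to a Task using Step Functions' optimized Amazon SNS integration, followed by a `Fail` state. The topic ARN, Lambda ARN, and state names are hypothetical.

```python
# Hypothetical SNS topic that alerts the SigTech team.
ALERT_TOPIC_ARN = "arn:aws:sns:eu-west-2:123456789012:tenant-onboarding-alerts"


def with_catch(task: dict, handler_state: str) -> dict:
    """Return a copy of a Task state that routes any remaining error to handler_state.

    A Catch only takes effect after the state's Retry attempts are exhausted.
    """
    caught = dict(task)
    caught["Catch"] = [{
        "ErrorEquals": ["States.ALL"],
        # Keep the original input and attach the error details under $.error.
        "ResultPath": "$.error",
        "Next": handler_state,
    }]
    return caught


states = {
    "CreateTenantBucket": with_catch(
        {
            "Type": "Task",
            "Resource": "arn:aws:lambda:eu-west-2:123456789012:function:create-tenant-bucket",
            "End": True,
        },
        "NotifyOps",
    ),
    "NotifyOps": {
        "Type": "Task",
        # Step Functions' optimized integration for publishing to Amazon SNS.
        "Resource": "arn:aws:states:::sns:publish",
        # The caught error object carries "Error" and "Cause" fields.
        "Parameters": {"TopicArn": ALERT_TOPIC_ARN, "Message.$": "$.error.Cause"},
        "Next": "OnboardingFailed",
    },
    # End the execution as failed so it stays visible for investigation.
    "OnboardingFailed": {"Type": "Fail", "Error": "TenantOnboardingFailed"},
}
```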
Multi-layered Security Controls
As mentioned in the Product Design section, a layered security design is used throughout the platform to ensure tenant isolation and data protection. This approach combines dedicated authorization, encryption, and data storage resources for each tenant, network controls for workload isolation, and deep monitoring capabilities that provide visibility of our controls in practice. The following sections consider each of these in more detail.
Identity and Access Management (IAM) roles
During the automated tenant on-boarding process, we create unique Identity and Access Management (IAM) roles for every customer, each with IAM policies providing access only to resources belonging to their tenant. This ensures our customers’ users are protected and isolated from other customer entities and can only access and manage their own SigTech environment and data. The IAM policies attached to these roles grant “Allow” permissions only on tenant-specific resources, which they reference explicitly by Amazon Resource Name (ARN). For example, the IAM policy snippet below shows how a customer’s IAM role policy allows them to manage only their assigned S3 bucket.
Figure 4. IAM policy enabled to manage only customer assigned S3 bucket
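Figure 4 is an image, so its exact policy is not reproduced here. As a hedged sketch, a policy of the same shape could be generated per tenant as follows; the bucket name and the precise action list are assumptions. Bucket-level actions are scoped to the bucket ARN, and object-level actions to the objects under it.

```python
import json


def tenant_s3_policy(bucket_name: str) -> dict:
    """IAM identity policy that only covers one tenant's bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level actions apply to the bucket ARN itself.
                "Sid": "ListTenantBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": bucket_arn,
            },
            {
                # Object-level actions apply to keys inside the bucket.
                "Sid": "ReadWriteTenantObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": f"{bucket_arn}/*",
            },
        ],
    }


policy = tenant_s3_policy("sigtech-tenant-a-data")  # hypothetical bucket name
print(json.dumps(policy, indent=2))
```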
Furthermore, each IAM role has a trust policy that allows administrative service entities to assume the tenant-specific role and perform actions on services such as Lambda and SageMaker. The next block of code demonstrates an example of this functionality.
Figure 5. IAM trust policy configured to give permissions to other IAM roles
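A trust policy of this kind can be sketched as below. The administrative role ARN is hypothetical; the service principals shown are the standard ones for SageMaker and Lambda. A single statement may list both `AWS` and `Service` principals.

```python
def tenant_trust_policy(admin_role_arns: list[str], services: list[str]) -> dict:
    """Trust policy letting platform roles and AWS services assume a tenant role."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {
                # Platform roles allowed to assume the tenant role.
                "AWS": admin_role_arns,
                # AWS services that run workloads as this role.
                "Service": services,
            },
            "Action": "sts:AssumeRole",
        }],
    }


trust = tenant_trust_policy(
    ["arn:aws:iam::123456789012:role/platform-orchestrator"],  # hypothetical
    ["sagemaker.amazonaws.com", "lambda.amazonaws.com"],
)
```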
Encryption Key Controls
We further isolate customer data through encryption key controls. All customer data on our platform is encrypted at rest using dedicated per-tenant Customer-Managed Keys (CMKs) provided by the AWS Key Management Service (KMS), giving each customer private and unique keys to encrypt and decrypt their data. Each tenant is assigned a CMK whose key policy restricts use of the key to tenant-specific IAM roles. The following example shows how the KMS key policy allows only the customer-specific IAM role to use the key.
Figure 6. KMS key policy enables a specific IAM role to use the key
The IAM role policy is configured so that the role has permissions only on the tenant-specific KMS key. An example of this configuration can be seen in the following code block.
Figure 7. IAM policy enables KMS key permissions.
These controls ensure that each tenant can access only its own encryption key, and that each key can be accessed only by its tenant. Data encrypted with these keys can therefore be decrypted only by the tenant’s authorized users who have access to those IAM roles.
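Figures 6 and 7 are images; the sketch below shows the two sides of this control as they might be generated per tenant. The key policy names the tenant role as the only user of the key, and the role's policy names the key ARN as the only KMS resource it may use. The administrative statement and its action list are assumptions: without some administrative principal, a key policy can leave the key unmanageable. In a key policy, `"Resource": "*"` refers to the key the policy is attached to.

```python
# Actions needed to use a key for envelope encryption, e.g. S3 SSE-KMS.
KEY_USE_ACTIONS = ["kms:Encrypt", "kms:Decrypt", "kms:ReEncrypt*",
                   "kms:GenerateDataKey*", "kms:DescribeKey"]


def tenant_key_policy(tenant_role_arn: str, key_admin_arn: str) -> dict:
    """Key policy: only the tenant role may use the key."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Illustrative administrative statement so the key stays manageable.
                "Sid": "AllowKeyAdministration",
                "Effect": "Allow",
                "Principal": {"AWS": key_admin_arn},
                "Action": ["kms:Describe*", "kms:Put*", "kms:Update*",
                           "kms:Enable*", "kms:Disable*", "kms:ScheduleKeyDeletion"],
                "Resource": "*",  # in a key policy, "*" means this key
            },
            {
                "Sid": "AllowTenantRoleUse",
                "Effect": "Allow",
                "Principal": {"AWS": tenant_role_arn},
                "Action": KEY_USE_ACTIONS,
                "Resource": "*",
            },
        ],
    }


def tenant_role_kms_policy(key_arn: str) -> dict:
    """Identity policy: the tenant role may use only its own key."""
    return {
        "Version": "2012-10-17",
        "Statement": [{"Sid": "UseTenantKeyOnly", "Effect": "Allow",
                       "Action": KEY_USE_ACTIONS, "Resource": key_arn}],
    }


# Hypothetical ARNs for one tenant.
key_policy = tenant_key_policy("arn:aws:iam::123456789012:role/tenant-a-role",
                               "arn:aws:iam::123456789012:role/key-admin")
role_policy = tenant_role_kms_policy("arn:aws:kms:eu-west-2:123456789012:key/1111-2222")
```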
Amazon SageMaker
To ensure that a customer’s Research Environment is accessible only to their users, we run each SageMaker instance under an IAM role specific to that tenant, and we use SageMaker instance tags to restrict each tenant to its own instances.
Figure 8. IAM policy restricting tenants to their own SageMaker instances
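A hedged sketch of a tag-based restriction like the one in Figure 8: the policy allows notebook actions only on instances whose `TenantId` tag matches the tenant. The tag key, action list, region, and account ID are assumptions; `sagemaker:ResourceTag/<key>` is the SageMaker condition key for resource tags.

```python
def tenant_notebook_policy(tenant_id: str, region: str, account_id: str) -> dict:
    """Allow notebook actions only on instances tagged with this tenant's ID."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "OwnTenantNotebooksOnly",
            "Effect": "Allow",
            "Action": [
                "sagemaker:CreatePresignedNotebookInstanceUrl",
                "sagemaker:DescribeNotebookInstance",
                "sagemaker:StartNotebookInstance",
                "sagemaker:StopNotebookInstance",
            ],
            "Resource": f"arn:aws:sagemaker:{region}:{account_id}:notebook-instance/*",
            # Only instances carrying a matching (hypothetical) TenantId tag qualify.
            "Condition": {"StringEquals": {"sagemaker:ResourceTag/TenantId": tenant_id}},
        }],
    }


notebook_policy = tenant_notebook_policy("tenant-a", "eu-west-2", "123456789012")
```

An attribute-based variant could compare the resource tag with a tag on the caller, such as `${aws:PrincipalTag/TenantId}`, instead of hard-coding the tenant ID per policy.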
Data Storage & Access Controls
Alongside authorization and encryption controls, controls at the data storage layer provide an additional security barrier. Each customer is issued with a dedicated S3 storage bucket to store their data. The bucket is configured to automatically use the customer’s dedicated CMK to perform server-side encryption of all data stored in the bucket. The S3 bucket uses bucket policy entries to restrict access to only the tenant’s resources.
Figure 9. S3 Bucket policy enables specific IAM role to manage the bucket
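As an assumed illustration of Figure 9 and the default-encryption setting described above, the sketch below builds (1) a bucket policy that denies every principal except an allow-list of roles and (2) a default SSE-KMS configuration pointing at the tenant's CMK. Including a platform administration role in the allow-list is an assumption, so the bucket does not become unmanageable. For assumed-role sessions, `aws:PrincipalArn` evaluates to the role's ARN, and `ArnNotEquals` with several values is true only when the caller matches none of them.

```python
def tenant_bucket_policy(bucket_name: str, allowed_role_arns: list[str]) -> dict:
    """Deny all S3 access to the bucket except from the listed roles."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyEveryoneButTenantRoles",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [bucket_arn, f"{bucket_arn}/*"],
            "Condition": {"ArnNotEquals": {"aws:PrincipalArn": allowed_role_arns}},
        }],
    }


def tenant_bucket_encryption(key_arn: str) -> dict:
    """Default encryption: SSE-KMS with the tenant's key for every new object."""
    return {
        "Rules": [{
            "ApplyServerSideEncryptionByDefault": {
                "SSEAlgorithm": "aws:kms",
                "KMSMasterKeyID": key_arn,
            },
            # S3 Bucket Keys reduce the number of requests S3 makes to AWS KMS.
            "BucketKeyEnabled": True,
        }]
    }


# Hypothetical names; with boto3 these could be applied via put_bucket_policy
# (Policy=json.dumps(...)) and put_bucket_encryption(ServerSideEncryptionConfiguration=...).
bucket_policy = tenant_bucket_policy(
    "sigtech-tenant-a-data",
    ["arn:aws:iam::123456789012:role/tenant-a-role",
     "arn:aws:iam::123456789012:role/platform-admin"],
)
encryption = tenant_bucket_encryption("arn:aws:kms:eu-west-2:123456789012:key/1111-2222")
```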
These bucket policy controls ensure that only the owning tenant’s resources can access the bucket. For example, when a customer runs a research environment via the SigTech platform, which launches a SageMaker Notebook Instance, that tenant’s IAM roles are attached to that compute environment. This, in turn, restricts access to that tenant’s CMK and authorizes access to that tenant’s S3 storage bucket only. The tenant can then access and decrypt the data, but only from that S3 bucket and only via that CMK.
Below is an example flow diagram showing the IAM policies, CMK policies and S3 bucket policies restricting access between different tenant resources.
Figure 10. Permissions flow preventing access between resources.
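A toy model can make this flow concrete. It is not how AWS actually evaluates policies: it treats the role policy, the key policy, and the bucket policy each as a gate that must permit the request, which mirrors the intent of the layered design (within a single account, real evaluation also depends on explicit denies such as the bucket policy's). The resource names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Platform:
    # Each layer is modelled as the set of (role, resource) pairs it permits.
    iam: set = field(default_factory=set)
    key_policies: set = field(default_factory=set)
    bucket_policies: set = field(default_factory=set)
    bucket_keys: dict = field(default_factory=dict)  # bucket -> its default CMK

    def onboard(self, tenant: str) -> tuple[str, str]:
        """Create one tenant's role, key and bucket, each scoped to that tenant."""
        role, key, bucket = f"role/{tenant}", f"key/{tenant}", f"bucket/{tenant}"
        self.iam |= {(role, key), (role, bucket)}  # role policy: own key and bucket only
        self.key_policies.add((role, key))         # key policy: own role only
        self.bucket_policies.add((role, bucket))   # bucket policy: own role only
        self.bucket_keys[bucket] = key
        return role, bucket

    def can_read(self, role: str, bucket: str) -> bool:
        """Reading an SSE-KMS object needs S3 access plus kms:Decrypt on the bucket's key."""
        key = self.bucket_keys[bucket]
        s3_ok = (role, bucket) in self.iam and (role, bucket) in self.bucket_policies
        kms_ok = (role, key) in self.iam and (role, key) in self.key_policies
        return s3_ok and kms_ok


env = Platform()
role_a, bucket_a = env.onboard("tenant-a")
role_b, bucket_b = env.onboard("tenant-b")
print(env.can_read(role_a, bucket_a))  # True
print(env.can_read(role_a, bucket_b))  # False
```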
Network Controls
The SigTech platform uses Amazon SageMaker Notebook Instances, which run on Amazon EC2 instances. To protect our customers’ data and proprietary algorithms, we apply network controls at the EC2 and VPC level to ensure strict data isolation. To prevent data exfiltration, we place each tenant’s SageMaker Notebook Instances in dedicated EC2 Security Groups, which act as a virtual firewall controlling inbound and outbound traffic. We also control access to the S3 buckets at the network layer with a VPC endpoint policy (presented below), which allows requests through the endpoint only from principals that belong to SigTech’s AWS Organization.
```json
{
  "Statement": [
    {
      "Action": "*",
      "Effect": "Allow",
      "Resource": "arn:aws:s3:::*",
      "Principal": "*",
      "Condition": {
        "StringEquals": {
          "aws:PrincipalOrgID": [
            "<org_id>"
          ]
        }
      }
    }
  ]
}
```
Figure 11. VPC Endpoint policy restricting access to S3 buckets
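The Security Group side can be sketched as data in the shape the EC2 API uses for `IpPermissions`. This is one assumed way to express "block all external communication": no inbound rules, the default allow-all egress rule removed, and only HTTPS to the S3 prefix list (reached through the gateway endpoint) allowed out. The prefix list ID is a placeholder, and the notebook instances would also be created with `DirectInternetAccess="Disabled"` so that their traffic leaves only through the VPC.

```python
# Placeholder ID of the AWS-managed S3 prefix list in the platform's region.
S3_PREFIX_LIST_ID = "pl-0123456789abcdef0"

# The IPv4 allow-all egress rule EC2 adds to every new security group.
DEFAULT_EGRESS = [{"IpProtocol": "-1", "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}]

# Only HTTPS to S3 via the gateway endpoint may leave; no ingress rules are added.
NOTEBOOK_EGRESS = [{
    "IpProtocol": "tcp",
    "FromPort": 443,
    "ToPort": 443,
    "PrefixListIds": [{"PrefixListId": S3_PREFIX_LIST_ID, "Description": "S3 via VPC endpoint"}],
}]

# With boto3 these could be applied as, e.g.:
# ec2.revoke_security_group_egress(GroupId=sg_id, IpPermissions=DEFAULT_EGRESS)
# ec2.authorize_security_group_egress(GroupId=sg_id, IpPermissions=NOTEBOOK_EGRESS)


def allows_tcp(rules: list[dict], port: int) -> bool:
    """True if any rule permits outbound TCP on the port (destinations are ignored)."""
    for rule in rules:
        if rule["IpProtocol"] == "-1":  # "-1" means all protocols and ports
            return True
        if rule["IpProtocol"] == "tcp" and rule["FromPort"] <= port <= rule["ToPort"]:
            return True
    return False


print(allows_tcp(DEFAULT_EGRESS, 22))    # True
print(allows_tcp(NOTEBOOK_EGRESS, 22))   # False
print(allows_tcp(NOTEBOOK_EGRESS, 443))  # True
```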
These tenant-specific isolation controls are summarized in the figure below, which describes the separation of customer data and the differing CMK permissions between a customer role and a SigTech role.
Figure 12. Layered Security Controls on the SigTech platform
Monitoring Controls
To ensure that our tenant isolation and data protection controls are operating as expected, the SigTech platform is also monitored and protected by various AWS security services. These include:
- AWS WAF and AWS Shield (for platform-level network firewall and DDoS protections)
- Amazon GuardDuty (for threat detection across network and account activity)
- AWS Security Hub (for aggregating security findings and events)
- Amazon Inspector (for vulnerability assessment)
These services all feed logs and findings into a separate, dedicated Security AWS account for auditing and monitoring, isolated from the customer and platform environments. This ensures that security data cannot be tampered with by individuals outside the security function. All the logs from these services are stored in a Centralized Logging S3 bucket in this Security account and used for monitoring and alerting, as shown below.
Figure 13. SigTech’s extra layered Security Controls
Monitoring and Observability
For monitoring and audit purposes, each AWS account stores the logs of several services in an S3 bucket. These services include Amazon API Gateway, Amazon CloudFront, Amazon Route 53, Amazon Virtual Private Cloud (Amazon VPC), and AWS WAF. To enable the SecOps team to manage our environments effectively, all of these logs are replicated to the dedicated Security AWS account, where they can be combined and searched using a variety of AWS tools.
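The post does not say which mechanism replicates the logs. One option, sketched here as an assumption, is S3 replication from each account's logs bucket to the Security account. The configuration uses the shape accepted by the S3 `PutBucketReplication` API (boto3 `put_bucket_replication`). The role ARN, bucket ARN, and account ID are placeholders, and replication requires versioning on both buckets.

```python
def log_replication_config(replication_role_arn: str, security_bucket_arn: str,
                           security_account_id: str) -> dict:
    """Replicate every object in a logs bucket to the Security account's bucket."""
    return {
        "Role": replication_role_arn,
        "Rules": [{
            "ID": "replicate-logs-to-security-account",
            "Priority": 1,
            "Status": "Enabled",
            "Filter": {"Prefix": ""},  # empty prefix: all objects
            "DeleteMarkerReplication": {"Status": "Disabled"},
            "Destination": {
                "Bucket": security_bucket_arn,
                "Account": security_account_id,
                # Make the Security account the owner of the replicas.
                "AccessControlTranslation": {"Owner": "Destination"},
            },
        }],
    }


replication = log_replication_config(
    "arn:aws:iam::111122223333:role/log-replication",  # placeholder
    "arn:aws:s3:::security-central-logs",              # placeholder
    "444455556666",                                    # placeholder Security account
)
```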
An Amazon Athena database is defined over all the “Logs” S3 buckets in the Security account, so the data can be queried efficiently with SQL. This enables the SecOps team to monitor metrics such as rejected SSH connections or requests blocked by the WAF, which can reveal activity such as someone attempting to exfiltrate data from the platform.
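As a hedged illustration of the rejected-SSH metric, the query below counts rejected TCP connections to port 22 per source address. It assumes an Athena table named `vpc_flow_logs` (hypothetical) with the standard VPC Flow Logs fields `srcaddr`, `dstport`, `protocol` (6 is TCP), and `action`. Because the query uses only standard SQL, the sketch runs it against an in-memory SQLite table with made-up sample rows to show the shape of the result.

```python
import sqlite3

REJECTED_SSH_QUERY = """
SELECT srcaddr, COUNT(*) AS rejected_attempts
FROM vpc_flow_logs
WHERE dstport = 22 AND protocol = 6 AND action = 'REJECT'
GROUP BY srcaddr
ORDER BY rejected_attempts DESC
"""


def run_demo(rows: list[tuple]) -> list[tuple]:
    """Load sample flow-log rows into SQLite and run the query."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE vpc_flow_logs "
                "(srcaddr TEXT, dstport INTEGER, protocol INTEGER, action TEXT)")
    con.executemany("INSERT INTO vpc_flow_logs VALUES (?, ?, ?, ?)", rows)
    result = con.execute(REJECTED_SSH_QUERY).fetchall()
    con.close()
    return result


sample = [
    ("203.0.113.7", 22, 6, "REJECT"),
    ("203.0.113.7", 22, 6, "REJECT"),
    ("198.51.100.4", 22, 6, "REJECT"),
    ("10.0.1.15", 443, 6, "ACCEPT"),
]
print(run_demo(sample))  # [('203.0.113.7', 2), ('198.51.100.4', 1)]
```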
Figure 14. SigTech Centralized Logging System
To make the workflow efficient for SecOps operators, we use QuickSight dashboards built on the Athena databases to visualize the data, giving our SecOps team deeper insight into the data and helping them identify signals. The figures below show an example of a query the SecOps team uses to find blocked HTTP requests in Athena, and dashboards visualizing these results in QuickSight.
Figure 15. QuickSight dataset created with Athena SQL Query
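Figure 15 shows the query only as an image. A query of this kind might look like the one built below, which counts WAF-blocked requests per client IP, country, and terminating rule. Column names follow the AWS WAF log format as exposed in Athena (`action`, `terminatingruleid`, and the `httprequest` struct); the table name, database name, and result location are hypothetical.

```python
import re


def blocked_requests_query(table: str = "waf_logs", limit: int = 100) -> str:
    """Build an Athena query counting WAF-blocked requests per client and rule."""
    # The table name is interpolated into SQL, so accept plain identifiers only.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", table):
        raise ValueError(f"invalid table name: {table!r}")
    return f"""
SELECT httprequest.clientip AS client_ip,
       httprequest.country AS country,
       terminatingruleid AS rule_id,
       COUNT(*) AS blocked_requests
FROM {table}
WHERE action = 'BLOCK'
GROUP BY httprequest.clientip, httprequest.country, terminatingruleid
ORDER BY blocked_requests DESC
LIMIT {int(limit)}
""".strip()


query = blocked_requests_query()
# With boto3 this could be submitted as, e.g.:
# athena.start_query_execution(
#     QueryString=query,
#     QueryExecutionContext={"Database": "security_logs"},
#     ResultConfiguration={"OutputLocation": "s3://<athena-results-bucket>/"},
# )
```

A QuickSight dataset can use the same SQL as a custom query, so the dashboard and ad hoc investigations share one definition.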
Figure 16. QuickSight dashboard showing Rejected SSH requests
Figure 17. QuickSight dashboard showing users blocked by SigTech WAF rules
Conclusion
Deploying a secure platform is not a one-off exercise; SigTech continues to focus on enhancing our isolation and monitoring controls as part of our day-to-day operations and development activity. We are proud to work closely with our AWS account team, AWS Solutions Architecture teams, and AWS service teams to provide feedback and keep up to date with the latest releases and capabilities.
Further resources you may find useful on SaaS platform construction on AWS and multi-tenant security include the AWS SaaS Factory Bootcamp, this blog post on implementing SaaS tenant isolation with ABAC and AWS IAM, the SaaS Journey Framework whitepaper, and AWS SaaS Boost.
Adam Temple is a Senior Solutions Architect at AWS, working with mid-size enterprises across a variety of industries in the UK. He has over 8 years of experience helping customers transform their businesses through cloud adoption with a focus on DevOps adoption, Data & Analytics Modernization and AI & Machine Learning. When not working, he enjoys Formula 1 and learning new things whenever possible.
Nitin Tiwari is a Senior Solutions Architect at AWS, working with customers of all sizes across the UK, helping them across the breadth of the AWS Services. He has an area of focus in the topics of Security & Compliance and Financial Services. When not working with customers or thought leadership projects, he likes to spend time with family and enjoys sports and travel.
Brenda Szamosi is a Security Engineer at SigTech, responsible for maintaining the security of the platform by developing and monitoring systems to detect and prevent any potential security breaches. Outside of the office, she likes to spend time with her family, travel to new places or practice paddleboarding.
Tim Glass is Head of Platform at SigTech and enjoys developing new and innovative technical solutions to solve their customers’ problems. When not at work he can be found going on adventures with his two young children, or watching YouTube videos to try and learn how to renovate his home.