Millennium Management: Secure machine learning using Amazon SageMaker

This is a guest post from Millennium Management. In their own words, “Millennium Management is a global investment management firm, established in 1989, with over 2,900 employees and $39.2 billion in assets under management as of August 2, 2019.”

Millennium Management is comprised of a large number of specialized trading teams across the United States, Europe, and Asia. Many of these trading teams expressed interest in using the AWS Cloud to simplify their machine learning (ML) workflows. Amazon SageMaker, a fully managed ML service, was a natural choice to enable trading teams to move from data preparation, to training, to model deployment, and back again.

Our first task was to use AWS-native controls to make sure that deployments conform to internal standards. Data exfiltration is a key concern at Millennium Management. Trading strategies and algorithms are the secret sauce of hedge funds, so we implement various layers of controls to protect them from exposure. We focus on aspects such as restricting outbound network access from the AWS environment, encryption in transit, and encryption at rest. Encryption in transit helps prevent data from being intercepted as it flows through our applications. Encryption at rest, using AWS KMS, helps protect data stored on disk against physical access, because an attacker who does not have the key cannot decrypt it. In addition, we prevent privilege escalation, which could lead to circumventing the data exfiltration protections. We detailed our IAM role creation and update service (ICARUS) at the 2019 AWS NYC Summit. Check it out to see how Millennium Management handles IAM role management and prevents privilege escalation through least-privilege IAM roles. The presentation walks through some of the controls AWS offers to provide secure Amazon SageMaker deployments.

Overview of solution

As a fully managed service, Amazon SageMaker deploys resources on your behalf in AWS-managed accounts. The required configuration of these resources may differ depending on your security requirements. For instance, is encryption at rest a requirement? Can you use default KMS keys or does your company require customer-managed encryption keys? Is data exfiltration a concern? Is internet access permitted? This post discusses two ways to enforce conformant API calls: IAM policies and service control policies (SCPs).

Enforcing compliance using IAM

IAM enables you to provide access to AWS APIs securely using fine-grained policies. It is one mechanism you can use to have compliant use of Amazon SageMaker. The following diagram shows a sample permissions workflow.

You can assign an IAM policy permissions set to a user or federated IAM role. The user uses these permissions to access various Amazon SageMaker components, including notebooks, model training, and model usage.

Similarly, some aspects of Amazon SageMaker require API permissions to function. For example, a notebook requires an IAM role to write logs to Amazon CloudWatch Logs and a training job requires access to retrieve training data from Amazon S3. Amazon SageMaker supports several IAM conditions that allow you to shape permissions to your security requirements. This post discusses some of these conditions and how to use them to enforce a security posture. For more information, see Actions, Resources, and Condition Keys for Amazon SageMaker.

Enforcing encryption

Amazon SageMaker is a fully managed service, which means that AWS manages the underlying infrastructure that supports customer deployments. In this scenario, many organizations encrypt data at rest and in transit to prevent access to the data from anyone outside of the organization. There are various API features with which you can enforce encryption.

Enforcing job encryption

You can use the sagemaker:VolumeKmsKey condition key to enforce encryption at rest of data stored on EBS volumes in Amazon SageMaker. You can require a specific KMS key for a training job, for instance, or you can require any KMS key provided, as in the following example. The example code allows the API calls to succeed if the sagemaker:VolumeKmsKey condition is not null (in other words, a KMS key is present in the API call):

{
  "Sid": "Encryption",
  "Effect": "Allow",
  "Action": [
    "sagemaker:CreateEndpointConfig",
    "sagemaker:CreateHyperParameterTuningJob",
    "sagemaker:CreateTrainingJob",
    "sagemaker:CreateTransformJob"
  ],
  "Resource": "*",
  "Condition": {
    "Null": {
      "sagemaker:VolumeKmsKey": "false"
    }      
  }
}
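The paragraph above also mentions requiring a specific KMS key. A minimal sketch of that variant follows. The key ARN and the Sid are placeholders rather than values from Millennium Management's environment, and the use of ArnEquals assumes that the request supplies the key as a full ARN rather than as a key ID or alias:

```json
{
  "Sid": "RequireSpecificVolumeKey",
  "Effect": "Allow",
  "Action": [
    "sagemaker:CreateEndpointConfig",
    "sagemaker:CreateHyperParameterTuningJob",
    "sagemaker:CreateTrainingJob",
    "sagemaker:CreateTransformJob"
  ],
  "Resource": "*",
  "Condition": {
    "ArnEquals": {
      "sagemaker:VolumeKmsKey": "arn:aws:kms:us-east-1:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab"
    }
  }
}
```

Because a positive operator such as ArnEquals does not match when the key is absent from the request, this statement should also reject calls that omit a KMS key entirely.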

Enforcing inter-container traffic encryption

Now that you’ve tackled encryption at rest in Amazon SageMaker, what about encryption in transit? If you are concerned about unintended parties accessing data while various job containers communicate with each other, you can use the sagemaker:InterContainerTrafficEncryption IAM condition to only allow jobs to run if the traffic between containers is encrypted. See the following example code:

{
  "Sid": "InterContainerTrafficEncryption",
  "Effect": "Allow",
  "Action": [
    "sagemaker:CreateHyperParameterTuningJob",
    "sagemaker:CreateTrainingJob"
  ],
  "Resource": "*",
  "Condition": {
    "Bool": {
      "sagemaker:InterContainerTrafficEncryption": "true"
    }
  }
}

Controlling data egress

Some components of Amazon SageMaker deploy with access to the internet by default. If you are concerned with possible data exfiltration through Amazon SageMaker, you have several IAM conditions at your disposal to make sure usage is as contained as necessary to meet your needs.

Enforcing deployment in VPC

To take advantage of the security controls available with Amazon VPC as part of an Amazon SageMaker deployment, such as S3 endpoint policies, you can require that jobs run within your VPCs by using the sagemaker:VpcSubnets and sagemaker:VpcSecurityGroupIds IAM conditions. You can require that any VPC subnets and security group IDs be present, as in the following example, to provide deployment within a VPC, or you can require specific VPC subnets and security groups if more specific controls are necessary. For instance, you may want to require a specific subnet that routes to a permitted external proxy or that has a network access control list applied to it. See the following code:

{
  "Sid": "VPCDeployment",
  "Effect": "Allow",
  "Action": [
    "sagemaker:CreateHyperParameterTuningJob",
    "sagemaker:CreateModel",
    "sagemaker:CreateNotebookInstance",
    "sagemaker:CreateTrainingJob"
  ],
  "Resource": "*",
  "Condition": {
    "Null": {
      "sagemaker:VpcSubnets": "false",
      "sagemaker:VpcSecurityGroupIds": "false"
    }
  }
}
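For the stricter case described above, where only specific subnets and security groups are acceptable, one possible shape is sketched below. The subnet and security group IDs and the Sid are hypothetical. The Null check is kept alongside ForAllValues:StringEquals because a ForAllValues condition also evaluates to true when the key is missing from the request, which would otherwise let a call with no VPC configuration through:

```json
{
  "Sid": "VPCDeploymentSpecificNetwork",
  "Effect": "Allow",
  "Action": [
    "sagemaker:CreateHyperParameterTuningJob",
    "sagemaker:CreateModel",
    "sagemaker:CreateNotebookInstance",
    "sagemaker:CreateTrainingJob"
  ],
  "Resource": "*",
  "Condition": {
    "Null": {
      "sagemaker:VpcSubnets": "false",
      "sagemaker:VpcSecurityGroupIds": "false"
    },
    "ForAllValues:StringEquals": {
      "sagemaker:VpcSubnets": [
        "subnet-0a1b2c3d",
        "subnet-4e5f6a7b"
      ],
      "sagemaker:VpcSecurityGroupIds": [
        "sg-0a1b2c3d"
      ]
    }
  }
}
```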

Enforcing network isolation

If you need to prevent external access from Amazon SageMaker training and inference containers completely, you can enforce this with the sagemaker:NetworkIsolation IAM condition and require the value to be “true” when making relevant API calls. This prevents all external access from containers, including to AWS services, and prevents AWS credentials from being assigned to containers. See the following code:

{
  "Sid": "NetworkIsolation",
  "Effect": "Allow",
  "Action": [
    "sagemaker:CreateHyperParameterTuningJob",
    "sagemaker:CreateModel",
    "sagemaker:CreateTrainingJob"
  ],
  "Resource": "*",
  "Condition": {
    "Bool": {
      "sagemaker:NetworkIsolation": "true"
    }
  }
}
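To make the conditions so far more concrete, the fragment below sketches the CreateTrainingJob request fields that they appear to correspond to. It is not a complete request: required fields such as AlgorithmSpecification, RoleArn, OutputDataConfig, and StoppingCondition are omitted, and every name, ID, and ARN shown is a placeholder.

```json
{
  "TrainingJobName": "example-training-job",
  "ResourceConfig": {
    "InstanceType": "ml.m5.xlarge",
    "InstanceCount": 2,
    "VolumeSizeInGB": 50,
    "VolumeKmsKeyId": "arn:aws:kms:us-east-1:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab"
  },
  "VpcConfig": {
    "Subnets": [
      "subnet-0a1b2c3d"
    ],
    "SecurityGroupIds": [
      "sg-0a1b2c3d"
    ]
  },
  "EnableInterContainerTrafficEncryption": true,
  "EnableNetworkIsolation": true
}
```

A request shaped like this would be expected to satisfy the sagemaker:VolumeKmsKey, sagemaker:VpcSubnets, sagemaker:VpcSecurityGroupIds, sagemaker:InterContainerTrafficEncryption, and sagemaker:NetworkIsolation conditions at the same time.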

Restricting notebook pre-signed URLs to IPs

Jupyter notebooks give data scientists a familiar interface in which to explore data and refine ML models. By default, when you create a notebook through Amazon SageMaker, you also create a pre-signed URL that you can access from anywhere. You may choose to restrict access to these notebooks, the data you can access, and the APIs you can execute by using an IP restriction to, for instance, the external IP addresses of your organization’s offices. You can permit access to the notebook only from those offices by using the aws:SourceIp condition on the sagemaker:CreatePresignedNotebookInstanceUrl API call. Similarly, you could use the aws:sourceVpce condition to limit access to the notebook to a particular VPC endpoint. See the following code:

{
  "Sid": "RestrictUrlToIp",
  "Effect": "Allow",
  "Action": "sagemaker:CreatePresignedNotebookInstanceUrl",
  "Resource": "*",
  "Condition": {
    "ForAllValues:IpAddress": {
      "aws:SourceIp": [
        "192.168.0.0/16"
      ]
    }
  }
}
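The VPC endpoint variant mentioned above could look like the following sketch. The endpoint ID and the Sid are hypothetical, and the statement assumes that users request the pre-signed URL through that interface endpoint:

```json
{
  "Sid": "RestrictUrlToVpcEndpoint",
  "Effect": "Allow",
  "Action": "sagemaker:CreatePresignedNotebookInstanceUrl",
  "Resource": "*",
  "Condition": {
    "StringEquals": {
      "aws:sourceVpce": "vpce-1a2b3c4d"
    }
  }
}
```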

Disabling internet access

The sagemaker:DirectInternetAccess condition provides an alternative to using sagemaker:VpcSubnets and sagemaker:VpcSecurityGroupIds to prevent internet access from notebook instances. The result for notebooks is the same: you would need to configure access to the Amazon SageMaker API using VPC resources such as a proxy or NAT Gateway. See the following code:

{
  "Sid": "PreventDirectInternet",
  "Effect": "Allow",
  "Action": "sagemaker:CreateNotebookInstance",
  "Resource": "*",
  "Condition": {
    "StringEquals": {
      "sagemaker:DirectInternetAccess": [
        "Disabled"
      ]
    }
  }
}

Preventing privilege escalation

If you are concerned with privilege escalation through the notebook operating system, you can optionally require root access be disabled in deployed notebooks by using the sagemaker:RootAccess condition. This may impact installations you may need to perform interactively when using the notebook, but you can mitigate this if you know the installations beforehand by using Amazon SageMaker Notebook lifecycle configurations. See the following code:

{
  "Sid": "DenyRootAccess",
  "Effect": "Allow",
  "Action": [
    "sagemaker:CreateNotebookInstance",
    "sagemaker:UpdateNotebookInstance"
  ],
  "Resource": "*",
  "Condition": {
    "StringEquals": {
      "sagemaker:RootAccess": [
        "Disabled"
      ]
    }
  }
}

Similar to Amazon EC2 userdata, lifecycle configurations run as root at launch or startup, and you can use them to perform installations for users without granting them unfettered ongoing interactive root privileges. For more information, see Customize a Notebook Instance.
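As an illustration of what the notebook controls above expect from a caller, the following is a sketch of a CreateNotebookInstance request that sets the relevant fields. All names, IDs, and the role ARN are placeholders, and the lifecycle configuration is assumed to exist already:

```json
{
  "NotebookInstanceName": "example-notebook",
  "InstanceType": "ml.t3.medium",
  "RoleArn": "arn:aws:iam::111122223333:role/ExampleNotebookRole",
  "SubnetId": "subnet-0a1b2c3d",
  "SecurityGroupIds": [
    "sg-0a1b2c3d"
  ],
  "DirectInternetAccess": "Disabled",
  "RootAccess": "Disabled",
  "LifecycleConfigName": "example-install-packages"
}
```

Here the lifecycle configuration carries any installations that need root, so the interactive user does not.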

Enforcing compliance using SCPs

SCPs are a component of AWS Organizations, and you can also use them to enforce compliance when using Amazon SageMaker. In this model, centralized administrators design SCPs, which are similar to IAM policies, and apply them to accounts or to groups of accounts in containers called organizational units (OUs). SCPs do not grant authorization to APIs; instead, they define rules that limit the permissions IAM policies can grant within the accounts to which they are applied. The diagram below shows the relationship between AWS Organizations, OUs, SCPs, and AWS accounts.

SCPs are especially powerful because you can use them to limit the permissions of every principal in an account, even the root user. SCPs are an invaluable tool to mitigate the risk of privilege escalation in an account. Millennium Management uses these extensively; they are the preferred mechanism for coarse controls because you can write a policy once and apply it to many accounts while applying it to every principal in the account. The following section presents some sample SCPs you can use to achieve compliant use of Amazon SageMaker.

Enforcing encryption

Millennium Management chose to require EBS volume encryption using the Deny effect in our SCP. If the sagemaker:VolumeKmsKey is null, you can deny the API call.

Enforcing job encryption

The following code example requires volume encryption by denying an API call if the sagemaker:VolumeKmsKey condition key is null (in other words, no KMS key is present in the API call):

{
  "Sid": "DenyUnencryptedVolumes",
  "Effect": "Deny",
  "Action": [
    "sagemaker:CreateTrainingJob",
    "sagemaker:CreateHyperParameterTuningJob",
    "sagemaker:CreateEndpointConfig",
    "sagemaker:CreateTransformJob"
  ],
  "Resource": "*",
  "Condition": {
    "Null": {
      "sagemaker:VolumeKmsKey": [
        "true"
      ]
    }
  }
}
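If a particular customer-managed key is mandatory, a Deny statement with a negated operator is one way to express that in an SCP. The sketch below uses a placeholder key ARN and Sid. Negated operators such as ArnNotEquals evaluate to true when the key is absent from the request, so this statement is expected to deny calls that omit a KMS key as well as calls that name a different one. It is worth testing in a non-production OU before relying on it:

```json
{
  "Sid": "DenyUnapprovedVolumeKey",
  "Effect": "Deny",
  "Action": [
    "sagemaker:CreateTrainingJob",
    "sagemaker:CreateHyperParameterTuningJob",
    "sagemaker:CreateEndpointConfig",
    "sagemaker:CreateTransformJob"
  ],
  "Resource": "*",
  "Condition": {
    "ArnNotEquals": {
      "sagemaker:VolumeKmsKey": "arn:aws:kms:us-east-1:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab"
    }
  }
}
```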

Enforcing inter-container traffic encryption

You can require inter-container traffic encryption in a similar manner. The following code denies the API call if the sagemaker:InterContainerTrafficEncryption condition is false:

{
  "Sid": "DenyUnencryptedTraffic",
  "Effect": "Deny",
  "Action": [
    "sagemaker:CreateTrainingJob",
    "sagemaker:CreateHyperParameterTuningJob"
  ],
  "Resource": "*",
  "Condition": {
    "Bool": {
      "sagemaker:InterContainerTrafficEncryption": "false"
    }
  }
}

Controlling data egress

You can control data egress from Amazon SageMaker by using SCPs in the same way you did in IAM policies.

Enforcing deployment in VPC

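You can use SCPs to require deployment in a VPC. In the following example code, the API call is denied if both sagemaker:VpcSubnets and sagemaker:VpcSecurityGroupIds are null. Multiple keys under a single condition operator are combined with a logical AND, so a request that supplies only one of the two is not denied by this statement; to deny when either key is missing, use a separate Deny statement for each key: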

{
  "Sid": "DenyOutsideVpc",
  "Effect": "Deny",
  "Action": [
    "sagemaker:CreateTrainingJob",
    "sagemaker:CreateHyperParameterTuningJob",
    "sagemaker:CreateModel",
    "sagemaker:CreateNotebookInstance"
  ],
  "Resource": "*",
  "Condition": {
    "Null": {
      "sagemaker:VpcSubnets": "true",
      "sagemaker:VpcSecurityGroupIds": "true"
    }
  }
}

This code groups several API calls to which you can apply the VPC-related conditions and deny the API call if the request doesn’t include them.
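Because multiple keys under one condition operator are combined with a logical AND, a single Null block that lists both keys matches only when both are absent. One way to deny a call when either key is missing is to split the check into two statements, as in this sketch. The Sids are illustrative, and the array is meant to be placed in a policy's Statement element:

```json
[
  {
    "Sid": "DenyMissingSubnets",
    "Effect": "Deny",
    "Action": [
      "sagemaker:CreateTrainingJob",
      "sagemaker:CreateHyperParameterTuningJob",
      "sagemaker:CreateModel",
      "sagemaker:CreateNotebookInstance"
    ],
    "Resource": "*",
    "Condition": {
      "Null": {
        "sagemaker:VpcSubnets": "true"
      }
    }
  },
  {
    "Sid": "DenyMissingSecurityGroups",
    "Effect": "Deny",
    "Action": [
      "sagemaker:CreateTrainingJob",
      "sagemaker:CreateHyperParameterTuningJob",
      "sagemaker:CreateModel",
      "sagemaker:CreateNotebookInstance"
    ],
    "Resource": "*",
    "Condition": {
      "Null": {
        "sagemaker:VpcSecurityGroupIds": "true"
      }
    }
  }
]
```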

Enforcing network isolation

You can enforce network isolation with an SCP; see the following code:

{
  "Sid": "DenyNotIsolated",
  "Effect": "Deny",
  "Action": [
    "sagemaker:CreateTrainingJob",
    "sagemaker:CreateHyperParameterTuningJob",
    "sagemaker:CreateModel"
  ],
  "Resource": "*",
  "Condition": {
    "Bool": {
      "sagemaker:NetworkIsolation": "false"
    }
  }
}

Restricting notebook pre-signed URLs to IPs

You also can restrict pre-signed URLs to your IP list using an SCP. See the following code:

{
  "Sid": "RestrictUrlToIp",
  "Effect": "Deny",
  "Action": "sagemaker:CreatePresignedNotebookInstanceUrl",
  "Resource": "*",
  "Condition": {
    "ForAllValues:NotIpAddress": {
      "aws:SourceIp": [
        "192.168.0.0/16"
      ]
    }
  }
}

Disabling internet access

If DirectInternetAccess is enabled in the request, you can deny the API call with the following code:

{
  "Sid": "DenyDirectInternet",
  "Effect": "Deny",
  "Action": "sagemaker:CreateNotebookInstance",
  "Resource": "*",
  "Condition": {
    "StringEquals": {
      "sagemaker:DirectInternetAccess": [
        "Enabled"
      ]
    }
  }
}

Preventing privilege escalation

You can control privilege escalation from Amazon SageMaker by using SCPs in the same way you did in IAM policies.

Disabling root access in notebooks

Similarly, you can deny an API call if RootAccess is enabled. See the following code:

{
  "Sid": "DenyRootAccess",
  "Effect": "Deny",
  "Action": [
    "sagemaker:CreateNotebookInstance",
    "sagemaker:UpdateNotebookInstance"
  ],
  "Resource": "*",
  "Condition": {
    "StringEquals": {
      "sagemaker:RootAccess": [
        "Enabled"
      ]
    }
  }
}
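The snippets in this post are individual statements. To attach them as an SCP, they need to sit inside a policy document with a Version and a Statement array. The following sketch wraps the two notebook-related Deny statements that way; which statements to combine in one policy is a choice for your organization, not something this post prescribes:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyDirectInternet",
      "Effect": "Deny",
      "Action": "sagemaker:CreateNotebookInstance",
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "sagemaker:DirectInternetAccess": "Enabled"
        }
      }
    },
    {
      "Sid": "DenyRootAccess",
      "Effect": "Deny",
      "Action": [
        "sagemaker:CreateNotebookInstance",
        "sagemaker:UpdateNotebookInstance"
      ],
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "sagemaker:RootAccess": "Enabled"
        }
      }
    }
  ]
}
```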

Conclusion

This post discussed multiple tools to prevent data exfiltration while using Amazon SageMaker. IAM policies that use Amazon SageMaker condition keys can require encryption, deployment in a VPC, network isolation, and restricted notebook access. You can also use SCPs at the account or OU level to enforce similar controls on every principal. Whichever method you choose, Amazon SageMaker makes it easy to prepare, train, and deploy ML models securely.

About the author

Aaron Fagan is a Principal Cloud Security Engineer with Millennium Management, where he enjoys building automated security guardrails while maintaining business agility.
