AWS Machine Learning Blog

Enforce VPC rules for Amazon Comprehend jobs and CMK encryption for custom models

You can now control the Amazon Virtual Private Cloud (Amazon VPC) and encryption settings for your Amazon Comprehend APIs using AWS Identity and Access Management (IAM) condition keys, and encrypt your Amazon Comprehend custom models using customer managed keys (CMK) via AWS Key Management Service (AWS KMS). IAM condition keys enable you to further refine the conditions under which an IAM policy statement applies. You can use the new condition keys in IAM policies when granting permissions to create asynchronous jobs and to create custom classification or custom entity training jobs.

Amazon Comprehend now supports five new condition keys:

  • comprehend:VolumeKmsKey
  • comprehend:OutputKmsKey
  • comprehend:ModelKmsKey
  • comprehend:VpcSecurityGroupIds
  • comprehend:VpcSubnets

The keys allow you to ensure that users can only create jobs that meet your organization’s security posture, such as jobs that are connected to the allowed VPC subnets and security groups. You can also use these keys to enforce encryption settings for the storage volumes where the data is pulled down for computation and on the Amazon Simple Storage Service (Amazon S3) bucket where the output of the operation is stored. If users try to use an API with VPC settings or encryption parameters that aren’t allowed, Amazon Comprehend rejects the operation synchronously with a 403 Access Denied exception.

Solution overview

The following diagram illustrates the architecture of our solution.

We want to enforce a policy to do the following:

  • Make sure that all custom classification training jobs are specified with VPC settings
  • Have encryption enabled for the classifier training job, the classifier output, and the Amazon Comprehend model

This way, when someone starts a custom classification training job, the training data that is pulled in from Amazon S3 is copied to the storage volumes in your specified VPC subnets and is encrypted with the specified VolumeKmsKey. The solution also makes sure that the results of the model training are encrypted with the specified OutputKmsKey. Finally, the Amazon Comprehend model itself is encrypted with the AWS KMS key specified by the user when it’s stored within the VPC. The solution uses three different keys for the data, output, and the model, respectively, but you can choose to use the same key for all three tasks.

Additionally, this new functionality enables you to audit model usage in AWS CloudTrail by tracking the model encryption key usage.

Encryption with IAM policies

The following policy makes sure that users must specify VPC subnets and security groups for VPC settings, and AWS KMS keys for the storage volume, the output, and the model:

{
   "Version": "2012-10-17",
   "Statement": [{
    "Action": ["comprehend:CreateDocumentClassifier"],
    "Effect": "Allow",
    "Resource": "*",
    "Condition": {
      "Null": {
        "comprehend:VolumeKmsKey": "false",
        "comprehend:OutputKmsKey": "false",
        "comprehend:ModelKmsKey": "false",
        "comprehend:VpcSecurityGroupIds": "false",
        "comprehend:VpcSubnets": "false"
      }
    }
  }]
}

For example, in the following code, User 1 provides both the VPC settings and the encryption keys, and can successfully complete the operation:

aws comprehend create-document-classifier \
--region region \
--document-classifier-name testModel \
--language-code en \
--input-data-config S3Uri=s3://S3Bucket/docclass/filename \
--data-access-role-arn arn:aws:iam::[your account number]:role/testDataAccessRole \
--volume-kms-key-id arn:aws:kms:region:[your account number]:alias/ExampleAlias \
--model-kms-key-id arn:aws:kms:region:[your account number]:key/[your key Id] \
--output-data-config S3Uri=s3://S3Bucket/output/filename,KmsKeyId=arn:aws:kms:region:[your account number]:alias/ExampleAlias \
--vpc-config SecurityGroupIds=sg-11a111111a1example,Subnets=subnet-11aaa111111example

User 2, on the other hand, doesn’t provide any of these required settings and isn’t allowed to complete the operation:

aws comprehend create-document-classifier \
--region region \
--document-classifier-name testModel \
--language-code en \
--input-data-config S3Uri=s3://S3Bucket/docclass/filename \
--data-access-role-arn arn:aws:iam::[your account number]:role/testDataAccessRole \
--output-data-config S3Uri=s3://S3Bucket/output/filename

In the preceding code examples, as long as the VPC settings and the encryption keys are set, you can run the custom classifier training job. Leaving the VPC and encryption settings in their default state results in a 403 Access Denied exception.
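The effect of the Null condition on these two requests can be sketched in a few lines of Python. This is a simplified model for illustration only, not the IAM policy evaluation engine. The helper names condition_context and is_allowed are hypothetical, the sg-example and subnet-example values are placeholders, and the mapping from request parameters to condition keys is an assumption based on the parameter names used in the CLI examples.

```python
# Simplified model of the "Null": {"<key>": "false"} condition: the Allow
# statement matches only when every listed condition key is present.
REQUIRED_KEYS = [
    "comprehend:VolumeKmsKey",
    "comprehend:OutputKmsKey",
    "comprehend:ModelKmsKey",
    "comprehend:VpcSecurityGroupIds",
    "comprehend:VpcSubnets",
]


def condition_context(params):
    """Map request parameters to condition keys (assumed mapping)."""
    vpc = params.get("VpcConfig", {})
    output = params.get("OutputDataConfig", {})
    candidates = {
        "comprehend:VolumeKmsKey": params.get("VolumeKmsKeyId"),
        "comprehend:OutputKmsKey": output.get("KmsKeyId"),
        "comprehend:ModelKmsKey": params.get("ModelKmsKeyId"),
        "comprehend:VpcSecurityGroupIds": vpc.get("SecurityGroupIds"),
        "comprehend:VpcSubnets": vpc.get("Subnets"),
    }
    # A missing or empty value is treated as "key absent from the request".
    return {key: value for key, value in candidates.items() if value}


def is_allowed(params):
    """Return True when every required condition key is present."""
    context = condition_context(params)
    return all(key in context for key in REQUIRED_KEYS)


# A request that supplies all five settings.
user1 = {
    "VolumeKmsKeyId": "alias/ExampleAlias",
    "ModelKmsKeyId": "alias/ExampleAlias",
    "OutputDataConfig": {"S3Uri": "s3://S3Bucket/output/", "KmsKeyId": "alias/ExampleAlias"},
    "VpcConfig": {"SecurityGroupIds": ["sg-example"], "Subnets": ["subnet-example"]},
}
# A request that leaves the VPC and encryption settings at their defaults.
user2 = {"OutputDataConfig": {"S3Uri": "s3://S3Bucket/output/"}}

print(is_allowed(user1))  # True
print(is_allowed(user2))  # False
```

Because the sketch contains no other Allow statement, a request that fails the condition is implicitly denied, which corresponds to the 403 Access Denied exception described above.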

In the next example, we enforce an even stricter policy, in which we have to set the VPC and encryption settings to also include specific subnets, security groups, and KMS keys. This policy applies these rules for all Amazon Comprehend APIs that start new asynchronous jobs, create custom classifiers, and create custom entity recognizers. See the following code:

{
  "Version": "2012-10-17",
  "Statement": [{
    "Action": [
      "comprehend:CreateDocumentClassifier",
      "comprehend:CreateEntityRecognizer",
      "comprehend:Start*Job"
    ],
    "Effect": "Allow",
    "Resource": "*",
    "Condition": {
      "ArnEquals": {
        "comprehend:VolumeKmsKey": "arn:aws:kms:region:[your account number]:key/key_id",
        "comprehend:ModelKmsKey": "arn:aws:kms:region:[your account number]:key/key_id1",
        "comprehend:OutputKmsKey": "arn:aws:kms:region:[your account number]:key/key_id2"
      },
      "ForAllValues:StringLike": {
        "comprehend:VpcSecurityGroupIds": [
          "sg-11a111111a1example"
        ],
        "comprehend:VpcSubnets": [
          "subnet-11aaa111111example"
        ]
      }
    }
  }]
}
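To make the ForAllValues:StringLike part of this policy more concrete, the following Python sketch approximates how such a set operator treats the subnet list in a request. It is an illustration under assumptions, not IAM's implementation: the function name is hypothetical, the second subnet ID is an invented placeholder, and fnmatchcase is used as a stand-in for StringLike matching (it also understands bracket expressions, which IAM does not).

```python
from fnmatch import fnmatchcase


def for_all_values_string_like(request_values, allowed_patterns):
    """True when every request value matches at least one allowed pattern.

    all() over an empty sequence is True, so an empty value set passes.
    """
    return all(
        any(fnmatchcase(value, pattern) for pattern in allowed_patterns)
        for value in request_values
    )


allowed_subnets = ["subnet-11aaa111111example"]

# Only the allowed subnet is requested.
print(for_all_values_string_like(["subnet-11aaa111111example"], allowed_subnets))  # True

# One requested subnet is outside the allowed list.
print(
    for_all_values_string_like(
        ["subnet-11aaa111111example", "subnet-22bbb222222example"], allowed_subnets
    )
)  # False

# No subnets in the request at all.
print(for_all_values_string_like([], allowed_subnets))  # True
```

The last case reflects our understanding of the IAM documentation: a ForAllValues condition evaluates to true when the key is absent from the request. If the VPC settings must always be present, it may be worth combining this condition with the Null check from the first policy.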

In the next example, we first create a custom classifier on the Amazon Comprehend console without specifying the encryption option. Because we have the IAM conditions specified in the policy, the operation is denied.

When you enable classifier encryption, Amazon Comprehend encrypts the data in the storage volume while your job is being processed. You can either use an AWS KMS customer managed key from your account or a different account. You can specify the encryption settings for the custom classifier job as in the following screenshot.

Output encryption enables Amazon Comprehend to encrypt the output results from your analysis. Similar to Amazon Comprehend job encryption, you can either use an AWS KMS customer managed key from your account or another account.

Because our policy also requires jobs to be launched with VPC subnets and security groups specified, you can provide these settings in the VPC settings section.

Amazon Comprehend API operations and IAM condition keys

The following table lists the Amazon Comprehend API operations and the IAM condition keys that are supported as of this writing. For more information, see Actions, resources, and condition keys for Amazon Comprehend.

Model encryption with a CMK

Along with encrypting your training data, you can now encrypt your custom models in Amazon Comprehend using a CMK. In this section, we go into more detail about this feature.

Prerequisites

You need to add an IAM policy to allow a principal to use or manage CMKs. CMKs are specified in the Resource element of the policy statement. When writing your policy statements, it’s a best practice to limit CMKs to those that the principals need to use, rather than give the principals access to all CMKs.

In the following example, we use an AWS KMS key (1234abcd-12ab-34cd-56ef-1234567890ab) to encrypt an Amazon Comprehend custom model.

When you use AWS KMS encryption, kms:CreateGrant and kms:RetireGrant permissions are required for model encryption.

For example, the following IAM policy statement in the dataAccessRole that you provide to Amazon Comprehend allows the principal to call the listed AWS KMS operations only on the CMKs listed in the Resource element of the policy statement:

{
  "Version": "2012-10-17",
  "Statement": {
    "Effect": "Allow",
    "Action": [
      "kms:CreateGrant",
      "kms:RetireGrant",
      "kms:GenerateDataKey",
      "kms:Decrypt"
    ],
    "Resource": [
      "arn:aws:kms:us-west-2:[your account number]:key/1234abcd-12ab-34cd-56ef-1234567890ab"
    ]
  }
}

Specifying CMKs by key ARN, which is a best practice, makes sure that the permissions are limited only to the specified CMKs.
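If you generate this policy from a script, one way to keep it scoped to specific keys is to validate each entry before it reaches the Resource element. The following Python sketch is a minimal illustration under assumptions: the function name, the regular expression, and the account number 111122223333 are hypothetical, and the pattern is intentionally narrow (it accepts only full key ARNs and rejects aliases and wildcards).

```python
import json
import re

# Accepts only full key ARNs; alias ARNs and "*" are rejected on purpose.
KEY_ARN = re.compile(r"^arn:aws[a-z-]*:kms:[a-z0-9-]+:\d{12}:key/[A-Za-z0-9-]+$")

# The AWS KMS actions used in the policy statement above.
MODEL_ENCRYPTION_ACTIONS = [
    "kms:CreateGrant",
    "kms:RetireGrant",
    "kms:GenerateDataKey",
    "kms:Decrypt",
]


def build_kms_policy(key_arns):
    """Build the data access role policy for an explicit list of key ARNs."""
    rejected = [arn for arn in key_arns if not KEY_ARN.match(arn)]
    if not key_arns or rejected:
        raise ValueError(f"expected one or more full key ARNs, rejected: {rejected}")
    return {
        "Version": "2012-10-17",
        "Statement": {
            "Effect": "Allow",
            "Action": list(MODEL_ENCRYPTION_ACTIONS),
            "Resource": list(key_arns),
        },
    }


example_arn = (
    "arn:aws:kms:us-west-2:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab"
)
print(json.dumps(build_kms_policy([example_arn]), indent=2))
```

Passing "*" or an alias ARN raises ValueError, so an overly broad Resource element fails early instead of ending up in the role.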

Enable model encryption

As of this writing, custom model encryption is available only via the AWS Command Line Interface (AWS CLI). The following example creates a custom classifier with model encryption:

aws comprehend create-document-classifier \
--document-classifier-name my-document-classifier \
--data-access-role-arn arn:aws:iam::[your account number]:role/mydataaccessrole \
--language-code en --region us-west-2 \
--model-kms-key-id arn:aws:kms:us-west-2:[your account number]:key/[your key Id] \
--input-data-config S3Uri=s3://path-to-data/multiclass_train.csv
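Multi-line CLI commands are easy to break with a missing or misplaced backslash, so you might prefer to assemble the same command in a script. The following Python sketch builds the argument list for the command above and prints it. It only constructs the command and doesn't run it. The function name and the account number 111122223333 are hypothetical placeholders, and the flags are the ones already used in this post.

```python
import shlex


def classifier_command(name, role_arn, model_key_arn, training_data_uri, region):
    """Return the argument list for `aws comprehend create-document-classifier`."""
    return [
        "aws", "comprehend", "create-document-classifier",
        "--document-classifier-name", name,
        "--data-access-role-arn", role_arn,
        "--language-code", "en",
        "--region", region,
        "--model-kms-key-id", model_key_arn,
        "--input-data-config", f"S3Uri={training_data_uri}",
    ]


command = classifier_command(
    name="my-document-classifier",
    role_arn="arn:aws:iam::111122223333:role/mydataaccessrole",
    model_key_arn="arn:aws:kms:us-west-2:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab",
    training_data_uri="s3://path-to-data/multiclass_train.csv",
    region="us-west-2",
)

# Print a copy-pasteable, shell-quoted command line.
# subprocess.run(command, check=True) would run it with the AWS CLI installed.
print(shlex.join(command))
```

Keeping the arguments in a list avoids shell line continuations altogether, and the model key ARN becomes a single required argument that can't be silently omitted.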

The next example trains a custom entity recognizer with model encryption:

aws comprehend create-entity-recognizer \
--recognizer-name my-entity-recognizer \
--data-access-role-arn arn:aws:iam::[your account number]:role/mydataaccessrole \
--language-code "en" --region us-west-2 \
--model-kms-key-id arn:aws:kms:us-west-2:[your account number]:key/[your key Id] \
--input-data-config '{
      "EntityTypes": [{"Type": "PERSON"}, {"Type": "LOCATION"}],
      "Documents": {
            "S3Uri": "s3://path-to-data/documents"
      },
      "Annotations": {
          "S3Uri": "s3://path-to-data/annotations"
      }
}'

Finally, you can also create an endpoint for your custom model with encryption enabled:

aws comprehend create-endpoint \
 --endpoint-name myendpoint \
 --model-arn arn:aws:comprehend:us-west-2:[your account number]:document-classifier/my-document-classifier \
 --data-access-role-arn arn:aws:iam::[your account number]:role/mydataaccessrole \
 --desired-inference-units 1 --region us-west-2

Conclusion

You can now enforce security settings like enabling encryption and VPC settings for your Amazon Comprehend jobs using IAM condition keys. The IAM condition keys are available in all AWS Regions where Amazon Comprehend is available. You can also encrypt the Amazon Comprehend custom models using customer managed keys.

To learn more about the new condition keys and view policy examples, see Using IAM condition keys for VPC settings and Resource and Conditions for Amazon Comprehend APIs. To learn more about using IAM condition keys, see IAM JSON policy elements: Condition.


About the Authors

Sam Palani is an AI/ML Specialist Solutions Architect at AWS. He enjoys working with customers to help them architect machine learning solutions at scale. When not helping customers, he enjoys reading and exploring the outdoors.

Shanthan Kesharaju is a Senior Architect in the AWS ProServe team. He helps our customers with AI/ML strategy, architecture, and developing products with a purpose. Shanthan has an MBA in Marketing from Duke University and an MS in Management Information Systems from Oklahoma State University.