AWS for Industries

FSI Service Spotlight: Featuring Amazon Comprehend

Editor’s note: This is the second in a monthly series for Financial Services Industry Service Spotlight. Read Part 1 discussing Amazon SageMaker Notebook Instances.

In this edition of the Financial Services Industry (FSI) Service Spotlight monthly blog series, we highlight five key considerations for a particular service that FSI customers should focus on to help streamline cloud service approval. Each of the five areas includes specific guidance, suggested reference architectures, and technical code that can help streamline approval of the featured service. You may need to adapt these to your specific use case and environment.

For this edition of the Service Spotlight, we are covering Amazon Comprehend (excluding Comprehend Medical) because of the significant growth among FSI customers. Amazon Comprehend uses natural language processing (NLP) to extract insights about the content of documents. Amazon Comprehend processes any text file in UTF-8 format and develops insights by recognizing entities, key phrases, language, sentiments, and other common elements in documents.

A common use case for Amazon Comprehend for financial institutions is to analyze call transcriptions from their call centers to gather insights into their customer calls. This allows a financial institution to uncover common trends, personalize messages and offers, and ensure call center staff have all the available information on a given client to provide the best experience.

Another FSI use case is extracting data from proxy voting forms, as in the Broadridge example. By leveraging Amazon Comprehend with other AWS AI services, Broadridge achieved a 40% reduction in manual effort of reviewing and analyzing proxy voting forms. Broadridge extracts data points from SEC filings, including the board of directors, their length of tenure, and ESG proposals. The company then uses these data points to build a custom machine learning model on Amazon SageMaker to predict potentially contentious shareholder meetings, which will be a new product (and a new revenue stream) that it can offer to its asset management and broker-dealer customers.

Lastly, FINRA is leveraging Amazon Comprehend to process and review millions of documents with unstructured data. In doing so, FINRA removed the manual review process and now uses Amazon Comprehend to automatically evaluate and flag documents that human investigators should examine further.

Achieving Compliance with Amazon Comprehend

Security is a shared responsibility between AWS and you. AWS is responsible for protecting the infrastructure that runs AWS services in the AWS Cloud and also provides you with services that you can use securely. Your responsibility is determined by the AWS service that you use. On the customer’s side of the shared responsibility model, customers should first determine their requirements for network connectivity, encryption, and access to other AWS resources. We will dive deeper into those topics in the upcoming sections.

Amazon Comprehend falls under the scope of the following compliance programs with regard to AWS’s side of the shared responsibility model. In the following sections, we cover topics on the customer side of the shared responsibility model.

  • SOC 1, 2, 3
  • PCI
  • IRAP Protected
  • ISO/IEC 27001:2013, 27017:2015, 27018:2019, and ISO/IEC 9001:2015
  • OSPAR
  • C5
  • MTCS

Data Protection with Amazon Comprehend

Encryption is a commonly used mechanism to protect data in transit and at rest. In transit, customers access Amazon Comprehend over the network through AWS published API calls, using clients that must support Transport Layer Security (TLS) 1.0 or later. However, we recommend TLS 1.2 or later.
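As a minimal sketch of the TLS 1.2 recommendation on the client side, the following Python snippet builds a TLS context that refuses older protocol versions. It uses only the Python standard library. The function name make_tls12_context is hypothetical, and AWS SDKs manage their own TLS settings, so treat this as an illustration for custom HTTPS clients rather than required configuration.

```python
import ssl


def make_tls12_context() -> ssl.SSLContext:
    """Return a client-side TLS context that refuses anything older than TLS 1.2."""
    # create_default_context() enables certificate and hostname verification.
    context = ssl.create_default_context()
    # Reject TLS 1.0 and 1.1 handshakes, in line with the recommendation above.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


context = make_tls12_context()
print(context.minimum_version.name)
```

A context like this can be passed to standard-library HTTPS clients, for example through the context argument of urllib.request.urlopen.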

Amazon Comprehend works with AWS Key Management Service (AWS KMS) to encrypt customer data at rest, both while it is stored on the attached volumes in the Amazon Comprehend service account and when output data is written to customer-managed Amazon S3 buckets. Because input data is stored in Amazon S3, you can also use the encryption-at-rest options that S3 offers.

As a preventive mechanism to ensure encryption when certain Amazon Comprehend actions are performed, customers can use service control policies (SCPs) together with Amazon Comprehend’s IAM condition keys.

SCPs are a type of organization policy that you can use to manage permissions in your organization. SCPs offer central control over the maximum available permissions for all accounts in your organization. SCPs help you to ensure your accounts stay within your organization’s access control guidelines.

The following example SCP denies the request unless the principal invoking the specified Amazon Comprehend actions provides the specified key for volume encryption.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Deny-Comprehend-Actions-without-volume-encryption-key",
            "Effect": "Deny",
            "Action": [
                "comprehend:StartEntitiesDetectionJob",
                "comprehend:CreateDocumentClassifier",
                "comprehend:StartDocumentClassificationJob",
                "comprehend:StartSentimentDetectionJob",
                "comprehend:StartKeyPhrasesDetectionJob",
                "comprehend:StartTopicsDetectionJob",
                "comprehend:StartDominantLanguageDetectionJob",
                "comprehend:CreateEntityRecognizer"
            ],
            "Resource": "*",
            "Condition": {
                "StringNotLike": {
                    "comprehend:VolumeKmsKey": "arn:aws:kms:us-east-2:000000000000:key/a1b2c3d4-a1b2-1234-a1b2-a1b2c3d4e5f6"
                }
            }
        }
    ]
}
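To show what a request that passes an SCP like the one above might look like, here is a hedged Python sketch that assembles the parameters for StartKeyPhrasesDetectionJob with both the output key and the volume key set. The helper name, bucket names, role name, and account ID are placeholders introduced for illustration. The parameter names follow the Amazon Comprehend API (InputDataConfig, OutputDataConfig, DataAccessRoleArn, LanguageCode, VolumeKmsKeyId).

```python
import json

# Placeholder: substitute the exact key ARN that your SCP names.
APPROVED_VOLUME_KEY_ARN = (
    "arn:aws:kms:us-east-2:111122223333:key/a1b2c3d4-a1b2-1234-a1b2-a1b2c3d4e5f6"
)


def build_key_phrases_job_request(input_s3_uri, output_s3_uri, data_access_role_arn,
                                  output_kms_key_id, volume_kms_key_id,
                                  language_code="en"):
    """Assemble StartKeyPhrasesDetectionJob parameters with both encryption keys set."""
    return {
        "InputDataConfig": {"S3Uri": input_s3_uri, "InputFormat": "ONE_DOC_PER_LINE"},
        # KmsKeyId encrypts the job output written to the customer's S3 bucket.
        "OutputDataConfig": {"S3Uri": output_s3_uri, "KmsKeyId": output_kms_key_id},
        "DataAccessRoleArn": data_access_role_arn,
        "LanguageCode": language_code,
        # VolumeKmsKeyId is the value the comprehend:VolumeKmsKey condition key checks.
        "VolumeKmsKeyId": volume_kms_key_id,
    }


request = build_key_phrases_job_request(
    input_s3_uri="s3://example-input-bucket/transcripts/",      # placeholder bucket
    output_s3_uri="s3://example-output-bucket/key-phrases/",    # placeholder bucket
    data_access_role_arn="arn:aws:iam::111122223333:role/ExampleDataAccessRole",
    output_kms_key_id=APPROVED_VOLUME_KEY_ARN,
    volume_kms_key_id=APPROVED_VOLUME_KEY_ARN,
)
print(json.dumps(request, indent=2))
```

With the AWS SDK for Python installed and credentials configured, a dictionary like this could be passed as keyword arguments to the Comprehend client’s start_key_phrases_detection_job call. That step is omitted here so the sketch runs without AWS access.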

Isolation of Compute Environments with Amazon Comprehend

Amazon Comprehend is a managed service that doesn’t have any compute resources in the customer’s side of the shared responsibility model. As a managed service, Amazon Comprehend is protected by the AWS global network security procedures that are described in the Introduction to AWS Security whitepaper.

Customers can also establish a private connection between their VPC and Amazon Comprehend by creating an interface VPC endpoint. Interface endpoints are powered by AWS PrivateLink, a technology that enables you to privately access Amazon Comprehend APIs without an internet gateway, NAT device, VPN connection, or AWS Direct Connect connection. Using an interface VPC endpoint, instances in your VPC don’t need public IP addresses to communicate with Amazon Comprehend APIs. The use of interface VPC endpoints also ensures that traffic between your VPC and Amazon Comprehend does not leave the Amazon network.

When using an interface endpoint, customers attach an endpoint policy to it. This policy controls access to the service to which you are connecting. Customers can also use the comprehend:VpcSubnets and comprehend:VpcSecurityGroupIds IAM condition keys in their IAM policies to require a specific VPC configuration when principals create a job, by specifying the subnets and security groups. When you specify the subnets and security groups, Amazon Comprehend creates elastic network interfaces (ENIs) that are associated with your security groups in one of the subnets. These ENIs allow Amazon Comprehend’s job containers to connect to resources in your VPC.

The following is an example of an endpoint policy for Amazon Comprehend. When attached to a VPC endpoint, this policy grants access to the specified Amazon Comprehend actions for all principals on all resources.

{
   "Statement":[
      {
         "Principal":"*",
         "Effect":"Allow",
         "Action":[
            "comprehend:DetectEntities",
            "comprehend:StartEntitiesDetectionJob",
            "comprehend:CreateDocumentClassifier",
            "comprehend:StartDocumentClassificationJob",
            "comprehend:StartSentimentDetectionJob",
            "comprehend:StartKeyPhrasesDetectionJob",
            "comprehend:StartTopicsDetectionJob",
            "comprehend:StartDominantLanguageDetectionJob",
            "comprehend:CreateEntityRecognizer"
         ],
         "Resource":"*"
      }
   ]
}
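The comprehend:VpcSubnets and comprehend:VpcSecurityGroupIds condition keys described above could be enforced with a policy along the following lines. This Python sketch generates one Deny statement per key using the standard IAM Null condition operator, so a job request that omits either value is rejected. The function name and Sid values are hypothetical, and the action list is borrowed from the example policies in this post, so verify it against the actions you actually permit.

```python
import json

# Job-creating actions taken from the example policies in this post.
JOB_ACTIONS = [
    "comprehend:StartEntitiesDetectionJob",
    "comprehend:CreateDocumentClassifier",
    "comprehend:StartDocumentClassificationJob",
    "comprehend:StartSentimentDetectionJob",
    "comprehend:StartKeyPhrasesDetectionJob",
    "comprehend:StartTopicsDetectionJob",
    "comprehend:StartDominantLanguageDetectionJob",
    "comprehend:CreateEntityRecognizer",
]

REQUIRED_KEYS = (
    ("DenyJobsWithoutSubnets", "comprehend:VpcSubnets"),
    ("DenyJobsWithoutSecurityGroups", "comprehend:VpcSecurityGroupIds"),
)


def build_require_vpc_policy(actions):
    """Build a policy that denies job creation when VPC settings are missing."""
    statements = []
    for sid, condition_key in REQUIRED_KEYS:
        statements.append({
            "Sid": sid,
            "Effect": "Deny",
            "Action": list(actions),
            "Resource": "*",
            # Null = "true" matches when the key is absent from the request.
            "Condition": {"Null": {condition_key: "true"}},
        })
    return {"Version": "2012-10-17", "Statement": statements}


policy = build_require_vpc_policy(JOB_ACTIONS)
print(json.dumps(policy, indent=2))
```

Using one statement per key is a deliberate choice in this sketch: multiple keys inside a single condition block are combined with a logical AND, which would deny only requests that omit both values.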

Similarly, by using resource-based policies, customers can restrict access to their resources so that they are reachable only through VPC endpoints. For example, with an S3 bucket policy, customers can restrict access to a given Amazon S3 bucket to requests that arrive through the endpoint. This ensures that traffic remains private and only flows through the endpoint. The following example policy restricts access to my_secure_bucket and its objects to requests from VPC vpc-111bbb22: it denies all access to the bucket unless the request comes from the specified VPC. The aws:sourceVpc condition does not require an Amazon Resource Name (ARN) for the VPC resource, only the VPC ID.

{
  "Version": "2012-10-17",
  "Id": "Policy1415115909152",
  "Statement": [
    {
      "Sid": "Access-to-specific-VPC-only",
      "Principal": "*",
      "Action": "s3:*",
      "Effect": "Deny",
      "Resource": ["arn:aws:s3:::my_secure_bucket",
                   "arn:aws:s3:::my_secure_bucket/*"],
      "Condition": {
        "StringNotEquals": {
          "aws:sourceVpc": "vpc-111bbb22"
        }
      }
    }
  ]
}
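To make the effect of the StringNotEquals condition concrete, the following Python sketch models how the Deny statement above treats three kinds of requests. It is a simplified local model written for illustration, not the IAM policy evaluation engine, and the is_denied helper is hypothetical. The case worth noting is the last one: a negated operator such as StringNotEquals also matches when the key is absent, so requests that do not arrive through a VPC endpoint are denied as well.

```python
from typing import Optional

# The Deny statement from the bucket policy above, reduced to the fields used here.
STATEMENT = {
    "Effect": "Deny",
    "Condition": {"StringNotEquals": {"aws:sourceVpc": "vpc-111bbb22"}},
}


def is_denied(statement: dict, source_vpc: Optional[str]) -> bool:
    """Return True when this simplified model says the Deny statement applies.

    source_vpc is the VPC ID the request came through, or None when the
    request did not pass through a VPC endpoint.
    """
    expected_vpc = statement["Condition"]["StringNotEquals"]["aws:sourceVpc"]
    # A missing key (None) never equals the expected VPC, so it is denied too.
    return statement["Effect"] == "Deny" and source_vpc != expected_vpc


for vpc in ("vpc-111bbb22", "vpc-0example", None):
    outcome = "denied" if is_denied(STATEMENT, vpc) else "not denied by this statement"
    print(vpc, "->", outcome)
```

A request that is not denied here still needs an explicit Allow from an identity-based or resource-based policy before it succeeds.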

Automating Audits with APIs with Amazon Comprehend

AWS Config monitors the configuration of resources and provides some out-of-the-box rules to alert when resources fall into a non-compliant state. For the input and destination S3 buckets, you can use the S3 managed AWS Config rules to monitor compliance with encryption standards.

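API calls made to Amazon Comprehend are recorded in AWS CloudTrail. AWS CloudTrail provides an aggregated repository of AWS API calls and changes to resources for over 160 AWS services. There are two ways to run jobs with Amazon Comprehend: synchronously and asynchronously. To ensure that only approved asynchronous jobs are executed, monitor the APIs that start or create them, which include the actions covered by the example SCP above:

  • CreateDocumentClassifier
  • CreateEntityRecognizer
  • StartDocumentClassificationJob
  • StartDominantLanguageDetectionJob
  • StartEntitiesDetectionJob
  • StartKeyPhrasesDetectionJob
  • StartSentimentDetectionJob
  • StartTopicsDetectionJob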

For all APIs that can start or create an Amazon Comprehend job, there are InputDataConfig and OutputDataConfig parameters. Ensure that approved S3 buckets are configured for both and that the correct KmsKeyId is configured for OutputDataConfig. For jobs that use a storage volume for job processing, ensure that the correct encryption is configured through the VolumeKmsKeyId parameter of the API call. Every Amazon Comprehend job requires a DataAccessRoleArn, so monitor the CloudTrail record of the API call to confirm that the role grants least-privilege access to the data.

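Amazon Comprehend also supports synchronous processing through APIs such as DetectEntities, DetectKeyPhrases, DetectSentiment, and DetectDominantLanguage, along with their batch counterparts (for example, BatchDetectEntities).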

For these APIs, Amazon Comprehend processes the data in real time and returns a result, so they do not offer the same encryption options as asynchronous jobs.

For a complete list of Amazon Comprehend APIs, not only those related to job execution, review the Comprehend API Reference.

Here is an example of what a CloudTrail log looks like for the StartKeyPhrasesDetectionJob API:

{
    "eventVersion": "1.05",
    "userIdentity": {
        "type": "AssumedRole",
        "principalId": "XXXXXX",
        "arn": "arn:aws:sts::12345678910:assumed-role/UserRole/John",
        "accountId": "12345678910",
        "accessKeyId": "ASIA3VZEXAMPLE",
        "sessionContext": {
            "sessionIssuer": {
                "type": "Role",
                "principalId": "AROAICFHPEXAMPLE",
                "arn": "arn:aws:iam::12345678910:role/UserRole",
                "accountId": "12345678910",
                "userName": "UserRole"
            },
            "webIdFederationData": {},
            "attributes": {
                "mfaAuthenticated": "false",
                "creationDate": "2020-04-29T15:46:04Z"
            }
        }
    },
    "eventTime": "2020-04-29T16:42:39Z",
    "eventSource": "comprehend.amazonaws.com",
    "eventName": "StartKeyPhrasesDetectionJob",
    "awsRegion": "us-east-2",
    "sourceIPAddress": "3.20.236.234",
    "userAgent": "aws-internal/3 aws-sdk-java/1.11.761 Linux/4.14.165-102.205.amzn2.x86_64 OpenJDK_64-Bit_Server_VM/25.201-b09 java/1.8.0_201 vendor/Oracle_Corporation, canary-generated exec-env/AWS_Lambda_java8",
    "requestParameters": {
        "inputDataConfig": {
            "s3Uri": "s3://dataset-prod-us-east-2/ONE_DOC_PER_LINE/KP/FOUR_KB",
            "inputFormat": "ONE_DOC_PER_LINE"
        },
        "outputDataConfig": {
            "s3Uri": "s3://datasets3bucke-y6icltvagurj/JSON/abcd1234-6d84-4b09-bea3-abcdef123456",
            "kmsKeyId": "arn:aws:kms:us-east-2:12345678910:key/abcd1234-6ab3-4bab-90ab-abcdef123456"
        },
        "dataAccessRoleArn": "arn:aws:iam::12345678910:role/DataAccessRoleXYZ",
        "languageCode": "en",
        "clientRequestToken": "d9e4f777-8b4e-4249-8ebb-b7495acb800b",
        "volumeKmsKeyId": "arn:aws:kms:us-east-2:12345678910:key/abcd1234-c459-4d20-bb39-abcdef123456"
    },
    "responseElements": {
        "jobId": "2102905798f0d44a04da91fbd12ed60e",
        "jobStatus": "SUBMITTED"
    },
    "requestID": "f202914c-f5cf-4f2d-a4c7-5f30d101fe19",
    "eventID": "692e3a17-40a2-491a-a4ce-a275c4c4ce76",
    "readOnly": false,
    "resources": [
        {
            "accountId": "12345678910",
            "type": "AWS::KMS::Key",
            "ARN": "arn:aws:kms:us-east-2:12345678910:key/abcd1234-6ab3-4bab-90ab-abcdef123456"
        },
        {
            "accountId": "12345678910",
            "type": "AWS::KMS::Key",
            "ARN": "arn:aws:kms:us-east-2:12345678910:key/abcd1234-6ab3-4bab-90ab-abcdef123456"
        },
        {
            "accountId": "12345678910",
            "type": "AWS::S3::Object",
            "ARN": "s3://mydata-dev-us-east-2/ONE_DOC_PER_LINE/KP/FOUR_KB"
        },
        {
            "accountId": "12345678910",
            "type": "AWS::S3::Object",
            "ARN": "s3://mybucket/JSON/00f706b3-6d84-4b09-bea3-88084f9cb6b0"
        },
        {
            "accountId": "12345678910",
            "type": "AWS::IAM::Role",
            "ARN": "arn:aws:iam::12345678910:role/DataAccessRoleXYZ"
        }
    ],
    "eventType": "AwsApiCall",
    "recipientAccountId": "12345678910"
}
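A record like the one above can be checked automatically against an approved list of buckets, keys, and roles. The following Python sketch is one way to do that, under stated assumptions: the function name, the approved sets, and the sample values are hypothetical, and the field names (inputDataConfig, outputDataConfig, kmsKeyId, volumeKmsKeyId, dataAccessRoleArn) are taken from the requestParameters shown in the log above. It flags a missing volumeKmsKeyId as a finding, which you may want to relax for jobs that do not use a storage volume.

```python
from urllib.parse import urlparse


def audit_comprehend_job_event(event, approved_buckets, approved_kms_keys, approved_roles):
    """Return a list of findings for one CloudTrail record of a Comprehend job call."""
    params = event.get("requestParameters") or {}
    findings = []

    # Both the input and the output location must use an approved bucket.
    for config_name in ("inputDataConfig", "outputDataConfig"):
        s3_uri = (params.get(config_name) or {}).get("s3Uri", "")
        bucket = urlparse(s3_uri).netloc  # "s3://bucket/prefix" -> "bucket"
        if bucket not in approved_buckets:
            findings.append(f"{config_name}: bucket '{bucket}' is not approved")

    # Output data must be encrypted with an approved key.
    output_key = (params.get("outputDataConfig") or {}).get("kmsKeyId")
    if output_key not in approved_kms_keys:
        findings.append("outputDataConfig: kmsKeyId is missing or not approved")

    # The processing volume must be encrypted with an approved key.
    if params.get("volumeKmsKeyId") not in approved_kms_keys:
        findings.append("volumeKmsKeyId is missing or not approved")

    # The data access role must be one that has been reviewed for least privilege.
    if params.get("dataAccessRoleArn") not in approved_roles:
        findings.append("dataAccessRoleArn is missing or not approved")

    return findings


# Hypothetical, trimmed-down record: unapproved output bucket and no volume key.
sample_event = {
    "eventSource": "comprehend.amazonaws.com",
    "eventName": "StartKeyPhrasesDetectionJob",
    "requestParameters": {
        "inputDataConfig": {"s3Uri": "s3://approved-input-bucket/docs"},
        "outputDataConfig": {
            "s3Uri": "s3://unapproved-bucket/out",
            "kmsKeyId": "arn:aws:kms:us-east-2:111122223333:key/example-key",
        },
        "dataAccessRoleArn": "arn:aws:iam::111122223333:role/DataAccessRoleXYZ",
    },
}

findings = audit_comprehend_job_event(
    sample_event,
    approved_buckets={"approved-input-bucket", "approved-output-bucket"},
    approved_kms_keys={"arn:aws:kms:us-east-2:111122223333:key/example-key"},
    approved_roles={"arn:aws:iam::111122223333:role/DataAccessRoleXYZ"},
)
for finding in findings:
    print(finding)
```

In practice, a check like this would run against records delivered by CloudTrail, for example from a scheduled job or an event-driven function. How records reach the function is left open here.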

Operational Access and Security with Amazon Comprehend

AWS customers in the financial services industry may require visibility into any access to their data stored on AWS. Customers can review third-party auditor reports, such as the AWS SOC 2 Type II report, ISO 27001, and others, in AWS Artifact.

AWS FSI customers that are already using or planning to use AWS artificial intelligence (AI) services, including Amazon Comprehend, can opt out of having their content stored or used for service improvements. Customers can configure an organization-wide opt-out policy that enforces their AI opt-out setting on all accounts that are members of the organization.

The following example shows a policy that you could attach to your organization’s root to opt out of AI services for accounts in your organization, and to prevent child accounts from changing the opt-out policy.

{
    "services": {
        "@@operators_allowed_for_child_policies": ["@@none"],
        "default": {
            "@@operators_allowed_for_child_policies": ["@@none"],
            "opt_out_policy": {
                "@@operators_allowed_for_child_policies": ["@@none"],
                "@@assign": "optOut"
            }
        }
    }
}

Conclusion

In this post, we reviewed Amazon Comprehend and highlighted key information that can help FSI customers accelerate the approval of the service within these five categories: achieving compliance, data protection, isolation of compute environments, automating audits with APIs, and operational access and security. While not a one-size-fits-all approach, the above guidance can be adapted to meet your organization’s security and compliance requirements and provide a consolidated list of key areas for Amazon Comprehend.

In the meantime, be sure to visit our AWS Industries blog channel and stay tuned for more financial services news and best practices.

John Formento

John is a Solutions Architect at Amazon Web Services. He helps large enterprises achieve their goals by architecting secure and scalable solutions on the AWS Cloud. John holds 6 AWS certifications including AWS Certified Solutions Architect – Professional and DevOps Engineer – Professional.

Syed Shareef

Syed is a Senior Security Solutions Architect based in Charlotte. He works with large financial institutions to help them achieve their business goals in AWS, in alignment with their risk appetite.