AWS for Industries

FSI Service Spotlight: Featuring Amazon Transcribe

Editor’s note: This is the fourth in a monthly series for Financial Services Industry Service Spotlight.

Welcome to the Service Spotlight blog series. In this series, we plan to highlight five key considerations of a particular service that financial institutions should focus on to help streamline service approval. Each of the five areas will include specific guidance that can help streamline service approval for the particular service, with the caveat that there may be some modification needed to suit specific use cases and environments.

For this edition of the Service Spotlight, we are covering Amazon Transcribe (excluding Amazon Transcribe Medical). Amazon Transcribe uses a deep learning process called automatic speech recognition (ASR) to recognize speech in audio files and transcribe them into text. Customers can use Amazon Transcribe to convert speech to text and to create applications that incorporate the content of audio files.

Amazon Transcribe

Intuit is a financial software company and an AWS customer that uses Amazon Connect for its call centers. With more than 275 million minutes of customer interactions each year, Intuit uses Contact Lens for Amazon Connect, which provides contact center analytics powered by machine learning, to provide accurate call transcriptions, redaction of sensitive data, and automated call metrics to determine the effectiveness of its contact center. Contact Lens for Amazon Connect builds on Amazon Transcribe to generate call transcripts.

Northwestern Mutual, a financial services mutual organization, used Amazon Transcribe in its application for financial representatives to generate compliant case notes for each of their clients. Amazon Transcribe provided an accurate voice-to-text technology, and the custom vocabulary feature allowed the company to add unique industry or company terms and acronyms. Northwestern Mutual’s tool allowed representatives to streamline the process for creating complete case notes, while meeting compliance requirements.

Achieving Compliance with Amazon Transcribe

Security and compliance are a shared responsibility between AWS and the customer. AWS operates, manages, and protects the infrastructure that runs the AWS services. The customer’s responsibility is determined by the service selected; more fully managed services involve less customer configuration. Because Amazon Transcribe is an abstracted service, customers are responsible for fewer controls to deploy secure transcription jobs than they would be with an infrastructure service.

points at which customers take on responsibility from AWS

The preceding image shows the points at which customers take on responsibility from AWS depending on the chosen service. The left-most stack encapsulates services such as Amazon EC2, and allows customers to implement the most customization in their environments. Naturally, this goes hand-in-hand with the customer having the responsibility of managing more components. Amazon Transcribe falls in the right-most stack of abstracted services, where AWS takes on a larger portion of the shared responsibility, including server-side encryption, platform, and operating system management. The customer receives out-of-the-box Amazon Transcribe capabilities, and therefore only needs to configure client-side encryption, customer data protection, and Identity and Access Management, along with any other additional security controls.

Amazon Transcribe is included in the scope of multiple compliance programs, meaning that the customer inherits the controls that AWS has certified through the following compliance programs:

  • SOC 1, 2, 3
  • PCI
  • FedRAMP (Moderate and High)
  • DoD CC SRG (IL2-IL5)
  • HIPAA
  • IRAP
  • ISO/IEC 27001:2013, 27017:2015, 27018:2019, and ISO/IEC 9001:2015
  • MTCS (Regions: US-East, US-West, Singapore, Seoul)
  • K-ISMS
  • ENS High
  • OSPAR
  • HITRUST CSF
  • FINMA

In the following sections, we will cover what customers must do in respect to the shared responsibility model in order to help achieve their security and compliance objectives.

Data Protection with Amazon Transcribe

Encryption is a commonly used mechanism to protect data in transit and at rest. When accessing Amazon Transcribe through the network, customers use AWS published API calls with clients that must support Transport Layer Security (TLS) 1.0 or later. However, we recommend using TLS 1.2 at minimum. If you are using FIPS endpoints in supported Regions, note that AWS stopped supporting TLS 1.0 and TLS 1.1 on FIPS endpoints across all AWS Regions on March 31, 2021.
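
For client code that opens its own HTTPS connections, the recommended TLS floor can be pinned explicitly. The following is a minimal Python sketch that uses only the standard library ssl module. The helper name make_tls12_context is hypothetical, and AWS SDKs manage their own TLS settings, so treat this as an illustration of the idea rather than required configuration.

```python
import ssl


def make_tls12_context() -> ssl.SSLContext:
    """Return a client-side context that refuses anything older than TLS 1.2."""
    # create_default_context() keeps certificate and hostname verification on.
    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


context = make_tls12_context()
```

A context built this way can be handed to standard library clients such as http.client.HTTPSConnection through their context argument.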

AWS Key Management Service (AWS KMS) makes it easy for you to create and manage cryptographic keys and control their use across a wide range of AWS services and in your applications. AWS KMS is integrated with Amazon Transcribe to simplify using your keys to encrypt your transcription outputs.

By default, Amazon Transcribe stores outputs from transcription jobs in an Amazon Simple Storage Service (Amazon S3) bucket managed by AWS and returns a presigned URL to access the transcription. The data at rest is encrypted under an AWS KMS key managed by AWS. However, customers can optionally specify an output bucket and a KMS key to encrypt the transcription job output. We recommend this for financial services customers so they can control and audit access at both the bucket and key level.
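
As a rough illustration of specifying a customer-owned output bucket and KMS key, the Python sketch below assembles the parameters for a StartTranscriptionJob request. The helper name, job name, bucket names, and key ARN are hypothetical placeholders; the parameter names follow the StartTranscriptionJob API, but verify them against the current API reference before relying on them.

```python
def build_transcription_request(job_name, media_uri, output_bucket, kms_key_arn):
    """Assemble StartTranscriptionJob parameters with customer-managed output."""
    return {
        "TranscriptionJobName": job_name,
        "LanguageCode": "en-US",
        "Media": {"MediaFileUri": media_uri},
        # Without these two settings the output lands in a service-managed bucket.
        "OutputBucketName": output_bucket,
        "OutputEncryptionKMSKeyId": kms_key_arn,
    }


# All values below are placeholders for illustration only.
request = build_transcription_request(
    job_name="example-call-0001",
    media_uri="s3://example-input-bucket/calls/example-call-0001.wav",
    output_bucket="example-output-bucket",
    kms_key_arn="arn:aws:kms:us-east-2:111122223333:key/EXAMPLE-KEY-ID",
)

# With the AWS SDK for Python installed and credentials configured,
# the request could be sent along these lines:
#   boto3.client("transcribe").start_transcription_job(**request)
```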

When server-side encryption with Amazon S3-managed keys (SSE-S3) is enabled for encryption of transcripts placed in your S3 bucket, Amazon Transcribe uses Amazon S3 to encrypt each object with a unique key (AES-256). As an additional safeguard, it encrypts the key itself with a key that rotates regularly. Amazon Transcribe will then validate that the specified S3 bucket exists and that it has permission to put encrypted objects into that bucket before completing the transcription.

The Amazon Transcribe input resides in S3, and we recommend customers use the encryption options in Amazon S3 to encrypt input audio when creating a transcription job. The Amazon Elastic Compute Cloud (Amazon EC2) instances that Amazon Transcribe uses to process the transcription jobs have Amazon Elastic Block Store (Amazon EBS) volumes, which are encrypted with the default key in the Amazon Transcribe service account.

Amazon Transcribe’s automatic content redaction feature automatically redacts sensitive personally identifiable information (PII) from the transcription results. When this feature is enabled for transcription jobs, each identified instance of PII is replaced with a [PII] tag in the transcript. Customers can use this feature for source audio in US English (en-US) with batch API calls.

Transcription jobs that use automatic content redaction generate one of two types of confidence value for each item, either the automatic speech recognition (ASR) confidence or the PII confidence:

  • The ASR confidence indicates the accuracy of the word transcribed by the service. In the following transcript output, the word “Good” has a confidence of 1.0. This confidence value indicates that Amazon Transcribe is 100% confident that the word being transcribed in this transcript is “Good.”
  • The PII confidence value for a PII tag is the confidence that the speech it flagged for redaction is truly PII. In the following transcript output, the confidence of 0.9999 indicates that Amazon Transcribe is 99.99% confident that the entity it redacted in the transcript is indeed PII.
{
    "jobName": "job id",
    "accountId": "account id",
    "isRedacted": true,
    "results": {
        "transcripts": [
            {
                "transcript": "Good morning, everybody. My name is [PII], and today I feel like sharing a whole lot of personal information with you. Let's start with my Social Security number [PII]. My credit card number is [PII] and my C V V code is [PII]. I hope that Amazon Transcribe is doing a good job at redacting that personal information away. Let's check."
            }
        ],
        "items": [
            {
                "start_time": "2.86",
                "end_time": "3.35",
                "alternatives": [
                    {
                        "confidence": "1.0",
                        "content": "Good"
                    }
                ],
                "type": "pronunciation"
            },
            Items removed for brevity
            {
                "start_time": "5.56",
                "end_time": "6.25",
                "alternatives": [
                    {
                        "content": "[PII]",
                        "redactions": [
                            {
                                "confidence": "0.9999"
                            }
                        ]
                    }
                ],
                "type": "pronunciation"
            },
            Items removed for brevity
        ]
    },
    "status": "COMPLETED"
}
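
The PII confidence values in a redacted transcript can also be inspected programmatically, for example to route low-confidence redactions to manual review. The Python sketch below assumes the transcript has already been loaded into a dictionary shaped like the output above. The function name, the 0.99 threshold, and the second sample item with a confidence of 0.62 are hypothetical and exist only to exercise the code.

```python
def low_confidence_redactions(transcript, threshold=0.99):
    """Return (start_time, end_time, confidence) for redactions below threshold."""
    flagged = []
    for item in transcript["results"]["items"]:
        for alternative in item.get("alternatives", []):
            # Only redacted items carry a "redactions" list.
            for redaction in alternative.get("redactions", []):
                confidence = float(redaction["confidence"])
                if confidence < threshold:
                    flagged.append((item["start_time"], item["end_time"], confidence))
    return flagged


# Trimmed sample: the first two items mirror the output above,
# the third is a made-up low-confidence redaction.
sample = {
    "results": {
        "items": [
            {"start_time": "2.86", "end_time": "3.35", "type": "pronunciation",
             "alternatives": [{"confidence": "1.0", "content": "Good"}]},
            {"start_time": "5.56", "end_time": "6.25", "type": "pronunciation",
             "alternatives": [{"content": "[PII]", "redactions": [{"confidence": "0.9999"}]}]},
            {"start_time": "9.10", "end_time": "9.80", "type": "pronunciation",
             "alternatives": [{"content": "[PII]", "redactions": [{"confidence": "0.62"}]}]},
        ]
    }
}

needs_review = low_confidence_redactions(sample)
```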

As a preventive mechanism to ensure encryption when certain Amazon Transcribe actions are being invoked, customers can use service control policies (SCPs) and Amazon Transcribe’s IAM condition keys.

Using IAM condition keys, customers can conditionally allow their Amazon Transcribe users to create transcription jobs only when the following conditions are met:

  1. An AWS KMS key is specified for encrypting transcription output,
  2. A specific bucket is used to store transcription output, and
  3. A specific bucket location (key) is used to store transcription output.
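
The three conditions above could be combined in a single identity-based policy. The Python sketch below builds such a policy document as a dictionary. The condition key transcribe:OutputEncryptionKMSKeyId appears in the SCP example in this post; transcribe:OutputBucketName and transcribe:OutputKey are assumed from the Service Authorization Reference and should be verified there. The function name, key ARN, bucket name, and prefix are hypothetical placeholders.

```python
import json


def build_start_job_policy(kms_key_arn, bucket_name, key_prefix):
    """Allow StartTranscriptionJob only when all three output conditions hold."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowEncryptedJobsToApprovedLocation",
                "Effect": "Allow",
                "Action": "transcribe:StartTranscriptionJob",
                "Resource": "*",
                # Every key under every operator must match (logical AND).
                "Condition": {
                    "StringEquals": {
                        "transcribe:OutputEncryptionKMSKeyId": kms_key_arn,
                        "transcribe:OutputBucketName": bucket_name,
                    },
                    "StringLike": {"transcribe:OutputKey": key_prefix + "*"},
                },
            }
        ],
    }


policy = build_start_job_policy(
    kms_key_arn="arn:aws:kms:us-east-2:111122223333:key/EXAMPLE-KEY-ID",
    bucket_name="example-output-bucket",
    key_prefix="transcripts/",
)
policy_json = json.dumps(policy, indent=4)
```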

FSI customers that have a large number of accounts use AWS Organizations to centrally manage and govern the environment as the organization grows and scales AWS resources. Customers using Organizations can leverage SCPs, a type of organization policy that is used to manage permissions in the organization. SCPs offer central control over the maximum available permissions for all accounts in the organization and help customers ensure that accounts stay within the organization’s access control guidelines.

See the following for an example SCP that denies the request unless the principal invoking the specified Amazon Transcribe actions provides the specified AWS KMS key for output encryption. Customers can choose to use wildcards in the condition section to allow for AWS KMS use meeting their requirements, or specify a condition that checks if the AWS KMS key is being used. See the Amazon Transcribe IAM documentation section for more details.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Statement1",
            "Effect": "Deny",
            "Action": "transcribe:StartTranscriptionJob",
            "Resource": "*",
            "Condition": {
                "StringNotLike": {
                    "transcribe:OutputEncryptionKMSKeyId": "arn:aws:kms:us-east-2:1111222233334444:key/*"
                }
            }
        }
    ]
}
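
To reason about which requests this SCP blocks, the Python sketch below approximates the StringNotLike check with the standard library fnmatch module. It is a simplified model, not the IAM policy engine: it assumes that a request with no KMS key supplied is denied, because negated operators such as StringNotLike evaluate to true when the condition key is absent. The function name scp_denies is hypothetical.

```python
from fnmatch import fnmatchcase

# Pattern copied from the SCP condition above.
KEY_PATTERN = "arn:aws:kms:us-east-2:1111222233334444:key/*"


def scp_denies(output_kms_key_id):
    """Approximate the Deny decision for a StartTranscriptionJob request."""
    if output_kms_key_id is None:
        # No key in the request: the negated condition matches, so Deny applies.
        return True
    # Case-sensitive wildcard comparison, similar in spirit to StringNotLike.
    return not fnmatchcase(output_kms_key_id, KEY_PATTERN)


denied_without_key = scp_denies(None)
denied_other_account = scp_denies("arn:aws:kms:us-east-2:999999999999:key/abc")
denied_matching_key = scp_denies("arn:aws:kms:us-east-2:1111222233334444:key/abc")
```

In this model only the last request, which names a key under the expected account and Region, passes the SCP.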

Isolation of Compute Environments with Amazon Transcribe

Amazon Transcribe is an abstracted service, so it doesn’t have any compute resources in the customer’s side of the shared responsibility model. As a managed service, Amazon Transcribe is protected by the AWS global network security procedures that are described in the Amazon Web Services: Overview of Security Processes whitepaper.

Customers can also establish a private connection between their VPC and Amazon Transcribe by creating an interface VPC endpoint. Interface endpoints are powered by AWS PrivateLink, a technology that enables you to privately access Amazon Transcribe APIs without an internet gateway, NAT device, VPN connection, or AWS Direct Connect connection. Instances in your VPC don’t need public IP addresses to communicate with Amazon Transcribe APIs. The use of interface VPC endpoints also ensures that traffic between your VPC and Amazon Transcribe does not leave the Amazon network.

Illustration of a customer app in the customer’s AWS account, using a Transcribe VPC endpoint

The preceding image shows an illustration of a customer app in the customer’s AWS account, using a Transcribe VPC endpoint to access Transcribe resources in the service account.

The following is an example of an endpoint policy for Amazon Transcribe. When attached to a VPC endpoint, this policy grants access to the specified Amazon Transcribe actions for all principals on all resources. Customers can choose to limit the Amazon Transcribe actions in this endpoint policy to fine-tune access and align with the principle of least privilege.

{
   "Statement":[
      {
         "Principal":"*",
         "Effect":"Allow",
         "Action": [
                "transcribe:UpdateVocabularyFilter",
                "transcribe:CreateVocabulary",
                "transcribe:CreateVocabularyFilter",
                "transcribe:DescribeLanguageModel",
                "transcribe:UpdateVocabulary",
                "transcribe:CreateMedicalVocabulary",
                "transcribe:ListLanguageModels",
                "transcribe:CreateLanguageModel",
                "transcribe:DeleteLanguageModel",
                "transcribe:ListTranscriptionJobs",
                "transcribe:GetVocabulary",
                "transcribe:GetTranscriptionJob",
                "transcribe:StartStreamTranscription",
                "transcribe:StartTranscriptionJob",
                "transcribe:DeleteVocabulary",
                "transcribe:StartStreamTranscriptionWebSocket",
                "transcribe:ListVocabularyFilters",
                "transcribe:GetVocabularyFilter",
                "transcribe:ListVocabularies",
                "transcribe:DeleteVocabularyFilter"
            ],
         "Resource":"*"
      }
   ]
}
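
One way to narrow such a policy toward least privilege is to filter its action list down to what a workload actually needs. The Python sketch below does this for a shortened copy of the endpoint policy. The helper name keep_actions and the choice of a batch-only action set are hypothetical and only one of many reasonable selections.

```python
import copy

# Shortened copy of the endpoint policy above; only a few actions are kept.
endpoint_policy = {
    "Statement": [
        {
            "Principal": "*",
            "Effect": "Allow",
            "Action": [
                "transcribe:CreateVocabulary",
                "transcribe:GetTranscriptionJob",
                "transcribe:ListTranscriptionJobs",
                "transcribe:StartTranscriptionJob",
                "transcribe:DeleteVocabulary",
            ],
            "Resource": "*",
        }
    ]
}


def keep_actions(policy, allowed):
    """Return a copy of the policy whose statements list only allowed actions."""
    narrowed = copy.deepcopy(policy)  # leave the original policy untouched
    for statement in narrowed["Statement"]:
        statement["Action"] = [a for a in statement["Action"] if a in allowed]
    return narrowed


batch_only = keep_actions(
    endpoint_policy,
    {
        "transcribe:StartTranscriptionJob",
        "transcribe:GetTranscriptionJob",
        "transcribe:ListTranscriptionJobs",
    },
)
```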

Similarly, by using other resource-based policies, customers can restrict access to their resources to only allow access from VPC endpoints. For example, by using S3 bucket policies, customers can restrict access to a given Amazon S3 bucket only through the endpoint. This ensures that traffic remains private and only flows through the endpoint. The following is an example of a policy that restricts access to a specific bucket, “my-secure-bucket,” only from the VPC endpoint with the ID vpce-111bbb22. The policy denies all access to the bucket if the specified endpoint is not being used. The aws:SourceVpce condition does not require an Amazon Resource Name (ARN) for the VPC endpoint resource, only the VPC endpoint ID.

{
   "Version": "2012-10-17",
   "Id": "Policy11112222333344455",
   "Statement": [
     {
       "Sid": "VPC-Endpoint-Access-only",
       "Principal": "*",
       "Action": "s3:*",
       "Effect": "Deny",
       "Resource": ["arn:aws:s3:::my-secure-bucket",
                    "arn:aws:s3:::my-secure-bucket/*"],
       "Condition": {
         "StringNotEquals": {
           "aws:SourceVpce": "vpce-111bbb22"
         }
       }
     }
   ]
}

Automating Audits Using APIs with Amazon Transcribe

AWS Config monitors the configuration of resources and provides managed rules that alert when resources fall into a non-compliant state. For the Amazon Transcribe input and destination S3 buckets, we recommend customers use the AWS Config managed rules for S3 to monitor conformance with S3 best practices.

Calls to AWS services are API-based, and can be logged in AWS CloudTrail. CloudTrail will contain the calls made to change or run supported AWS services. CloudTrail logs all Amazon Transcribe actions, which are documented in the Transcribe API Reference. For example, calls to the CreateVocabulary, GetTranscriptionJob, and StartTranscriptionJob operations generate entries in the CloudTrail log files. Every event or log entry contains information about who generated the request. The identity information helps users determine the following:

  • Whether the request was made with root or IAM user credentials
  • Whether the request was made with temporary security credentials for a role or federated user
  • Whether the request was made by another AWS service

Amazon Transcribe can perform streaming or batch jobs. The streaming capability allows users to transcribe a stream of audio in real time using WebSockets or HTTP/2. The StartStreamTranscription API call opens a bidirectional HTTP/2 stream so audio can flow between the customer application and Amazon Transcribe. As mentioned previously in the Data Protection section, Amazon Transcribe uses TLS 1.2 and AWS certificates for encryption in transit for live streaming and batch jobs.

Calling the StartTranscriptionJob API starts a batch job to transcribe the speech in an audio or video file to text, and will drop the output into the designated S3 bucket. The ListTranscriptionJobs API returns a list of transcription jobs that have been started. Users can specify the status of the jobs that they want the operation to return. For example, users can get a list of all pending jobs, or a list of completed jobs.
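
Filtering by status and walking through paginated results might look like the Python sketch below. To keep it runnable without credentials, the function takes any callable that behaves like the ListTranscriptionJobs operation, and a small stand-in (fake_list_page, hypothetical) returns two canned pages. The Status and NextToken parameters and the TranscriptionJobSummaries response field follow the API, but check the API reference for the current shape.

```python
def list_jobs_by_status(list_page, status):
    """Collect job names with the given status across all result pages."""
    names = []
    kwargs = {"Status": status}
    while True:
        page = list_page(**kwargs)
        for summary in page.get("TranscriptionJobSummaries", []):
            names.append(summary["TranscriptionJobName"])
        token = page.get("NextToken")
        if not token:
            return names
        kwargs["NextToken"] = token  # request the next page


def fake_list_page(**kwargs):
    """Stand-in for the real API call; returns two canned pages."""
    status = kwargs["Status"]
    if "NextToken" not in kwargs:
        return {
            "TranscriptionJobSummaries": [
                {"TranscriptionJobName": "job-a", "TranscriptionJobStatus": status}
            ],
            "NextToken": "page-2",
        }
    return {
        "TranscriptionJobSummaries": [
            {"TranscriptionJobName": "job-b", "TranscriptionJobStatus": status}
        ]
    }


completed = list_jobs_by_status(fake_list_page, "COMPLETED")
```

With the AWS SDK for Python, the first argument could instead be the list_transcription_jobs method of a Transcribe client.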

The following is an example of a CloudTrail log entry recorded when the ListTranscriptionJobs API call is made:

{
    "eventVersion": "1.08",
    "userIdentity": {
        "type": "AssumedRole",
        "principalId": "XXXXX",
        "arn": "arn:aws:sts::12345678910:assumed-role/UserRole/xxxxxx",
        "accountId": "12345678910",
        "accessKeyId": "ASIAEXAMPLEKEY12",
        "sessionContext": {
            "sessionIssuer": {
                "type": "Role",
                "principalId": "XXXXX",
                "arn": "arn:aws:iam::12345678910:role/UserRole",
                "accountId": "12345678910",
                "userName": "UserRole"
            },
            "webIdFederationData": {},
            "attributes": {
                "mfaAuthenticated": "false",
                "creationDate": "2021-01-27T20:06:11Z"
            }
        }
    },
    "eventTime": "2021-01-27T20:08:19Z",
    "eventSource": "transcribe.amazonaws.com",
    "eventName": "ListTranscriptionJobs",
    "awsRegion": "us-east-1",
    "sourceIPAddress": "1.2.3.4",
    "userAgent": "aws-internal/3 aws-sdk-java/1.11.802 Linux/4.9.230-0.1.ac.223.84.332.metal1.x86_64 OpenJDK_64-Bit_Server_VM/25.252-b09 java/1.8.0_252 vendor/Oracle_Corporation",
    "requestParameters": {
        "maxResults": 100
    },
    "responseElements": null,
    "requestID": "abcdabcd-424d-4fd3-a863-abcdabcbdabcd",
    "eventID": "poiupoiu-c18c-4725-92e0-hjklhjklhjkl",
    "readOnly": true,
    "eventType": "AwsApiCall",
    "managementEvent": true,
    "eventCategory": "Management",
    "recipientAccountId": "12345678910"
}
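
The identity questions that CloudTrail helps answer can be extracted from an event record with a few dictionary lookups. The Python sketch below uses a trimmed copy of the event above. The function name and the returned field names are hypothetical, and the temporary-credentials check is simplified to two identity types; CloudTrail defines additional types that a production check would need to handle.

```python
def summarize_identity(event):
    """Pull out who made the call and how they were authenticated."""
    identity = event.get("userIdentity", {})
    attributes = identity.get("sessionContext", {}).get("attributes", {})
    return {
        "event": event.get("eventName"),
        "identity_type": identity.get("type"),
        "principal_arn": identity.get("arn"),
        # Simplified: roles and federated users sign with temporary credentials.
        "temporary_credentials": identity.get("type") in ("AssumedRole", "FederatedUser"),
        # invokedBy is present when another AWS service made the request.
        "invoked_by_service": identity.get("invokedBy"),
        "mfa_authenticated": attributes.get("mfaAuthenticated") == "true",
    }


# Trimmed copy of the CloudTrail event shown above.
event = {
    "userIdentity": {
        "type": "AssumedRole",
        "arn": "arn:aws:sts::12345678910:assumed-role/UserRole/xxxxxx",
        "sessionContext": {"attributes": {"mfaAuthenticated": "false"}},
    },
    "eventSource": "transcribe.amazonaws.com",
    "eventName": "ListTranscriptionJobs",
}

summary = summarize_identity(event)
```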

Financial services customers can use AWS Audit Manager to continuously audit their AWS usage and simplify how they assess risk and compliance with regulations and industry standards. Audit Manager automates evidence collection and organizes the evidence as defined by the control set in the selected framework, such as PCI DSS, SOC 2, or GDPR. Audit Manager collects data from sources including CloudTrail to compare the environment’s configurations against the compliance controls. Because CloudTrail logs all Amazon Transcribe calls, Audit Manager’s integration with CloudTrail is useful for demonstrating that controls have been met. Consider the encryption requirement in SOC 2, for example: rather than querying across all CloudTrail logs to confirm that the S3 bucket for Amazon Transcribe’s output is encrypted, customers can centrally collect evidence in Audit Manager to check whether the requirement is being met. Audit Manager saves time with automated collection of evidence and provides audit-ready reports for customers to review. The Audit Manager assessment report uses cryptographic verification to help you ensure the integrity of the assessment report.

The following screenshot illustrates the configuration of a customer control’s data source for the Amazon Transcribe action of interest.

configuration of a customer control’s data source for the Amazon Transcribe action of interest

See the following example of evidence in Audit Manager from a custom Amazon Transcribe control. The outputEncryptionKMSKeyId field denotes that the transcription request was made with a KMS key for encrypting the output stored in S3.

{
  "transcriptionJobName": "test-with-enc-am",
  "languageCode": "en-US",
  "media": {
    "mediaFileUri": ""
  },
  "outputBucketName": "transcribe-output-bucket-123412341234",
  "outputEncryptionKMSKeyId": "arn:aws:kms:us-east-2:123412341234:key/a1b2bc3d4-da64-4725-9cc5-abc123abc123",
  "settings": {
    "showSpeakerLabels": false,
    "channelIdentification": false
  }
}
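
Evidence records of this shape can be screened in bulk for the encryption setting. The Python sketch below flags any record that lacks an outputEncryptionKMSKeyId value. The function name and the second record, test-without-enc, are hypothetical and included only to show a failing case.

```python
def jobs_missing_output_encryption(evidence_records):
    """Return job names whose request parameters lack a KMS key for output."""
    return [
        record.get("transcriptionJobName")
        for record in evidence_records
        if not record.get("outputEncryptionKMSKeyId")
    ]


evidence = [
    {
        # Mirrors the evidence shown above.
        "transcriptionJobName": "test-with-enc-am",
        "outputBucketName": "transcribe-output-bucket-123412341234",
        "outputEncryptionKMSKeyId": "arn:aws:kms:us-east-2:123412341234:key/a1b2bc3d4-da64-4725-9cc5-abc123abc123",
    },
    {
        # Made-up record with no KMS key specified.
        "transcriptionJobName": "test-without-enc",
        "outputBucketName": "transcribe-output-bucket-123412341234",
    },
]

missing = jobs_missing_output_encryption(evidence)
```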

Operational Access and Security with Amazon Transcribe

AWS customers in the financial services industry may require visibility into any access to their data stored on AWS. Customers can review third-party auditor reports such as the AWS SOC 2 Type II report, ISO 27001, and others in AWS Artifact.

Amazon Transcribe stores and uses voice inputs that it has processed to develop the service and continuously improve your experience. Customers can opt out of having their content used to develop and improve Amazon Transcribe by using an AWS Organizations opt-out policy. Customers can configure organization-wide opt-out policies that enforce their AI opt-out setting choice on all accounts that are members of the organization. Customers that choose to enable the opt-out policy to prevent the use of their content for Amazon Transcribe quality improvements must specify an output bucket location to store the output of the transcription when creating an Amazon Transcribe job.

The following example shows a policy that you could attach to your organization’s root to opt out of AI services for accounts in your organization, and prevent child accounts from changing the opt-out policy.

{
    "services": {
        "@@operators_allowed_for_child_policies": ["@@none"],
        "default": {
            "@@operators_allowed_for_child_policies": ["@@none"],
            "opt_out_policy": {
                "@@operators_allowed_for_child_policies": ["@@none"],
                "@@assign": "optOut"
            }
        }
    }
}

Conclusion

In this post, we reviewed Amazon Transcribe and highlighted key information that can help financial services customers accelerate the approval of the service within these five categories: achieving compliance, data protection, isolation of compute environments, automating audits with APIs, and operational access and security. We encourage customers to use the controls highlighted in this blog as appropriate in accordance with their business needs and AWS environment. The controls highlighted here are not necessarily exhaustive; rather, they illustrate the common controls we have seen in the field while helping customers use Amazon Transcribe and other AWS services.

In the meantime, be sure to visit our AWS for Industries blog channel and stay tuned for more financial services news and best practices.

Annam Iyer

Annam Iyer is a Solutions Architect at AWS. She enjoys problem solving with her customers and being their trusted advisor. In her free time, she likes to try new restaurants, travel, and watch sports.

Syed Shareef

Syed is a Senior Security Solutions Architect based in Charlotte. He works with large financial institutions to help them achieve their business goals in AWS, in alignment with their risk appetite.