AWS Machine Learning Blog
Announcing the launch of Amazon Comprehend custom entity recognition real-time endpoints
Update Sep 28, 2020 – New features: Amazon Comprehend custom entity recognition real-time endpoints now support application auto scaling. Refer to the section Auto Scaling with real-time endpoints in this post to learn more.
Update Aug 12, 2020 – New features: Amazon Comprehend adds five new languages (Spanish, French, German, Italian, and Portuguese).
Amazon Comprehend also increased the limit on the number of entity types per custom entity model from 12 to 25.
Amazon Comprehend is a natural language processing (NLP) service that can extract key phrases, places, names, organizations, events, sentiment, and more from unstructured text (for more information, see Detect Entities). But what if you want to add entity types unique to your business, like proprietary part codes or industry-specific terms? In November 2018, Amazon Comprehend added the ability to extend the default entity types to detect custom entities.
Until now, inference with a custom entity recognition model was an asynchronous operation.
In this post, we cover how to build an Amazon Comprehend custom entity recognition model and set up an Amazon Comprehend Custom Entity Recognition real time endpoint for synchronous inference. The following diagram illustrates this architecture.
Solution overview
Amazon Comprehend Custom helps you meet your specific needs without requiring machine learning (ML) knowledge. Amazon Comprehend Custom uses automatic ML (AutoML) to build customized NLP models on your behalf, using data you already have.
For example, if you’re looking at chat messages or IT tickets, you might want to know if they’re related to an AWS offering. You need to build a custom entity recognizer that can identify a word or a group of words as a SERVICE or VERSION entity from the input messages.
In this post, we walk you through the following steps to implement a solution for this use case:
- Create a custom entity recognizer trained on annotated labels to identify custom entities such as SERVICE or VERSION.
- Create a real-time analysis Amazon Comprehend custom entity recognizer endpoint to identify the chat messages to detect a SERVICE or VERSION entity.
- Calculate the inference capacity and pricing for your endpoint.
We provide a sample dataset, aws-service-offerings.txt. The following screenshot shows example entries from the dataset.
You can provide labels for training a custom entity recognizer in two different ways: entity lists and annotations. We recommend annotations over entity lists because the increased context of the annotations can often improve your metrics. For more information, see Improving Custom Entity Recognizer Performance. We preprocessed the input dataset to generate training data and annotations required for training the custom entity recognizer.
You can download these files below:
- train.csv – Contains a list of messages for training the recognizer
- annotations.csv – The annotations file, which we created using Amazon SageMaker Ground Truth named entity recognition (shown in the following screenshot)
After you download these files, upload them to an Amazon Simple Storage Service (Amazon S3) bucket in your account for reference during training. For more information about uploading files, see How do I upload files and folders to an S3 bucket?
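If you prefer the AWS CLI, the upload is a single copy command per file. A minimal sketch, assuming a placeholder bucket named custom-entity-training-demo (substitute your own bucket):

    # Upload the training data and annotations to your S3 bucket (bucket name is a placeholder)
    aws s3 cp train.csv s3://custom-entity-training-demo/train.csv
    aws s3 cp annotations.csv s3://custom-entity-training-demo/annotations.csv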
For more information about creating annotations or labels for your custom dataset, see Developing NER models with Amazon SageMaker Ground Truth and Amazon Comprehend.
Creating a custom entity recognizer
To create your recognizer, complete the following steps:
- On the Amazon Comprehend console, create a custom entity recognizer.
- Choose Train recognizer.
- For Recognizer name, enter aws-offering-recognizer.
- For Custom entity type, enter SERVICE.
- Choose Add type.
- Enter a second custom entity type called VERSION.
- For Training type, select Using annotations and training docs.
- For Annotations location on S3, enter the path for annotations.csv in your S3 bucket.
- For Training documents location on S3, enter the path for train.csv in your S3 bucket.
- For IAM role, select Create an IAM role.
- For Permissions to access, choose Input and output (if specified) S3 bucket.
- For Name suffix, enter ComprehendCustomEntity.
- Choose Train.
For our dataset, training should take approximately 10 minutes.
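If you'd rather script this step, the console flow corresponds to the CreateEntityRecognizer API. The following is a minimal AWS CLI sketch, not the exact command used for this post; the account ID, role name, and bucket name are placeholders:

    # Train a custom entity recognizer for SERVICE and VERSION entities
    aws comprehend create-entity-recognizer \
        --recognizer-name aws-offering-recognizer \
        --language-code en \
        --data-access-role-arn arn:aws:iam::123456789012:role/ComprehendDataAccessRole \
        --input-data-config '{
            "EntityTypes": [{"Type": "SERVICE"}, {"Type": "VERSION"}],
            "Documents": {"S3Uri": "s3://custom-entity-training-demo/train.csv"},
            "Annotations": {"S3Uri": "s3://custom-entity-training-demo/annotations.csv"}
        }' \
        --region us-east-1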
When the recognizer training is complete, you can review the training metrics in the Recognizer details section.
Scroll down to see the individual training performance.
For more information about understanding these metrics and improving recognizer performance, see Custom Entity Recognizer Metrics.
When training is complete, you can use the recognizer to detect custom entities in your documents. You can quickly analyze single documents up to 5 KB in real time, or analyze a large set of documents with an asynchronous job (using Amazon Comprehend batch processing).
Creating a custom entity endpoint
Creating your endpoint is a two-step process: building an endpoint and then using it by running a real-time analysis.
Building the endpoint
To create your endpoint, complete the following steps:
- On the Amazon Comprehend console, choose Customization.
- Choose Custom entity recognition.
- From the Recognizers list, choose the name of the custom model for which you want to create the endpoint, and follow the link. The endpoints list on the custom model details page is displayed, where you can also see previously created endpoints and the models they're associated with.
- Select your model.
- From the Actions drop-down menu, choose Create endpoint.
- For Endpoint name, enter DetectEntityServiceOrVersion.
The name must be unique within the AWS Region and account. Endpoint names have to be unique even across recognizers.
- For Inference units, enter the number of inference units (IUs) to assign to the endpoint.
We discuss how to determine how many IUs you need later in this post.
- As an optional step, under Tags, enter a key-value pair as a tag.
- Choose Create endpoint.
The Endpoints list is displayed, with the new endpoint showing as Creating. When it shows as Ready, you can use the endpoint for real-time analysis.
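You can also create the endpoint programmatically with the CreateEndpoint API. A minimal AWS CLI sketch, with a placeholder model ARN for the recognizer trained earlier:

    # Create a real-time endpoint with 1 IU against the trained recognizer (ARN is a placeholder)
    aws comprehend create-endpoint \
        --endpoint-name DetectEntityServiceOrVersion \
        --model-arn arn:aws:comprehend:us-east-1:123456789012:entity-recognizer/aws-offering-recognizer \
        --desired-inference-units 1 \
        --region us-east-1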
Running real-time analysis
After you create the endpoint, you can run real-time analysis using your custom model.
- On the Amazon Comprehend console, choose Real-time analysis.
- For Analysis type, select Custom.
- For Endpoint, choose the endpoint you created.
- For Input text, enter the following:
- Choose Analyze.
You get insights as in the following screenshot, with entities recognized as either SERVICE or VERSION along with their confidence scores.
You can experiment with different input text combinations to compare and contrast the results.
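You can also call the endpoint programmatically with the DetectEntities API, passing the endpoint ARN in place of a language code. A minimal AWS CLI sketch; the endpoint ARN and sample message below are placeholders:

    # Synchronous inference against the custom endpoint (ARN and text are placeholders)
    aws comprehend detect-entities \
        --endpoint-arn arn:aws:comprehend:us-east-1:123456789012:entity-recognizer-endpoint/DetectEntityServiceOrVersion \
        --text "Our Aurora MySQL 5.6 cluster needs an upgrade" \
        --region us-east-1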
Determining the number of IUs you need
The number of IUs you need depends on the number of characters you send in your request and the throughput you need from Amazon Comprehend. In this section, we discuss two different use cases with different costs.
In all cases, endpoints are billed in 1-second increments, with a minimum of 60 seconds. Charges accrue from the time you provision your endpoint until it's deleted, even if no documents are analyzed. For more information, see Amazon Comprehend Pricing.
Use case 1
In this use case, you receive 10 messages/feeds every minute, and each message contains 360 characters that you need to recognize entities for. This equates to the following:
- 60 characters per second (360 characters x 10 messages ÷ 60 seconds)
- An endpoint with 1 IU provides a throughput of 100 characters per second
You need to provision an endpoint with 1 IU. Your recognition model has the following pricing details:
- The price for 1 IU is $0.0005 per second
- You incur costs from the time you provision your endpoint until it’s deleted, regardless of how many inference calls are made
- If you’re running your real-time endpoint for 12 hours a day, this equates to a total cost of $21.60 ($0.0005 x 3,600 seconds x 12 hours) for inference
- The model training and model management costs are the same as for asynchronous entity recognition at $3.00 and $0.50, respectively
The total cost of an hour of model training, a month of model management, and inference using a real-time entity recognition endpoint for 12 hours a day is $25.10.
Use case 2
In this second use case, your requirement increased to run inference for 50 messages/feeds every minute, and each message contains 600 characters that you need to recognize entities for. This equates to the following:
- 500 characters per second (600 characters x 50 messages ÷ 60 seconds)
- An endpoint with 1 IU provides a throughput of 100 characters per second.
You need to provision an endpoint with 5 IUs. Your model has the following pricing details:
- The price for 1 IU is $0.0005 per second
- You incur costs from the time you provision your endpoint until it’s deleted, regardless of how many inference calls are made
- If you’re running your real-time endpoint for 12 hours a day, this equates to a total cost of $108 (5 x $0.0005 x 3,600 seconds x 12 hours) for inference
- The model training and model management costs are the same as for asynchronous entity recognition at $3.00 and $0.50, respectively
The total cost of an hour of model training, a month of model management, and inference using a real-time entity recognition endpoint with a throughput of 5 IUs for 12 hours a day is $111.50.
Auto Scaling with real-time endpoints
Auto Scaling helps you make sure that you have the correct number of inference units (throughput capacity) available to handle the load for your Amazon Comprehend real-time endpoints. You can set up Auto Scaling in two different ways:
- Target tracking – Compute capacity required to host the endpoint (IUs) is determined and scaled based on utilization
- Scheduled scaling – Compute capacity required to host the endpoint (IUs) is applied based on a schedule
As of this writing, you can only set up Auto Scaling for Amazon Comprehend real-time endpoints using the AWS Command Line Interface (AWS CLI). For more information about configuring the AWS CLI, see Configuration basics.
Auto Scaling using target tracking
With target tracking scaling policies, you select a scaling metric and set a target value. The scaling policy adds or removes capacity as required to keep the metric at, or close to, the specified target value. In addition to keeping the metric close to the target value, a target tracking scaling policy adjusts to changes in the metric due to a changing load pattern. In our example, the provisioned capacity is automatically adjusted so that the utilized capacity is always 70% of the provisioned capacity. This approach enables capacity for temporary usage spikes. For more information, see Target tracking scaling policies for Application Auto Scaling.
To set up target tracking, complete the following steps:
- Copy the endpoint ARN of the custom entity endpoint you created earlier.
You use the RegisterScalableTarget API to register Amazon Comprehend as a scalable target for Auto Scaling.
- In the terminal or command prompt, enter the following code, sketched here with placeholder values (modify the Region to us-east-1 as needed and replace resource-id with your endpoint ARN):
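    # Sketch: register the endpoint as a scalable target
    # (the endpoint ARN and capacity bounds below are placeholders)
    aws application-autoscaling register-scalable-target \
        --service-namespace comprehend \
        --resource-id arn:aws:comprehend:us-east-1:123456789012:entity-recognizer-endpoint/DetectEntityServiceOrVersion \
        --scalable-dimension comprehend:entity-recognizer-endpoint:DesiredInferenceUnits \
        --min-capacity 1 \
        --max-capacity 2 \
        --region us-east-1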
You use the DescribeScalableTargets API to describe the scalable target and confirm that the endpoint is registered.
- In the terminal or command prompt, enter the following code, sketched here with placeholder values (modify the Region to us-east-1 as needed and replace resource-id with your endpoint ARN):
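    # Sketch: confirm the endpoint is registered as a scalable target (ARN is a placeholder)
    aws application-autoscaling describe-scalable-targets \
        --service-namespace comprehend \
        --resource-ids arn:aws:comprehend:us-east-1:123456789012:entity-recognizer-endpoint/DetectEntityServiceOrVersion \
        --region us-east-1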
The following screenshot shows the response you get after entering this command.
You use PutScalingPolicy to create a scaling policy that dynamically changes the provisioned capacity based on the consumed capacity. For this post, the provisioned capacity is adjusted so that consumed capacity is always 70% of provisioned capacity. The scaling policy is stored in a text file called config.json.
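A minimal config.json consistent with the 70% utilization target might look like the following sketch, using the ComprehendInferenceUtilization predefined metric from the Application Auto Scaling service namespace for Amazon Comprehend:

    {
        "TargetValue": 70.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ComprehendInferenceUtilization"
        }
    }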
- Download the config.json file to your local computer.
- In the terminal or command prompt, enter the following code, sketched here with placeholder values (modify the Region to us-east-1 as needed and replace resource-id with your endpoint ARN):
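    # Sketch: attach the target tracking policy stored in config.json
    # (the endpoint ARN and policy name are placeholders)
    aws application-autoscaling put-scaling-policy \
        --service-namespace comprehend \
        --resource-id arn:aws:comprehend:us-east-1:123456789012:entity-recognizer-endpoint/DetectEntityServiceOrVersion \
        --scalable-dimension comprehend:entity-recognizer-endpoint:DesiredInferenceUnits \
        --policy-name ComprehendUtilizationPolicy \
        --policy-type TargetTrackingScaling \
        --target-tracking-scaling-policy-configuration file://config.json \
        --region us-east-1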
The following screenshot shows the response you get.
For more information and instructions on removing target tracking, see Target Tracking.
Auto Scaling using scheduled scaling
Scheduled scaling allows you to set your own scaling schedule. For example, let’s say that every week the traffic to your Amazon Comprehend custom entity recognition real-time endpoint increases on Wednesday, remains high on Thursday, and decreases on Friday. You can plan your scaling actions based on the predictable traffic patterns. Scaling actions are performed automatically as a function of time and date.
You use RegisterScalableTarget to register Amazon Comprehend as a scalable target for Auto Scaling. You then use PutScheduledAction to create a scheduled action, which controls the minimum and maximum provisioned capacity within which the capacity can be scaled on a specific schedule. For more information on chronological expressions and scheduled scaling, see Schedule Expressions for Rules.
For this post, we created a script (endpoint_scaling.sh) to automate these steps.
With this script, you can scale up during weekdays, when you have the most traffic on your endpoints and need maximum throughput, and scale down on weekends, when you don't need that capacity. A sketch of what such a script might contain follows.
The minimum number of IUs for an endpoint is 1; it can't be set to 0.
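The following is a rough sketch of what such a script might contain; the endpoint ARN, cron schedules, and capacity values are illustrative assumptions, not the exact contents of endpoint_scaling.sh:

    #!/bin/bash
    # Illustrative scheduled scaling setup; all values below are placeholders.
    ENDPOINT_ARN="arn:aws:comprehend:us-east-1:123456789012:entity-recognizer-endpoint/DetectEntityServiceOrVersion"
    DIMENSION="comprehend:entity-recognizer-endpoint:DesiredInferenceUnits"

    # Register the endpoint as a scalable target
    aws application-autoscaling register-scalable-target \
        --service-namespace comprehend \
        --resource-id "$ENDPOINT_ARN" \
        --scalable-dimension "$DIMENSION" \
        --min-capacity 1 \
        --max-capacity 2

    # Scale up to 2 IUs for the work week (Monday 00:00 UTC)
    aws application-autoscaling put-scheduled-action \
        --service-namespace comprehend \
        --resource-id "$ENDPOINT_ARN" \
        --scalable-dimension "$DIMENSION" \
        --scheduled-action-name WeekdayScaleUp \
        --schedule "cron(0 0 ? * MON *)" \
        --scalable-target-action MinCapacity=2,MaxCapacity=2

    # Scale down to 1 IU for the weekend (Saturday 00:00 UTC)
    aws application-autoscaling put-scheduled-action \
        --service-namespace comprehend \
        --resource-id "$ENDPOINT_ARN" \
        --scalable-dimension "$DIMENSION" \
        --scheduled-action-name WeekendScaleDown \
        --schedule "cron(0 0 ? * SAT *)" \
        --scalable-target-action MinCapacity=1,MaxCapacity=1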
To set up scheduled scaling, complete the following steps:
- Open the provided script in a text editor and provide values for the following:
  - AWS_ACCESS_KEY_ID
  - AWS_SECRET_ACCESS_KEY
  - AWS_DEFAULT_REGION
- Save the script.
- Go to the terminal or command prompt and run a command along the following lines to invoke the script:
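    # Invoke the script (assumes endpoint_scaling.sh is in the current directory)
    bash endpoint_scaling.sh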
The following screenshot shows the response you get.
- Verify that the scheduled actions were created with the following code (a representative sketch):
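    # Sketch: list the scheduled actions registered in the comprehend namespace
    aws application-autoscaling describe-scheduled-actions \
        --service-namespace comprehend \
        --region us-east-1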
You should find the two schedules set up as ScheduledActions for weekend and weekday scaling (see the following screenshot).
Validating scaling up during weekdays
To verify that your endpoint scaled, go back to the Amazon Comprehend console and check the status of the endpoint. We created this endpoint with 1 IU. The following screenshot shows that it scaled up to 2 IUs on Friday.
The following screenshot shows the endpoint scaled down to 1 IU during the weekend.
For more information and instructions on removing a scheduled scaling action, see Scheduled Scaling.
Cleaning up
To avoid incurring future charges, stop or delete resources (the endpoint, recognizer, and any artifacts in Amazon S3) when not in use.
To delete your endpoint, on the Amazon Comprehend console, choose the entity recognizer you created. In the Endpoints section, choose Delete.
To delete your recognizer, in the Recognizer details section, choose Delete.
For instructions on deleting your S3 bucket, see Deleting or emptying a bucket.
Conclusion
This post demonstrated how easy it is to set up an endpoint for real-time text analysis to detect custom entities that you trained your Amazon Comprehend custom entity recognizer on. Custom entity recognition extends the capability of Amazon Comprehend by enabling you to identify new entity types not supported as one of the preset generic entity types. With Amazon Comprehend custom entity endpoints, you can now easily derive real-time insights on your custom entity detection models, providing a low latency experience for your applications. We’re interested to hear how you would like to apply this new feature to your use cases. Please share your thoughts and questions in the comments section.
About the Authors
Mona Mona is an AI/ML Specialist Solutions Architect based out of Arlington, VA. She works with the World Wide Public Sector team and helps customers adopt machine learning on a large scale. She is passionate about NLP and ML explainability areas in AI/ML.
Prem Ranga is an Enterprise Solutions Architect based out of Houston, Texas. He is part of the Machine Learning Technical Field Community and loves working with customers on their ML and AI journey. Prem is passionate about robotics, is an autonomous vehicles researcher, and also built the Alexa-controlled Beer Pours in Houston and other locations.