Amazon Comprehend provides natural language processing, Personally Identifiable Information (PII) detection and redaction, custom classification and entity detection, and topic modeling. These capabilities support a broad range of applications that analyze raw text and, with some APIs, document formats such as PDF and Word.

  • Natural language processing: Amazon Comprehend APIs for entity recognition, sentiment analysis, syntax analysis, key phrase extraction, and language detection can be used to extract insights from natural language text. These requests are measured in units of 100 characters (1 unit = 100 characters), with a 3 unit (300 character) minimum charge per request.
  • Personally Identifiable Information (PII): The Detect PII API finds the locations of selected PII entities in a document and can be used to create redacted versions of documents. The Contains PII API tells you whether a document contains the selected PII. These requests are also measured in units of 100 characters (1 unit = 100 characters), with a 3 unit (300 character) minimum charge per request.
  • Custom Comprehend: The Custom Classification and Entities APIs can train a custom NLP model to categorize text and extract custom entities. Asynchronous inference requests are measured in units of 100 characters, with a 3 unit (300 character) minimum charge per request. You are charged $3 per hour for model training (billed by the second) and $0.50 per month for custom model management. For synchronous Custom Classification and Entities inference requests, you provision an endpoint with the appropriate throughput. You are charged from the time that you start your endpoint until it is deleted.
  • Topic Modeling: Topic Modeling identifies relevant terms or topics from a collection of documents stored in Amazon S3. It will identify the most common topics in the collection and organize them in groups and then map which documents belong to which topic. You are charged based on the total size of documents processed per job. The first 100 MB is charged a flat rate. Above 100 MB, you are charged per MB.
  • Trust and Safety (new): The Comprehend toxicity detection API can be used to detect toxic content in text. Similarly, the Comprehend prompt safety classification feature can be used to detect unsafe input prompts to large language models and applications. These requests are measured in units of 100 characters (1 unit = 100 characters), with a 3 unit (300 character) minimum charge per request.
  • For Amazon Comprehend Medical pricing, learn more here.
  • You can estimate your costs using the AWS Pricing Calculator.

 

For volumes higher than 100M units per month, please contact us for pricing.
NLP requests are measured in units of 100 characters, with a 3 unit (300 character) minimum charge per request.
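
As a rough illustration of the unit rule above, the following sketch converts a request's character count into billable units. It assumes that partial units round up to the next whole unit, which is what the 550-character example on this page implies (6 units); the function and constant names are hypothetical and not part of any AWS SDK.

```python
UNIT_CHARS = 100  # 1 unit = 100 characters
MIN_UNITS = 3     # 3 unit (300 character) minimum charge per request


def billable_units(char_count: int) -> int:
    """Return the billable units for one request.

    Assumption: partial units round up to the next whole unit.
    """
    rounded_up = -(-char_count // UNIT_CHARS)  # integer ceiling division
    return max(MIN_UNITS, rounded_up)


print(billable_units(550))  # 6
print(billable_units(120))  # 3, because the minimum charge applies
```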

With Amazon Comprehend APIs, you can process both unstructured, raw text and, with some APIs, other text files like PDF and Word documents. 

Custom Comprehend

Custom Entities & Classification
For asynchronous entity recognition on PDF*, Word, and plain text documents

Inference requests are measured in units of 100 characters, with a 3 unit (300 character) minimum charge per request.

For asynchronous classification

Inference requests are measured in units of 100 characters, with a 3 unit (300 character) minimum charge per request.

For synchronous classification and entity recognition

Endpoints are billed in one-second increments, with a minimum of 60 seconds. You continue to incur charges from the time you start the endpoint until it is deleted, even if no documents are analyzed.

One inference unit (IU) provides a throughput of 100 characters/second on your managed endpoint. You can provision additional IUs for more throughput. Each IU costs $0.0005 per second.
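
The endpoint rules above can be sketched as a small calculator. This is an illustration under stated assumptions, not an official formula: the helper names are hypothetical, partial seconds are assumed to round up, and the 60-second minimum is assumed to apply once per endpoint lifetime.

```python
import math
from decimal import Decimal

IU_CHARS_PER_SECOND = 100               # throughput of one inference unit
IU_PRICE_PER_SECOND = Decimal("0.0005")  # USD per IU per second
MIN_BILLED_SECONDS = 60                 # minimum billed duration


def required_ius(chars_per_second: float) -> int:
    """Smallest number of IUs whose throughput covers the given load."""
    return max(1, math.ceil(chars_per_second / IU_CHARS_PER_SECOND))


def endpoint_cost(ius: int, seconds_active: float) -> Decimal:
    """Cost of keeping an endpoint up, regardless of requests served."""
    billed_seconds = max(MIN_BILLED_SECONDS, math.ceil(seconds_active))
    return ius * IU_PRICE_PER_SECOND * billed_seconds


ius = required_ius(50)                 # 1 IU covers 50 characters/second
print(endpoint_cost(ius, 12 * 3600))   # cost of 12 hours with 1 IU
```

Because the charge depends only on provisioned IUs and uptime, deleting an idle endpoint is the only way to stop the cost from accruing.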

$3 per Hour for Model Training

*To extract text from scanned PDF documents, the Amazon Textract Detect Document Text API is called.

Topic Modeling

For the first 100MB

For every MB above 100MB

You are charged based on the total size of documents processed per topic modeling job. The first 100 MB is charged a flat rate. Above 100 MB, you are charged per MB.

Free Tier

50K UNITS OF TEXT (5M CHARACTERS)

Amazon Comprehend offers a free tier covering 50K units of text (5M characters) per API per month.

Eligible APIs include Key Phrase Extraction, Sentiment, Targeted Sentiment, Entity Recognition, Language Detection, Event Detection, Syntax Analysis, Detect PII, Contains PII, and Prompt Safety Classification.

Note: Custom Comprehend (custom entities and custom classification) does not offer a free tier. This includes model training, inference, and model management.

5 JOBS UP TO 1MB EACH (Topic Modeling)

The Amazon Comprehend free tier is available to both new and existing AWS customers for 12 months, starting from the date of their first Amazon Comprehend request.

Amazon Comprehend pricing examples

Example 1 - Analyzing Customer Comments

Let us assume you have built an application using Amazon Comprehend to analyze customer comments on your online store. You have received 10,000 customer comments that are 550 characters each, and you are in the second year of your use of the service.

Total charge calculation:

Size of each request = 550 characters

Number of units per request = 6

Total Units: 10,000 (requests) x 6 (units per request) = 60,000

Price per unit = $0.0001

Total cost = [No. of units] x [Cost per unit] = 60,000 x $0.0001 = $6.00


Example 2 - Categorizing Documents by Topics

Let us say you have a set of research documents totaling 240 MB in size that you want to categorize by topic and recommend documents to your customers based on their area of interest. Let us also assume that you are in the second year of your use of the service and are not eligible for the free tier offering.

Total charge calculation:

Total megabytes processed = 240

Megabytes billed at a flat rate of $1 = 100

Megabytes billed at $0.004/MB = 140 [240-100]

Total cost of the job = $1.00 + [140 x $0.004] = $1.00 + $0.56 = $1.56
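
The same calculation can be written as a short function. This is a minimal sketch that takes the $1 flat rate and the $0.004/MB rate from the example above; the function name is hypothetical, and billing fractional megabytes proportionally is an assumption that this page does not confirm.

```python
from decimal import Decimal

FLAT_RATE = Decimal("1.00")     # USD for the first 100 MB (from Example 2)
FLAT_TIER_MB = 100
PER_MB_RATE = Decimal("0.004")  # USD per MB above 100 MB (from Example 2)


def topic_modeling_job_cost(total_mb) -> Decimal:
    """Cost of one topic modeling job for the given total document size."""
    size = Decimal(str(total_mb))
    extra_mb = max(Decimal(0), size - FLAT_TIER_MB)
    # Assumption: megabytes above the flat tier are billed proportionally.
    return FLAT_RATE + extra_mb * PER_MB_RATE


print(topic_modeling_job_cost(240))  # 1.560
print(topic_modeling_job_cost(50))   # 1.00, the flat rate still applies
```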


Example 3 - Classifying Customer Feedback using the Custom Classification API

Let us say you want to train a classifier to automatically organize new customer feedback that comes in from your website. 10 customers enter feedback every minute, and each piece of feedback is 300 characters. It takes one hour to train the custom model, and you are planning to keep this model for a month. So, model training costs will be $3 and model storage costs will be $0.50 for the month. Let us also assume that you are in the second year of your use of the service and are not eligible for the free tier offering.

To classify the feedback asynchronously, you pay by the number of characters in your documents. To classify in real time, you provision an endpoint with enough throughput to handle your use case and pay for the time that the endpoint is up.

Inference cost calculation for asynchronous classification:

Total characters per day = 4,320,000 characters [300 characters x 10 documents x 1,440 minutes]

Number of units per day = 43,200 units [4,320,000 characters ÷ 100 characters per unit]

Price per unit = $0.0005

Total inference cost for units = $21.60 [43,200 units x $0.0005]

Total cost = $25.10 [$21.60 inference + $3 model training + $0.50 model storage]

Total charge calculation for synchronous classification:

First, let’s calculate the required throughput. Every minute we’re classifying 10 documents of 300 characters each. So that’s:

50 characters per second [300 characters x 10 documents ÷ 60 seconds]

So, you will need to provision an endpoint with 1 Inference Unit (IU), which gives a throughput of 100 characters/second.

Price for 1 IU = $0.0005 per second

You will incur costs depending on how long you’re keeping your real time classification endpoint active, regardless of how many inference calls are made.

If you’re running your real time classification endpoint for 12 hours per day:

Total inference cost = $21.60 [$0.0005 x 3600 seconds x 12 hours]

Total cost = $25.10 [$21.60 inference + $3 model training + $0.50 model storage]

Note that you incur cost for the throughput provisioned and for the amount of time the endpoint is active. If you needed to provision more throughput, the price would be:

Price for 2 IU = $0.001 per second [$0.0005 x 2]

Price for 3 IU = $0.0015 per second [$0.0005 x 3]


Example 4 - Analyzing Customer Comments using the Custom Entities API

Let us say you want to train a custom entity model to automatically extract custom terms from customer feedback that comes in from your website. The training job takes 1.5 hours, and you analyze 10,000 pieces of customer feedback that are 550 characters each. You are planning to keep this model for a month. Let us also assume that you are in the second year of your use of the service and are not eligible for the free tier offering.

Total charge calculation:

Total characters processed = 5,500,000 characters [10,000 documents x 550 characters]

Total units = 55,000 units [5,500,000 characters ÷ 100 characters per unit]

Price per unit = $0.0005

Total cost for units = $27.5 [55,000 units x $0.0005]

Total hours for model training = 1.5 hours

Price per hour = $3

Total cost for model training = $4.5 [1.5 hours x $3]

Number of months for model management = 1 month

Price per month = $0.50 

Total cost for model management = $0.50 [1 month x $0.50]

Total cost = $32.50 [$27.50 + $4.50 + $0.50]
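
The three cost components in this example can be combined in a short sketch. The rates are the ones stated on this page ($0.0005 per asynchronous unit, $3 per training hour billed by the second, $0.50 per month of model management); the function and key names are hypothetical.

```python
from decimal import Decimal

ASYNC_PRICE_PER_UNIT = Decimal("0.0005")     # USD per 100-character unit
TRAINING_PRICE_PER_HOUR = Decimal("3")       # USD, billed by the second
MANAGEMENT_PRICE_PER_MONTH = Decimal("0.50")  # USD per custom model


def custom_model_cost(units: int, training_seconds: int, months: int) -> dict:
    """Break down the cost of training, keeping, and using a custom model."""
    inference = units * ASYNC_PRICE_PER_UNIT
    training = TRAINING_PRICE_PER_HOUR * training_seconds / 3600
    management = MANAGEMENT_PRICE_PER_MONTH * months
    return {
        "inference": inference,
        "training": training,
        "management": management,
        "total": inference + training + management,
    }


# 55,000 units, 1.5 hours of training, 1 month of model management
for item, cost in custom_model_cost(55_000, 5_400, 1).items():
    print(f"{item}: ${cost:.2f}")
```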


Example 5 – Extracting events and the associated information using Event Detection

Let’s assume you want to extract 3 event types from 3,000 articles of 500 characters each and you are in the second year of your use of the service.

Total charge calculation:

Number of characters processed = 1,500,000 characters [3,000 articles x 500 characters]

Number of units processed = 45,000 units [1,500,000 x 3 event types ÷ 100 characters per unit]

Price per unit = $0.003

Total cost for units = $135 [45,000 units x $0.003]
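
A minimal sketch of this calculation follows. It assumes, as the bracketed arithmetic above suggests, that billable units scale linearly with the number of event types and are computed on the total character count; the function name is hypothetical, and rounding up a partial final unit is an assumption.

```python
from decimal import Decimal

EVENT_PRICE_PER_UNIT = Decimal("0.003")  # USD per unit, from this example
UNIT_CHARS = 100


def event_detection_cost(articles: int, chars_per_article: int,
                         event_types: int) -> Decimal:
    """Estimate the Event Detection charge for a batch of articles."""
    total_chars = articles * chars_per_article
    # Assumption: each event type is billed over the full character count.
    units = -(-(total_chars * event_types) // UNIT_CHARS)  # ceiling division
    return units * EVENT_PRICE_PER_UNIT


print(event_detection_cost(3_000, 500, 3))  # 135.000
```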


Example 6 – Identifying documents with PII using Contains PII API

Let us assume you have built an application using Amazon Comprehend to analyze customer comments on your online store. You have received 10,000 customer comments that are 550 characters each, and you need to identify which documents contain PII so that they can be stored in a secure location. Let us assume you are in the second year of your use of the service.

Total charge calculation:

Size of each request = 550 characters

Number of units per request = 6

Total Units = 60,000 [10,000 requests x 6 units per request]

Price per unit = $0.000002

Total cost = $0.12 [60,000 units x $0.000002]


Example 7 – Redacting PII from documents using Detect PII API

Let us assume you have built an application using Amazon Comprehend to analyze customer comments on your online store. You have received 10,000 customer comments that are 550 characters each, and you need to create redacted versions of the documents before they are archived. Let us assume you are in the second year of your use of the service.

Total charge calculation:

Size of each request = 550 characters

Number of units per request = 6

Total Units = 60,000 [10,000 requests x 6 units per request]

Price per unit = $0.0001

Total cost = $6 [60,000 units x $0.0001]

Example 8 – Extracting Mortgage Application Entities using the Custom Entity API

Let us say you want to train a custom entity extraction model to extract 10 custom entities from a mortgage application. One hundred customers apply every day, each providing a 10-page scanned PDF document containing 2,500 characters per page. Let us assume that the Amazon Textract Detect Document Text API must extract the text from every page before entities are extracted. It takes one hour to train the custom model, and you are planning to keep this model for a month. So, model training costs will be $3 and model storage costs will be $0.50 for the month. Let us also assume that you are in the second year of your use of the service and are not eligible for the free tier offering. To extract custom entities asynchronously, you pay by the number of characters in your documents. To extract entities in real time, you provision an endpoint with enough throughput to handle your use case and pay for the time that the endpoint is up.

Inference cost calculation for asynchronous entity recognition:

Total characters per day = 2,500,000 characters [100 applications/day x 10 pages x 2,500 characters]

Number of units per day = 25,000 units [2,500,000 characters ÷ 100 characters per unit]

Price per unit = $0.0005

Total inference cost for units = $12.50 [25,000 units x $0.0005]

Amazon Textract cost for Detect Document Text API = $1.50 [100 applications/day x 10 pages x $0.0015 price per page, up to 1M pages]

Total cost = $17.50 [$12.50 inference + $1.50 Textract + $3 model training + $0.50 model storage]

 

Example 9 – Analyzing Employee Survey Responses

Let us assume you have built an application using Amazon Comprehend Targeted Sentiment to analyze employee survey responses for your corporation. You have received 100,000 survey responses that are 350 characters each, and you are in the second year of your use of the service.

Total charge calculation:

Size of each request = 350 characters

Number of units per request = 4

Total Units: 100,000 (requests) x 4 (units per request) = 400,000

Price per unit = $0.0001 (from 0-10M units)

Total cost = [No. of units] x [Cost per unit] = 400,000 x $0.0001 = $40.00

 

Example 10 - Detecting toxicity in online comments on website

Let us assume you have built an application using Amazon Comprehend to detect toxicity in comments on your website. You have received 100M customer comments that are 100 characters each, and you need to identify which comments are toxic in nature and should be redacted. Let us assume you are in the second year of your use of the service.

Total charge calculation:

Size of each request = 100 characters

Number of units per request = 1

Total units = 100M units [100M comments x 1 unit per request]

Price per unit = $0.0001 [from 0 - 10M units] + $0.00005 [from 10M - 50M units] + $0.000025 [from 50M - 100M units]

Total cost = [No. of units] x [Cost per unit]

= [10M x $0.0001] + [40M x $0.00005] + [50M x $0.000025]

= $1,000 + $2,000 + $1,250

= $4,250

Example 11 - Detecting unsafe prompts in generative AI application

Let us assume you have built an application using Amazon Comprehend to detect unsafe input prompts when your users interact with your generative AI product. You have received 10M input prompts that are 500 characters each, and you need to identify which prompts are unsafe. Let us assume you are in the second year of your use of the service.

Total charge calculation:

Size of each request = 500 characters

Number of units per request = 5

Total units = 50M units [10M prompts x 5 units per request]

Price per unit = $0.0001 [from 0 - 10M units] + $0.00005 [from 10M - 50M units] + $0.000025 [from 50M - 100M units]

Total cost = [No. of units] x [Cost per unit]

= [10M x $0.0001] + [40M x $0.00005]

= $1,000 + $2,000

= $3,000
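
The tiered arithmetic used in Examples 10 and 11 can be sketched as a function that walks through the volume tiers in order. The tier boundaries and prices are the ones quoted in those examples; the names are hypothetical, and volumes above 100M units raise an error because this page directs you to contact AWS for that pricing.

```python
from decimal import Decimal

# (upper bound of tier in units, price per unit in USD)
TIERS = [
    (10_000_000, Decimal("0.0001")),
    (50_000_000, Decimal("0.00005")),
    (100_000_000, Decimal("0.000025")),
]


def tiered_cost(units: int) -> Decimal:
    """Charge each slice of the monthly volume at its own tier price."""
    if units > TIERS[-1][0]:
        raise ValueError("Above 100M units per month: custom pricing applies")
    cost = Decimal(0)
    lower = 0
    for upper, price in TIERS:
        if units <= lower:
            break
        cost += (min(units, upper) - lower) * price
        lower = upper
    return cost


print(tiered_cost(100_000_000))  # 4250 in value (Example 10)
print(tiered_cost(50_000_000))   # 3000 in value (Example 11)
```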
