Amazon Comprehend provides natural language processing, Personally Identifiable Information (PII) detection and redaction, custom classification and entity detection, and topic modeling. These capabilities support a broad range of applications that analyze raw text and, with some APIs, document formats such as PDF and Word.

  • Natural Language Processing: Amazon Comprehend APIs for entity recognition, sentiment analysis, syntax analysis, key phrase extraction, and language detection can be used to extract insights from natural language text. These requests are measured in units of 100 characters (1 unit = 100 characters), with a 3 unit (300 character) minimum charge per request.
  • Personally Identifiable Information (PII): The Detect PII API finds the locations of selected PII entities in a document and can be used to create redacted versions of documents. The Contains PII API tells you whether a document contains the selected PII. These requests are also measured in units of 100 characters (1 unit = 100 characters), with a 3 unit (300 character) minimum charge per request.
  • Custom Comprehend: The Custom Classification and Entities APIs can train a custom NLP model to categorize text and extract custom entities. Asynchronous inference requests are measured in units of 100 characters, with a 3 unit (300 character) minimum charge per request. You are charged $3 per hour for model training (billed by the second) and $0.50 per month for custom model management. For synchronous Custom Classification and Entities inference requests, you provision an endpoint with the appropriate throughput. You are charged from the time that you start your endpoint until it is deleted.
  • Topic Modeling: Topic Modeling identifies relevant terms or topics from a collection of documents stored in Amazon S3. It will identify the most common topics in the collection and organize them in groups and then map which documents belong to which topic. You are charged based on the total size of documents processed per job. The first 100 MB is charged a flat rate. Above 100 MB, you are charged per MB.
  • You can estimate your costs using the AWS Pricing Calculator.
For volumes higher than 100M units per month, please contact us for pricing.
NLP requests are measured in units of 100 characters, with a 3 unit (300 character) minimum charge per request.
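
The unit rule above can be sketched in a few lines of Python. This is a minimal illustration, not an official billing formula: the function name `billable_units` is hypothetical, and rounding partial units up is an assumption inferred from the pricing examples on this page (550 characters are counted as 6 units).

```python
UNIT_SIZE = 100   # characters per unit, as stated above
MIN_UNITS = 3     # minimum charge per request (300 characters)


def billable_units(num_chars: int, min_units: int = MIN_UNITS) -> int:
    """Return the units billed for a single request (hypothetical helper).

    Assumption: a partial unit is rounded up to a whole unit.
    """
    if num_chars < 0:
        raise ValueError("num_chars must be non-negative")
    # Integer ceiling division avoids floating-point rounding surprises.
    units = (num_chars + UNIT_SIZE - 1) // UNIT_SIZE
    return max(min_units, units)


if __name__ == "__main__":
    for size in (120, 300, 301, 550):
        print(size, "characters ->", billable_units(size), "units")
```

Short requests still hit the 3 unit minimum, so batching very small texts into fewer requests may reduce the billed units, where the API in question allows it.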

With Amazon Comprehend APIs, you can process both unstructured, raw text and, with some APIs, other text files like PDF and Word documents. 

Custom Comprehend

Custom Entities & Classification
For asynchronous entity recognition on PDF*, Word, and plain text documents

Inference requests are measured in units of 100 characters, with a 3 unit (300 character) minimum charge per request.

For asynchronous classification

Inference requests are measured in units of 100 characters, with a 3 unit (300 character) minimum charge per request.

For synchronous classification and entity recognition

Endpoints are billed in one-second increments, with a minimum of 60 seconds. You continue to incur charges from the time you start the endpoint until it is deleted, even if no documents are analyzed.

One inference unit (IU) provides a throughput of 100 characters/second on your managed endpoint. You can provision additional IUs for more throughput. Each IU costs $0.0005 per second.
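
As a rough sketch of the endpoint arithmetic described above, the following Python computes how many IUs a given throughput needs and what an endpoint costs for a given uptime. The function names are hypothetical, and the code assumes uptime is measured in whole seconds and that the 60 second minimum applies once per endpoint run.

```python
import math
from decimal import Decimal

IU_PRICE_PER_SECOND = Decimal("0.0005")  # price per IU per second, from the text
IU_THROUGHPUT = 100                      # characters/second per IU, from the text
MIN_BILLED_SECONDS = 60                  # minimum billed duration, from the text


def required_inference_units(chars_per_second: float) -> int:
    """Smallest number of IUs whose combined throughput covers the load."""
    return max(1, math.ceil(chars_per_second / IU_THROUGHPUT))


def endpoint_cost(active_seconds: int, inference_units: int = 1) -> Decimal:
    """Cost of keeping an endpoint up, whether or not documents are analyzed."""
    billed_seconds = max(MIN_BILLED_SECONDS, active_seconds)
    return IU_PRICE_PER_SECOND * inference_units * billed_seconds


if __name__ == "__main__":
    print(required_inference_units(250))   # IUs needed for 250 characters/second
    print(endpoint_cost(12 * 3600))        # one IU kept up for 12 hours
```

Because the charge depends only on uptime and provisioned IUs, deleting an idle endpoint is the only way to stop the meter.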

$3 per Hour for Model Training

*To extract text from scanned PDF documents, the Amazon Textract Detect Document Text API is called.

Topic Modeling

For the first 100MB

For every MB above 100MB

You are charged based on the total size of documents processed per topic modeling job. The first 100 MB is charged a flat rate. Above 100 MB, you are charged per MB.

Free Tier

50K units of text (5M characters)

For each of the following APIs (Key Phrase Extraction, Sentiment Analysis, Entity Recognition, Language Detection, Detect PII, Contains PII, Event Detection, Syntax Analysis, Custom Entities, and Custom Classification) per month, starting from the date of your first Amazon Comprehend request.

For the Custom Classification and Custom Entities, there is no free tier for model training, model management, and endpoints.

5 jobs up to 1MB each

For topic modeling

The Amazon Comprehend free tier is available to both new and existing AWS customers for 12 months, starting from the date of their first Amazon Comprehend request.

Amazon Comprehend Medical Pricing

With Amazon Comprehend Medical, you pay only for what you use. You are charged based on the amount of text processed on a monthly basis. Amazon Comprehend Medical provides two APIs: Medical Named Entity and Relationship Extraction (NERe) and Protected Health Information Data Extraction and Identification (PHId).

The Medical NERe API extracts entities, entity relationships, entity traits, and PHI. Customers who only want to identify PHI for data protection can use the PHId API. All API requests are measured in units of 100 characters, with a one unit (100 character) minimum charge per request.

Amazon Comprehend Medical Free Tier

Amazon Comprehend Medical offers a free tier covering 25k units of text (2.5M characters) for the first three months when you start using the service for any of the APIs.

Amazon Comprehend pricing examples

Example 1 - Analyzing Customer Comments

Let us assume you have built an application using Amazon Comprehend to analyze customer comments on your online store. You have received 10,000 customer comments that are 550 characters each, and you are in the second year of your use of the service.

Total charge calculation:

Size of each request = 550 characters

Number of units per request = 6 [550 characters ÷ 100 characters per unit, rounded up]

Total Units: 10,000 (requests) x 6 (units per request) = 60,000

Price per unit = $0.0001

Total cost = [No. of units] x [Cost per unit] = 60,000 x $0.0001 = $6.00


Example 2 - Categorizing Documents by Topics

Let us say you have a set of research documents totaling 240 MB in size that you want to categorize by topic and recommend documents to your customers based on their area of interest. Let us also assume that you are in the second year of your use of the service and are not eligible for the free tier offering.

Total charge calculation:

Total megabytes processed = 240

Megabytes billed at a flat rate of $1 = 100

Megabytes billed at $0.004/MB = 140 [240-100]

Total cost of the job = $1.00 + [140 x $0.004] = $1.00 + $0.56 = $1.56
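
The tiered calculation in this example can be expressed as a small Python function. It is a sketch only: the rates are the ones quoted in this example ($1 flat for the first 100 MB, $0.004 per MB above that), the function name is hypothetical, and it assumes fractional megabytes are billed proportionally, which this page does not state.

```python
from decimal import Decimal

FLAT_RATE = Decimal("1.00")      # charge for the first 100 MB of a job (from this example)
FLAT_RATE_MB = Decimal("100")    # size covered by the flat rate
PER_MB_RATE = Decimal("0.004")   # charge per MB above 100 MB (from this example)


def topic_modeling_job_cost(total_mb) -> Decimal:
    """Cost of one topic modeling job for a collection of `total_mb` megabytes."""
    size = Decimal(str(total_mb))
    # Only the portion above 100 MB is billed per MB.
    extra_mb = max(Decimal("0"), size - FLAT_RATE_MB)
    return FLAT_RATE + extra_mb * PER_MB_RATE


if __name__ == "__main__":
    print(topic_modeling_job_cost(240))   # the 240 MB job from this example
```

The charge is per job, so two 100 MB jobs are each billed the flat rate separately.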


Example 3 - Classifying Customer Feedback using the Custom Classification API

Let us say you want to train a classifier to automatically organize new customer feedback that comes in from your website. 10 customers enter feedback every minute, and each piece of feedback is 300 characters. It takes one hour to train the custom model, and you are planning to keep this model for a month. So, model training costs will be $3 and model storage costs will be $0.50 for the month. Let us also assume that you are in the second year of your use of the service and are not eligible for the free tier offering.

To classify the feedback asynchronously, you pay by the number of characters in your documents. To classify in real time, you provision an endpoint with enough throughput to handle your use case and pay for the time that the endpoint is up.

Inference cost calculation for asynchronous classification:

Total characters per day = 4,320,000 characters [300 characters x 10 documents x 1,440 minutes]

Number of units per day = 43,200 units [4,320,000 characters ÷ 100 characters per unit]

Price per unit = $0.0005

Total inference cost per day = $21.60 [43,200 units x $0.0005]

Total cost = $25.10 [$21.60 for one day of inference + $3 model training + $0.50 model storage]

Total charge calculation for synchronous classification:

First, let’s calculate the required throughput. Every minute we’re classifying 10 documents of 300 characters each. So that’s:

50 characters per second [300 characters x 10 documents ÷ 60 seconds]

So, you will need to provision an endpoint with 1 Inference Unit (IU), which gives a throughput of 100 characters/second.

Price for 1 IU = $0.0005 per second

You will incur costs depending on how long you’re keeping your real time classification endpoint active, regardless of how many inference calls are made.

If you’re running your real time classification endpoint for 12 hours per day:

Total inference cost per day = $21.60 [$0.0005 x 3,600 seconds x 12 hours]

Total cost = $25.10 [$21.60 for one day of inference + $3 model training + $0.50 model storage]

Note that you incur cost for the throughput provisioned and for the amount of time the endpoint is active. If you needed to provision more throughput, the price would be:

Price for 2 IU = $0.001 per second [$0.0005 x 2]

Price for 3 IU = $0.0015 per second [$0.0005 x 3]
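
To compare the two billing models in this example side by side, here is a minimal Python sketch. The function names are hypothetical, the prices are the ones quoted in this example, and the asynchronous path assumes each document is billed separately with partial units rounded up and a 3 unit minimum.

```python
from decimal import Decimal

ASYNC_PRICE_PER_UNIT = Decimal("0.0005")   # asynchronous inference, per unit
IU_PRICE_PER_SECOND = Decimal("0.0005")    # synchronous endpoint, per IU per second
MINUTES_PER_DAY = 1440


def async_daily_cost(docs_per_minute: int, chars_per_doc: int) -> Decimal:
    """Daily asynchronous cost: pay per 100-character unit processed."""
    units_per_doc = max(3, (chars_per_doc + 99) // 100)  # round up, 3 unit minimum
    docs_per_day = docs_per_minute * MINUTES_PER_DAY
    return ASYNC_PRICE_PER_UNIT * units_per_doc * docs_per_day


def sync_daily_cost(hours_active: int, inference_units: int = 1) -> Decimal:
    """Daily synchronous cost: pay for endpoint uptime, not for documents."""
    return IU_PRICE_PER_SECOND * inference_units * hours_active * 3600


if __name__ == "__main__":
    print("async:", async_daily_cost(10, 300))
    for hours in (8, 12, 24):
        print(f"sync, {hours} h/day:", sync_daily_cost(hours))
```

With these figures, a 1 IU endpoint kept up for 12 hours a day costs the same as processing the full day's feedback asynchronously; longer uptime costs more, shorter uptime costs less.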


Example 4 - Extracting Medical Entities from Clinical Documents

Let us assume you have built an application using Amazon Comprehend Medical to analyze clinical documents in your data lake. You have 1,000 clinical documents that are 2,550 characters each. Let us also assume that you are in the second year of your use of the service and are not eligible for the free tier offering.

Total charge calculation:

Size of each request = 2,550 characters

Number of units per request = 26 units [2,550 characters ÷ 100 characters per unit, rounded up]

Total Units: 1,000 (requests) x 26 (units per request) = 26,000

Price per unit = $0.01

Total cost = [No. of units] x [Cost per unit] = 26,000 x $0.01 = $260.00
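
The same calculation can be sketched in Python. The helper names are hypothetical, the $0.01 price per unit is the figure used in this example, and rounding partial units up is an assumption consistent with 2,550 characters being counted as 26 units. Note that Amazon Comprehend Medical uses a 1 unit minimum rather than the 3 unit minimum of Amazon Comprehend.

```python
from decimal import Decimal

MEDICAL_PRICE_PER_UNIT = Decimal("0.01")  # price per unit used in this example
MEDICAL_MIN_UNITS = 1                     # one unit (100 characters) minimum per request


def medical_units(num_chars: int) -> int:
    """Units billed for one Comprehend Medical request (partial units rounded up)."""
    return max(MEDICAL_MIN_UNITS, (num_chars + 99) // 100)


def medical_cost(num_documents: int, chars_per_document: int) -> Decimal:
    """Total cost when every document is sent as its own request."""
    return MEDICAL_PRICE_PER_UNIT * medical_units(chars_per_document) * num_documents


if __name__ == "__main__":
    print(medical_units(2550))        # units for one clinical document
    print(medical_cost(1000, 2550))   # total for the 1,000 documents in this example
```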


Example 5 - Analyzing Customer Comments using the Custom Entities API

Let us say you want to train a custom entity model to automatically extract custom terms from customer feedback that comes in from your website. The training job takes 1.5 hours, and you analyze 10,000 pieces of customer feedback that are 550 characters each. You are planning to keep this model for a month. Let us also assume that you are in the second year of your use of the service and are not eligible for the free tier offering.

Total charge calculation:

Total characters processed = 5,500,000 characters [10,000 documents x 550 characters]

Number of units = 55,000 units [5,500,000 characters ÷ 100 characters per unit]

Price per unit = $0.0005

Total cost for units = $27.5 [55,000 units x $0.0005]

Total hours for model training = 1.5 hours

Price per hour = $3

Total cost for model training = $4.5 [1.5 hours x $3]

Number of months for model management = 1 month

Price per month = $0.50 

Total cost for model management = $0.50 [1 month x $0.50]

Total cost = $32.50 [$27.50 + $4.50 + $0.50]


Example 6 – Extracting events and the associated information using Event Detection

Let’s assume you want to extract 3 event types from 3,000 articles of 500 characters each and you are in the second year of your use of the service.

Total charge calculation:

Number of characters processed = 1,500,000 characters [3,000 articles x 500 characters]

Number of units processed = 45,000 units [1,500,000 x 3 event types ÷ 100 characters per unit]

Price per unit = $0.003

Total cost for units = $135 [45,000 units x $0.003]
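
The distinctive part of this example is that the character count is multiplied by the number of event types before it is converted to units. A minimal Python sketch of that arithmetic follows; the function name is hypothetical, the $0.003 price is the one quoted in this example, and the code follows the example in computing units over the whole job rather than per article.

```python
from decimal import Decimal

EVENT_PRICE_PER_UNIT = Decimal("0.003")  # price per unit used in this example


def event_detection_cost(num_articles: int, chars_per_article: int, num_event_types: int):
    """Return (units, cost) for an Event Detection job.

    Units scale with both the text size and the number of event types requested.
    """
    total_chars = num_articles * chars_per_article
    # Ceiling division so a trailing partial unit is counted as a whole unit.
    units = (total_chars * num_event_types + 99) // 100
    return units, EVENT_PRICE_PER_UNIT * units


if __name__ == "__main__":
    units, cost = event_detection_cost(3000, 500, 3)
    print(units, "units, cost:", cost)
```

Requesting only the event types you need keeps the unit count down, since each additional type adds another full pass over the text for billing purposes.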


Example 7 – Identifying documents with PII using Contains PII API

Let us assume you have built an application using Amazon Comprehend to analyze customer comments on your online store. You have received 10,000 customer comments that are 550 characters each, and you need to identify which documents contain PII so that they can be stored in a secure location. Let us assume you are in the second year of your use of the service.

Total charge calculation:

Size of each request = 550 characters

Number of units per request = 6 [550 characters ÷ 100 characters per unit, rounded up]

Total Units = 60,000 [10,000 requests x 6 units per request]

Price per unit = $0.000002

Total cost = $0.12 [60,000 units x $0.000002]

Example 8 – Redacting PII from documents using Detect PII API

Let us assume you have built an application using Amazon Comprehend to analyze customer comments on your online store. You have received 10,000 customer comments that are 550 characters each, and you need to create redacted versions of the documents before they are archived. Let us assume you are in the second year of your use of the service.

Total charge calculation:

Size of each request = 550 characters

Number of units per request = 6 [550 characters ÷ 100 characters per unit, rounded up]

Total Units = 60,000 [10,000 requests x 6 units per request]

Price per unit = $0.0001

Total cost = $6 [60,000 units x $0.0001]

Example 9 – Extracting Mortgage Application Entities using the Custom Entity API

Let us say you want to train a custom entity extraction model to extract 10 custom entities from mortgage applications. One hundred customers apply every day, each providing a 10-page scanned PDF document containing 2,500 characters per page. Let us assume that the Amazon Textract Detect Document Text API must extract the text from every page before entities are extracted. It takes one hour to train the custom model, and you are planning to keep this model for a month. So, model training costs will be $3 and model storage costs will be $0.50 for the month. Let us also assume that you are in the second year of your use of the service and are not eligible for the free tier offering. To extract custom entities asynchronously, you pay by the number of characters in your documents. To extract entities in real time, you provision an endpoint with enough throughput to handle your use case and pay for the time that the endpoint is up.

Inference cost calculation for asynchronous entity extraction:

Total characters per day = 2,500,000 characters [100 applications/day x 10 pages x 2,500 characters]

Number of units per day = 25,000 units [2,500,000 characters ÷ 100 characters per unit]

Price per unit = $0.0005

Total inference cost for units = $12.50 [25,000 units x $0.0005]

Amazon Textract cost for Detect Document Text API = $1.50 [100 applications/day x 10 pages x $0.0015 price per page, up to 1M pages]

Total cost = $17.50 [$12.50 for one day of inference + $1.50 Textract + $3 model training + $0.50 model storage]
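
The cost components of this example can be collected into one breakdown with a short Python sketch. The function name and dictionary keys are hypothetical, all prices are the ones quoted in this example, and the inputs are assumed to be whole numbers. It covers one day of asynchronous processing plus the training and monthly model management charges.

```python
from decimal import Decimal

COMPREHEND_PRICE_PER_UNIT = Decimal("0.0005")   # asynchronous custom entities, per unit
TEXTRACT_PRICE_PER_PAGE = Decimal("0.0015")     # Detect Document Text, up to 1M pages
TRAINING_PRICE_PER_HOUR = Decimal("3")
MODEL_MANAGEMENT_PER_MONTH = Decimal("0.50")


def mortgage_pipeline_cost(applications: int, pages_per_application: int,
                           chars_per_page: int, training_hours: int = 1,
                           months: int = 1) -> dict:
    """Cost breakdown for one day of scanned-PDF entity extraction."""
    pages = applications * pages_per_application
    units = (pages * chars_per_page + 99) // 100   # 100-character units, rounded up
    breakdown = {
        "inference": COMPREHEND_PRICE_PER_UNIT * units,
        "textract": TEXTRACT_PRICE_PER_PAGE * pages,
        "training": TRAINING_PRICE_PER_HOUR * training_hours,
        "management": MODEL_MANAGEMENT_PER_MONTH * months,
    }
    breakdown["total"] = sum(breakdown.values(), Decimal("0"))
    return breakdown


if __name__ == "__main__":
    for item, amount in mortgage_pipeline_cost(100, 10, 2500).items():
        print(f"{item}: ${amount:.2f}")
```

Keeping the Textract charge as its own line makes it clear that scanned PDFs carry a per-page cost on top of the Comprehend per-unit cost.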

 
