Amazon Comprehend is a natural language processing (NLP) service that uses machine learning to discover insights from text. Amazon Comprehend provides Keyphrase Extraction, Sentiment Analysis, Entity Recognition, Topic Modeling, and Language Detection APIs so you can easily integrate natural language processing into your applications. You simply call the Amazon Comprehend APIs in your application and provide the location of the source document or text. The APIs will output entities, key phrases, sentiment, and language in a JSON format, which you can use in your application.
Keyphrase Extraction
The Keyphrase Extraction API returns the key phrases, or talking points, in a text, each with a confidence score indicating how likely it is to be a key phrase.
Example: In this example, a customer is comparing a DSLR camera to an instant film camera. The API extracts key phrases, counts the number of times a key phrase is repeated, and returns a confidence score about the results.
Sample text: I'm an avid photographer, and I'm primarily found shooting with my DSLR or my instant film camera that I carry around for casual use. While nothing beats my DSLR in power and convenience, there's something magical about my instant film camera. Perhaps it's that you're shooting on actual film, or maybe it's that every shot you take is a unique physical artifact (which is special in today's world of Instagram and Facebook, where photos are a dime a dozen). All I know for sure is that they are incredibly fun to use and peoples' eyes light up when you pull one of these out at a party.
Keyphrase                     Count   Confidence
an avid photographer          1       0.99
my DSLR                       2       0.97
my instant film camera        2       0.99
casual use                    1       0.99
power and convenience         1       0.94
actual film                   1       0.99
every shot                    1       0.92
a unique physical artifact    1       0.99
today                         1       0.91
world                         1       0.99
Instagram and Facebook        1       0.99
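As a minimal sketch of how a table like the one above could be produced, the Python helper below calls the `detect_key_phrases` operation of a boto3 Comprehend client and aggregates the result. The helper name `summarize_key_phrases` is hypothetical, and the assumption here is that the Count column is computed on the client side by grouping repeated phrases, since the response lists each occurrence separately with its own `Score`.

```python
from collections import defaultdict


def summarize_key_phrases(client, text, language_code="en"):
    """Return {phrase: (count, highest_score)} for the key phrases in `text`.

    `client` is assumed to be a boto3 Comprehend client, for example
    boto3.client("comprehend"), or any object with the same method.
    """
    response = client.detect_key_phrases(Text=text, LanguageCode=language_code)

    counts = defaultdict(int)
    best_score = defaultdict(float)
    for item in response["KeyPhrases"]:
        phrase = item["Text"]
        counts[phrase] += 1  # one entry per occurrence in the text
        best_score[phrase] = max(best_score[phrase], item["Score"])

    # Round scores to two decimals, matching the table layout above.
    return {p: (counts[p], round(best_score[p], 2)) for p in counts}
```

With boto3 installed and AWS credentials configured, this could be called as `summarize_key_phrases(boto3.client("comprehend"), sample_text)`.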
Sentiment Analysis
The Sentiment Analysis API returns the overall sentiment of a text (Positive, Negative, Neutral, or Mixed).
Example: In this example, a customer is posting his feedback on a pair of shoes. The API identifies the sentiment expressed by the customer along with a confidence score.
Sample Text: I ordered a small and expected it to fit just right but it was a little bit more like a medium-large. It was great quality. It's a lighter brown than pictured but fairly close. Would be ten times better if it was lined with cotton or wool on the inside.
Sentiment   Score
Mixed       0.89
Positive    0.09
Negative    0.01
Neutral     0.00
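The scores above could be retrieved with a call along these lines. This is a sketch, assuming a boto3 Comprehend client and its `detect_sentiment` operation; the helper name `sentiment_table` is made up for illustration.

```python
def sentiment_table(client, text, language_code="en"):
    """Return (overall_label, [(sentiment, score), ...]) sorted by score.

    `client` is assumed to be a boto3 Comprehend client or a compatible object.
    """
    response = client.detect_sentiment(Text=text, LanguageCode=language_code)

    # SentimentScore holds one score per class: Positive, Negative, Neutral, Mixed.
    scores = response["SentimentScore"]
    ranked = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)

    # Sentiment is the overall label, returned in upper case (e.g. "MIXED").
    return response["Sentiment"], ranked
```

Sorting by score puts the dominant sentiment first, which mirrors the ordering of the table.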
Syntax Analysis
The Amazon Comprehend Syntax API enables customers to analyze text using tokenization and Parts of Speech (PoS), and identify word boundaries and labels like nouns and adjectives within the text.
Example: In this example, we analyze a short document using the Comprehend Syntax API. The Syntax API tokenizes the text (defines word boundaries) and labels each word with its associated part of speech, for example noun or verb. In addition to the begin and end offsets (so you know where the word is within the text), we also provide a confidence score.
Sample Text: I love my fast, new Kindle Fire!
Text     Tag
I        Pronoun
Love     Verb
My       Pronoun
Fast     Adjective
,        Punctuation
New      Adjective
Kindle   Proper noun
Fire     Proper noun
!        Punctuation
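A sketch of reading tokens, tags, offsets, and confidence scores from the Syntax API follows. It assumes a boto3 Comprehend client and its `detect_syntax` operation; `tag_tokens` is a hypothetical helper name. The service reports abbreviated tags (for example `PRON`, `VERB`, `ADJ`), which a caller would map to the display names used in the table.

```python
def tag_tokens(client, text, language_code="en"):
    """Return a list of (token_text, tag, begin, end, score) tuples.

    `client` is assumed to be a boto3 Comprehend client or a compatible object.
    """
    response = client.detect_syntax(Text=text, LanguageCode=language_code)

    rows = []
    for token in response["SyntaxTokens"]:
        begin, end = token["BeginOffset"], token["EndOffset"]
        part_of_speech = token["PartOfSpeech"]
        rows.append((
            text[begin:end],          # recover the token from its character offsets
            part_of_speech["Tag"],    # abbreviated part-of-speech tag
            begin,
            end,
            part_of_speech["Score"],  # confidence in the assigned tag
        ))
    return rows
```

Slicing the original text by offset, rather than copying the returned token text, is a design choice here that makes it easy to highlight tokens in place.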
Entity Recognition
The Entity Recognition API returns the named entities ("People," "Places," "Locations," etc.) that are automatically categorized based on the provided text.
Example: In this example, we are looking at the description of a company. The API identifies entities such as Organization, Date, and Location, counts the number of times an entity is mentioned, and returns a confidence score.
Sample Text: Amazon.com, Inc. is located in Seattle, WA and was founded July 5th, 1994 by Jeff Bezos, allowing customers to buy everything from books to blenders. Seattle is north of Portland and south of Vancouver, BC. Other notable Seattle-based companies are Starbucks and Boeing.
Entity             Category       Count   Confidence
Amazon.com, Inc.   Organization   1       0.96
Seattle, WA        Location       1       0.96
July 5th, 1994     Date           1       0.99
Jeff Bezos         Person         1       0.99
Seattle            Location       2       0.98
Portland           Location       1       0.99
Vancouver, BC      Location       1       0.97
Starbucks          Organization   1       0.91
Boeing             Organization   1       0.99
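The per-entity counts shown above could be derived roughly as follows. This sketch assumes a boto3 Comprehend client and its `detect_entities` operation; the helper name `count_entities` and the `min_score` threshold are illustrative choices, not part of the service.

```python
from collections import Counter


def count_entities(client, text, language_code="en", min_score=0.5):
    """Count entity mentions, keyed by (entity_text, entity_type).

    `client` is assumed to be a boto3 Comprehend client or a compatible object.
    Mentions scoring below `min_score` (a hypothetical cutoff) are ignored.
    """
    response = client.detect_entities(Text=text, LanguageCode=language_code)

    mentions = Counter()
    for entity in response["Entities"]:
        if entity["Score"] >= min_score:
            # Type is an upper-case category such as ORGANIZATION or LOCATION.
            mentions[(entity["Text"], entity["Type"])] += 1
    return mentions
```

Calling `count_entities(client, text).most_common()` then lists the most frequently mentioned entities first.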
Comprehend Medical
Medical Named Entity and Relationship Extraction (NERe)
The Medical NERe API returns medical information such as medications, medical conditions, tests, treatments and procedures (TTP), anatomy, and Protected Health Information (PHI). It also identifies relationships between the extracted sub-types associated with medications and TTP. Contextual information is provided as entity “traits” (for example, negation, or whether a diagnosis is a sign or symptom). The table below shows the extracted information with relevant sub-types and entity traits.
To only extract PHI, you can use the Protected Health Information Data Identification (PHId) API.
Example: In this example, we are looking at an admission note. The API identifies medical information and returns a confidence score.
Sample Text: Mr. Smith is a 63-year-old gentleman with coronary artery disease and hypertension. CURRENT MEDICATIONS: taking a dose of LIPITOR 20 mg once daily.
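One way to flatten the medical entities, their traits, and their related attributes is sketched below. It assumes a boto3 Comprehend Medical client, for example `boto3.client("comprehendmedical")`, and its `detect_entities_v2` operation; the helper name `summarize_medical_entities` and the output layout are hypothetical.

```python
def summarize_medical_entities(client, text):
    """Return one dict per detected medical entity.

    `client` is assumed to be a boto3 Comprehend Medical client or a
    compatible object exposing detect_entities_v2.
    """
    response = client.detect_entities_v2(Text=text)

    rows = []
    for entity in response["Entities"]:
        rows.append({
            "text": entity["Text"],
            "category": entity["Category"],  # e.g. MEDICATION or MEDICAL_CONDITION
            "type": entity["Type"],          # sub-type within the category
            "score": entity["Score"],
            # Traits carry context such as negation.
            "traits": [trait["Name"] for trait in entity.get("Traits", [])],
            # Attributes are related sub-types, such as a dosage tied to a medication.
            "attributes": [(attr["Type"], attr["Text"])
                           for attr in entity.get("Attributes", [])],
        })
    return rows
```

To extract only PHI, the same client also offers a `detect_phi` operation that takes the same `Text` argument.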
Medical Ontology Linking
The Medical Ontology Linking APIs identify medical information and link it to codes and concepts in standard medical ontologies. Medical conditions are linked to ICD-10-CM codes (e.g. “headache” is linked to the “R51” code) with the InferICD10CM API, while medications are linked to RxNorm codes (“Acetaminophen / Codeine” is linked to the “C2341132” CUI). The Medical Ontology Linking APIs also detect contextual information as entity traits (e.g. negation).
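As a rough sketch of ontology linking, the helper below picks the highest-scoring ICD-10-CM concept for each detected condition. It assumes a boto3 Comprehend Medical client and its `infer_icd10_cm` operation; `top_icd10_codes` is a hypothetical name. The same pattern should apply to `infer_rx_norm`, where the candidate list is found under `RxNormConcepts` instead.

```python
def top_icd10_codes(client, text):
    """Return (entity_text, code, description, traits) for each linked entity.

    `client` is assumed to be a boto3 Comprehend Medical client or a
    compatible object exposing infer_icd10_cm.
    """
    response = client.infer_icd10_cm(Text=text)

    linked = []
    for entity in response["Entities"]:
        concepts = entity.get("ICD10CMConcepts", [])
        if not concepts:
            continue  # nothing to link for this entity
        # Each entity carries a ranked list of candidate concepts; keep the best.
        best = max(concepts, key=lambda concept: concept["Score"])
        traits = [trait["Name"] for trait in entity.get("Traits", [])]
        linked.append((entity["Text"], best["Code"], best["Description"], traits))
    return linked
```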
Custom Entities
Custom Entities allows you to customize Amazon Comprehend to identify terms that are specific to your domain. Using AutoML, Comprehend will learn from a small private index of examples (for example, a list of policy numbers and text in which they are used), and then train a private, custom model to recognize these terms in any other block of text. There are no servers to manage, and no algorithms to master.
Example: In this example, an insurance company would like to analyze text documents for entities specific to their business, such as policy numbers.
Sample Text: Hi, my name is Sam Ford and I am filing a claim for car accident. My policy code is 456-YQT.
Entity    Category    Count   Confidence
456-YQT   Policy_ID   1       0.95
Language Detection
The Language Detection API automatically identifies text written in over 100 languages and returns the dominant language with a confidence score to support that a language is dominant.
Example: In this example, the API parses the text and is able to identify the dominant language in the text as Italian along with a confidence score.
Sample Text: Amazon Elastic Compute Cloud (Amazon EC2) è un servizio Web che fornisce capacità di elaborazione sicura e scalabile nel cloud. È concepito per rendere più semplice il cloud computing su scala Web per gli sviluppatori.
ISO-639-1 Language Code   Language   Confidence
it                        Italian    1.0
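Picking the dominant language out of the response could look like the sketch below. It assumes a boto3 Comprehend client and its `detect_dominant_language` operation, which takes only the text; the helper name `dominant_language` is hypothetical.

```python
def dominant_language(client, text):
    """Return (language_code, score) for the most likely language, or None.

    `client` is assumed to be a boto3 Comprehend client or a compatible object.
    """
    languages = client.detect_dominant_language(Text=text)["Languages"]
    if not languages:
        return None

    # The response may list several candidates; keep the highest-scoring one.
    top = max(languages, key=lambda language: language["Score"])
    return top["LanguageCode"], top["Score"]
```

The returned code (such as `it`) can then be passed as the `LanguageCode` argument of the other text analysis operations.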
Custom Classification
The Custom Classification API enables you to build custom text classification models using your business-specific labels without learning ML. For example, your customer support organization can use Custom Classification to automatically categorize inbound requests by problem type based on how the customer has described the issue. Creating a custom model is simple: you provide examples of text for each of the labels you want to use, and Comprehend trains on those to create your custom model. No machine learning experience is required, and you can build your custom model without writing a single line of code. An SDK is available for you to integrate your custom classifier into your current applications. With your custom model, it is easy to moderate website comments, triage customer feedback, and organize workgroup documents. Refer to the Amazon Comprehend documentation for more details.
Example: Let’s say you want to organize your customer support feedback at an airline company. You want to organize each piece of feedback into Account Questions, Ticket Refunds, and Flight Complaints. To train the service, you create a CSV file that contains example text from each issue, and label each sample with the one of the three labels that applies. The service will automatically train a custom model on your behalf. To use your model to analyze all of the calls the next day, you submit each text file to the service and receive the labeled results along with a confidence of the label match.
Text       Label              Confidence Score
Line 0     Account Question   0.92
Line 1     Ticket Refund      1
Line 2     Flight Complaint   1
Line 3     Flight Complaint   0.91
Doc5.csv   Ticket Refund      1
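The training file and the labeling step described above might be sketched as follows. The assumptions are that the training CSV holds one `label,text` row per example with no header, and that a trained model is served through a real-time endpoint queried with the `classify_document` operation of a boto3 Comprehend client. The helper names, the label spellings, and the endpoint ARN passed in are all placeholders.

```python
import csv
import io


def build_training_csv(examples):
    """Serialize (label, text) pairs into CSV text, one example per row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for label, text in examples:
        writer.writerow([label, text])  # the csv module quotes text containing commas
    return buffer.getvalue()


def classify(client, endpoint_arn, text):
    """Return (label, score) for the best-matching class.

    `client` is assumed to be a boto3 Comprehend client; `endpoint_arn` is the
    ARN of an endpoint hosting the custom classifier (supplied by the caller).
    """
    response = client.classify_document(Text=text, EndpointArn=endpoint_arn)
    best = max(response["Classes"], key=lambda item: item["Score"])
    return best["Name"], best["Score"]
```

For instance, `build_training_csv([("Ticket_Refund", "I would like a refund for my cancelled flight")])` yields a single training row; the resulting text would be uploaded to Amazon S3 before training.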
Topic Modeling
Topic Modeling identifies relevant terms or topics from a collection of documents stored in Amazon S3. It will identify the most common topics in the collection and organize them in groups and then map which documents belong to which topic.
Example: If your documents (Doc1.txt, Doc2.txt, Doc3.txt, and Doc4.txt) are stored in Amazon S3, and you point Amazon Comprehend to their location, Comprehend will analyze the documents and return two views:
1. Grouping of keywords that are topics.
Topic Group   Keywords   Weight
1             Amazon     0.87
1             Seattle    0.65
2             Holidays   0.78
2             Shopping   0.67
Each group of keywords is associated with a topic group. Weight refers to the prevalence of that keyword within the group. Keywords with the weight closest to 1 are most indicative of the topic group’s context.
2. Grouping of documents by topics.
Document Name   Topic Group   Proportion
Doc1.txt        1             0.87
Doc2.txt        1             0.65
Doc3.txt        2             0.78
Doc4.txt        2             0.67
Each document is mapped to a topic group based on the proportion of the topic group’s weighted keywords that are present in the document.
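Topic modeling runs as an asynchronous job (started with `start_topics_detection_job` in boto3) that writes its results back to Amazon S3. The sketch below parses the two views once they have been downloaded. It assumes they arrive as CSV text with header rows `topic,term,weight` and `docname,topic,proportion`; treat these column names, and the helper names, as assumptions to check against the actual job output.

```python
import csv
import io
from collections import defaultdict


def parse_topic_terms(csv_text):
    """Return {topic: [(term, weight), ...]} with terms sorted by weight."""
    topics = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        topics[int(row["topic"])].append((row["term"], float(row["weight"])))
    for terms in topics.values():
        # Highest weight first: these terms are most indicative of the topic.
        terms.sort(key=lambda pair: pair[1], reverse=True)
    return dict(topics)


def parse_doc_topics(csv_text):
    """Return {document_name: [(topic, proportion), ...]}."""
    documents = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        documents[row["docname"]].append((int(row["topic"]), float(row["proportion"])))
    return dict(documents)
```

Joining the two results on the topic number gives, for each document, the keywords that characterize its topic group.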
Multiple language support
Amazon Comprehend can perform text analysis on English, French, German, Italian, Portuguese, and Spanish texts. This lets you build applications that detect the language of a text, convert text in other languages to one of these six languages with Amazon Translate, and then use Amazon Comprehend to perform text analysis.
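The detect, translate, then analyze flow can be sketched as below. It assumes boto3 clients for Comprehend and Amazon Translate (`detect_dominant_language`, `detect_sentiment`, and `translate_text`); the helper name `analyze_any_language`, the choice of sentiment as the analysis step, and English as the fallback target are illustrative.

```python
# Languages the text above lists as supported for text analysis.
SUPPORTED = {"en", "fr", "de", "it", "pt", "es"}


def analyze_any_language(comprehend, translate, text, fallback="en"):
    """Return (source_language, analyzed_language, sentiment_label).

    `comprehend` and `translate` are assumed to be boto3 clients for
    Amazon Comprehend and Amazon Translate, or compatible objects.
    """
    languages = comprehend.detect_dominant_language(Text=text)["Languages"]
    source = max(languages, key=lambda language: language["Score"])["LanguageCode"]

    if source in SUPPORTED:
        analysis_text, analysis_language = text, source
    else:
        # Translate unsupported languages into the fallback before analysis.
        translated = translate.translate_text(
            Text=text,
            SourceLanguageCode=source,
            TargetLanguageCode=fallback,
        )
        analysis_text, analysis_language = translated["TranslatedText"], fallback

    sentiment = comprehend.detect_sentiment(
        Text=analysis_text, LanguageCode=analysis_language
    )
    return source, analysis_language, sentiment["Sentiment"]
```

Text already in a supported language is analyzed as-is, which avoids an unnecessary translation call.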