Amazon Comprehend is a natural language processing (NLP) service that uses machine learning to discover insights from text. Amazon Comprehend provides Keyphrase Extraction, Sentiment Analysis, Entity Recognition, Topic Modeling, and Language Detection APIs so you can easily integrate natural language processing into your applications. You simply call the Amazon Comprehend APIs in your application and provide the location of the source document or text. The APIs will output entities, key phrases, sentiment, and language in a JSON format, which you can use in your application.
The Keyphrase Extraction API returns the key phrases or talking points in the text, each with a confidence score indicating how likely it is to be a key phrase.
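As a rough illustration of working with those confidence scores, the sketch below keeps only high-confidence key phrases. It assumes a boto3 Comprehend client (for example `boto3.client("comprehend")`) is passed in as `client`; the helper name `top_key_phrases` and the `min_score` threshold are hypothetical and not part of the service.

```python
def top_key_phrases(client, text, language_code="en", min_score=0.9):
    """Return (phrase, score) pairs with score >= min_score, best first.

    `client` is assumed to be a boto3 Comprehend client; `min_score`
    is an arbitrary illustrative threshold.
    """
    response = client.detect_key_phrases(Text=text, LanguageCode=language_code)
    phrases = [
        (phrase["Text"], phrase["Score"])
        for phrase in response["KeyPhrases"]
        if phrase["Score"] >= min_score
    ]
    # Highest-confidence phrases first.
    return sorted(phrases, key=lambda item: item[1], reverse=True)
```

Passing the client in as an argument keeps the helper easy to exercise with a stub, without AWS credentials.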
The Sentiment Analysis API returns the overall sentiment of a text (Positive, Negative, Neutral, or Mixed).
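A minimal sketch of reading the sentiment result, assuming a boto3 Comprehend client is supplied as `client`. The helper name `overall_sentiment` is hypothetical; the response fields used (`Sentiment`, `SentimentScore`) follow the shape returned by `detect_sentiment`.

```python
def overall_sentiment(client, text, language_code="en"):
    """Return the overall sentiment label and the score for that label."""
    response = client.detect_sentiment(Text=text, LanguageCode=language_code)
    label = response["Sentiment"]  # "POSITIVE", "NEGATIVE", "NEUTRAL", or "MIXED"
    # SentimentScore keys are capitalized: "Positive", "Negative", ...
    score = response["SentimentScore"][label.capitalize()]
    return label, score
```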
The Amazon Comprehend Syntax API enables customers to analyze text using tokenization and Parts of Speech (PoS), and identify word boundaries and labels like nouns and adjectives within the text.
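To make the part-of-speech labels concrete, the following sketch groups words by tag. It assumes a boto3 Comprehend client passed in as `client`; the helper name `words_by_tag` and the default tag selection are illustrative choices only.

```python
def words_by_tag(client, text, tags=("NOUN", "ADJ"), language_code="en"):
    """Group token texts by part-of-speech tag, keeping only the given tags."""
    response = client.detect_syntax(Text=text, LanguageCode=language_code)
    grouped = {tag: [] for tag in tags}
    for token in response["SyntaxTokens"]:
        tag = token["PartOfSpeech"]["Tag"]
        if tag in grouped:
            grouped[tag].append(token["Text"])
    return grouped
```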
The Entity Recognition API returns the named entities ("People," "Places," "Locations," etc.) that are automatically categorized based on the provided text.
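One way to consume the categorized entities is to bucket them by type, as sketched here. The helper name `entities_by_type` is hypothetical, and `client` is assumed to be a boto3 Comprehend client.

```python
from collections import defaultdict


def entities_by_type(client, text, language_code="en"):
    """Return a dict mapping entity type (e.g. "PERSON") to the matched texts."""
    response = client.detect_entities(Text=text, LanguageCode=language_code)
    grouped = defaultdict(list)
    for entity in response["Entities"]:
        grouped[entity["Type"]].append(entity["Text"])
    return dict(grouped)
```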
Medical Named Entity and Relationship Extraction (NERe)
The Medical NERe API returns medical information such as medication, medical condition, test, treatment and procedures (TTP), anatomy, and Protected Health Information (PHI). It also identifies relationships between extracted sub-types associated with Medications and TTP. Contextual information is provided as entity “traits” (for example, negation, or whether a diagnosis is a sign or symptom). The table below shows the extracted information with relevant sub-types and entity traits.
To only extract PHI, you can use the Protected Health Information Data Identification (PHId) API.
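As a hedged example of using PHI detection, the sketch below masks detected PHI spans in a string. It assumes a boto3 Comprehend Medical client (for example `boto3.client("comprehendmedical")`) is passed in as `client`; the helper name `redact_phi`, the bracketed placeholder format, and the `min_score` threshold are all illustrative assumptions.

```python
def redact_phi(client, text, min_score=0.5):
    """Replace each detected PHI span with a placeholder such as [NAME]."""
    response = client.detect_phi(Text=text)
    spans = sorted(
        (e for e in response["Entities"] if e["Score"] >= min_score),
        key=lambda e: e["BeginOffset"],
        reverse=True,
    )
    # Work from the end of the text so earlier offsets remain valid.
    for entity in spans:
        placeholder = "[" + entity["Type"] + "]"
        text = text[: entity["BeginOffset"]] + placeholder + text[entity["EndOffset"]:]
    return text
```

This simple version assumes the detected spans do not overlap.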
Custom Entities allows you to customize Amazon Comprehend to identify terms that are specific to your domain. Using AutoML, Comprehend will learn from a small private index of examples (for example, a list of policy numbers and text in which they are used), and then train a private, custom model to recognize these terms in any other block of text. There are no servers to manage, and no algorithms to master.
The Language Detection API automatically identifies text written in over 100 languages and returns the dominant language along with a confidence score indicating how likely that language is to be dominant.
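A short sketch of picking the dominant language from the response, assuming a boto3 Comprehend client is supplied as `client`. The helper name `dominant_language` is hypothetical.

```python
def dominant_language(client, text):
    """Return (language_code, score) for the highest-scoring detected language."""
    response = client.detect_dominant_language(Text=text)
    best = max(response["Languages"], key=lambda lang: lang["Score"])
    return best["LanguageCode"], best["Score"]
```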
The Custom Classification API enables you to easily build custom text classification models using your business-specific labels without learning ML. For example, your customer support organization can use Custom Classification to automatically categorize inbound requests by problem type based on how the customer has described the issue. Creating a custom model is simple. You provide examples of text for each of the labels you want to use, and Comprehend trains on those to create your custom model. No machine learning experience is required, and you can build your custom model without writing a single line of code. An SDK is available for you to integrate your custom classifier into your current applications. With your custom model, it is easy to moderate website comments, triage customer feedback, and organize workgroup documents. Refer to this documentation page for more details.
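The support-routing scenario above might look roughly like this once a custom classifier is trained and deployed to a real-time endpoint. The sketch assumes a boto3 Comprehend client passed in as `client` and an endpoint ARN you supply; the helper name `route_request`, the `min_score` threshold, and the `needs_review` fallback label are hypothetical.

```python
def route_request(client, text, endpoint_arn, min_score=0.7, fallback="needs_review"):
    """Return the best class name for the text, or a fallback when unsure."""
    response = client.classify_document(Text=text, EndpointArn=endpoint_arn)
    classes = response.get("Classes", [])
    if not classes:
        return fallback
    best = max(classes, key=lambda c: c["Score"])
    # Send low-confidence predictions to a human instead of guessing.
    return best["Name"] if best["Score"] >= min_score else fallback
```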
Topic Modeling identifies relevant terms or topics from a collection of documents stored in Amazon S3. It will identify the most common topics in the collection and organize them in groups and then map which documents belong to which topic.
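Because topic modeling runs over a collection in Amazon S3, it is started as an asynchronous job rather than a single call on a piece of text. The sketch below shows one way to start such a job, assuming a boto3 Comprehend client passed in as `client`; the helper name `start_topic_job`, the default topic count, and the one-document-per-file input format are illustrative assumptions, and the S3 URIs and IAM role ARN are values you would supply.

```python
def start_topic_job(client, input_s3_uri, output_s3_uri, role_arn, num_topics=10):
    """Start a topic detection job and return its job ID."""
    response = client.start_topics_detection_job(
        InputDataConfig={
            "S3Uri": input_s3_uri,
            # Assumes each S3 object holds exactly one document.
            "InputFormat": "ONE_DOC_PER_FILE",
        },
        OutputDataConfig={"S3Uri": output_s3_uri},
        DataAccessRoleArn=role_arn,
        NumberOfTopics=num_topics,
    )
    return response["JobId"]
```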
Multiple language support
Amazon Comprehend can perform text analysis on English, French, German, Italian, Portuguese, and Spanish texts. This lets you build applications that detect the language of incoming text, translate it into one of these supported languages with Amazon Translate, and then use Amazon Comprehend to perform text analysis.
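The detect, translate, then analyze flow described above can be sketched as follows. It assumes boto3 clients for Comprehend and Amazon Translate are passed in as `comprehend` and `translate`; the helper name `analyze_any_language`, the `SUPPORTED` set (taken from the languages listed above), and the choice of sentiment as the analysis step are illustrative.

```python
# Language codes for the six languages named in the text above.
SUPPORTED = {"en", "fr", "de", "it", "pt", "es"}


def analyze_any_language(comprehend, translate, text, target="en"):
    """Detect the language, translate if needed, then return the sentiment label."""
    languages = comprehend.detect_dominant_language(Text=text)["Languages"]
    lang = max(languages, key=lambda item: item["Score"])["LanguageCode"]
    if lang not in SUPPORTED:
        # Translate into a language that text analysis supports.
        text = translate.translate_text(
            Text=text, SourceLanguageCode=lang, TargetLanguageCode=target
        )["TranslatedText"]
        lang = target
    return comprehend.detect_sentiment(Text=text, LanguageCode=lang)["Sentiment"]
```

Text already in a supported language skips the translation call entirely, which avoids an unnecessary request.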