Amazon Comprehend is a natural language processing (NLP) service that uses machine learning to discover insights from text. It provides APIs for Custom Entity Recognition, Custom Classification, Key Phrase Extraction, Sentiment Analysis, Entity Recognition, and more, so you can easily integrate natural language processing into your applications. You call the Amazon Comprehend APIs from your application and provide the source text or the location of the source document. The APIs return entities, key phrases, sentiment, and language in JSON format, which you can use in your application.
Custom Entity Recognition
Custom Entity Recognition allows you to customize Amazon Comprehend to identify terms that are specific to your domain. Using AutoML, Comprehend learns from a small set of examples (for example, a list of policy numbers, claim numbers, or Social Security numbers) and then trains a private, custom model to recognize these terms in any other block of text within PDFs, plain text, or Microsoft Word documents, with no machine learning experience required. Refer to this documentation page for more details.
Custom Classification
The Custom Classification API enables you to build custom text classification models using your business-specific labels, without needing ML expertise. For example, your customer support organization can use Custom Classification to automatically categorize inbound requests by problem type based on how the customer has described the issue. With a custom model, you can moderate website comments, triage customer feedback, and organize workgroup documents. Refer to this documentation page for more details.
Entity Recognition
The Entity Recognition API returns the named entities found in the provided text (such as people, places, and organizations), each automatically categorized by type. Refer to this documentation page for more details.
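As a minimal sketch of how the entity output might be consumed, the helper below calls `detect_entities` and keeps only confident results. It assumes `client` is a boto3 Comprehend client (for example `boto3.client("comprehend")`) with credentials configured; the function name and the `min_score` threshold are illustrative and not part of the service.

```python
def detect_named_entities(client, text, language_code="en", min_score=0.8):
    """Return (text, type, score) tuples for entities scoring at least min_score.

    `client` is assumed to be a boto3 Comprehend client; min_score is a
    hypothetical cutoff chosen for illustration.
    """
    response = client.detect_entities(Text=text, LanguageCode=language_code)
    return [
        (entity["Text"], entity["Type"], entity["Score"])
        for entity in response["Entities"]
        if entity["Score"] >= min_score
    ]
```

Passing the client in, rather than creating it inside the function, keeps the helper easy to exercise with a stub during testing.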
Sentiment Analysis
The Sentiment Analysis API returns the overall sentiment of a text (Positive, Negative, Neutral, or Mixed). Refer to this documentation page for more details.
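The sentiment response carries both a label and a score per class. A short sketch, assuming `client` is a boto3 Comprehend client and using a hypothetical helper name, could pair the winning label with its score like this:

```python
def overall_sentiment(client, text, language_code="en"):
    """Return (label, score) for the overall sentiment of `text`.

    `client` is assumed to be a boto3 Comprehend client.
    """
    response = client.detect_sentiment(Text=text, LanguageCode=language_code)
    label = response["Sentiment"]        # "POSITIVE", "NEGATIVE", "NEUTRAL", or "MIXED"
    scores = response["SentimentScore"]  # keys: "Positive", "Negative", "Neutral", "Mixed"
    # The score keys use title case, so "POSITIVE" maps to "Positive".
    return label, scores[label.capitalize()]
```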
Targeted Sentiment
Targeted Sentiment provides more granular insights by identifying the sentiment (positive, negative, neutral, or mixed) expressed toward specific entities within the text. Refer to this documentation page for more details.
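Because targeted sentiment is reported per entity mention, the response is nested more deeply than the overall sentiment response. The sketch below flattens it into one row per mention; it assumes `client` is a boto3 Comprehend client, and the helper name is illustrative.

```python
def sentiment_by_mention(client, text, language_code="en"):
    """Return (mention_text, entity_type, sentiment) for every entity mention.

    `client` is assumed to be a boto3 Comprehend client.
    """
    response = client.detect_targeted_sentiment(Text=text, LanguageCode=language_code)
    rows = []
    for entity in response["Entities"]:
        # Each entity groups the mentions that refer to the same thing.
        for mention in entity["Mentions"]:
            rows.append(
                (mention["Text"], mention["Type"], mention["MentionSentiment"]["Sentiment"])
            )
    return rows
```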
PII Identification and Redaction
Use Amazon Comprehend ML capabilities to detect and redact personally identifiable information (PII) in customer emails, support tickets, product reviews, social media, and more. No ML experience is required. For example, you can analyze support tickets and knowledge articles to detect PII entities and redact the text before you index the documents in a search solution, so that the indexed documents are free of PII entities. Redacting PII entities helps you protect privacy and comply with local laws and regulations. Refer to this documentation page for more details.
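The detect-then-redact flow described above can be sketched as follows. This is one possible approach, not the service's own redaction feature: it assumes `client` is a boto3 Comprehend client, treats the returned offsets as Python string indices, and uses an illustrative `min_score` cutoff and `[TYPE]` placeholder format.

```python
def redact_pii(client, text, language_code="en", min_score=0.5):
    """Replace each detected PII span in `text` with a [TYPE] placeholder.

    `client` is assumed to be a boto3 Comprehend client; min_score and the
    placeholder format are hypothetical choices for this sketch.
    """
    response = client.detect_pii_entities(Text=text, LanguageCode=language_code)
    entities = [e for e in response["Entities"] if e["Score"] >= min_score]
    # Work from the end of the string so earlier offsets remain valid.
    for e in sorted(entities, key=lambda item: item["BeginOffset"], reverse=True):
        text = text[: e["BeginOffset"]] + f"[{e['Type']}]" + text[e["EndOffset"]:]
    return text
```

The redacted string can then be passed to the indexing step in place of the original document text.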
Toxicity Detection
Comprehend toxicity detection provides a simple, NLP-based solution for detecting toxic content in text-based documents. The capability is available out of the box to moderate peer-to-peer conversations on online platforms and to screen generative AI inputs and outputs. Refer to this documentation page for more details.
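A moderation step built on this capability might look like the sketch below. It assumes `client` is a boto3 Comprehend client and that the result list comes back in the same order as the submitted segments; the helper name and the `threshold` value are illustrative.

```python
def flag_toxic_segments(client, segments, threshold=0.5):
    """Return (segment, toxicity) pairs whose overall toxicity meets the threshold.

    `client` is assumed to be a boto3 Comprehend client; threshold is a
    hypothetical cutoff. Results are assumed to align with input order.
    """
    response = client.detect_toxic_content(
        TextSegments=[{"Text": segment} for segment in segments],
        LanguageCode="en",
    )
    return [
        (segment, result["Toxicity"])
        for segment, result in zip(segments, response["ResultList"])
        if result["Toxicity"] >= threshold
    ]
```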
Prompt Safety Classification
Comprehend provides a pre-trained binary classifier that classifies an input prompt as harmful or not harmful. You can integrate it so that LLMs respond only to harmless content. Refer to this documentation page for more details.
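One way to gate an LLM on this classifier is sketched below. It assumes `client` is a boto3 Comprehend client and that the classifier is reached through `classify_document` with an endpoint ARN, which is left as a parameter here and should be taken from the documentation for your Region. The `UNSAFE_PROMPT` class name and the threshold are assumptions of this sketch.

```python
def is_prompt_allowed(client, prompt, endpoint_arn,
                      unsafe_label="UNSAFE_PROMPT", threshold=0.5):
    """Return True when the unsafe-class score for `prompt` is below threshold.

    `client` is assumed to be a boto3 Comprehend client. endpoint_arn,
    unsafe_label, and threshold are caller-supplied assumptions.
    """
    response = client.classify_document(Text=prompt, EndpointArn=endpoint_arn)
    scores = {cls["Name"]: cls["Score"] for cls in response["Classes"]}
    # A missing unsafe class is treated as a score of 0.0.
    return scores.get(unsafe_label, 0.0) < threshold
```

A caller would forward the prompt to the LLM only when this returns `True`.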
Key Phrase Extraction
The Key Phrase Extraction API returns the key phrases or talking points in a text, each with a confidence score indicating how likely it is to be a key phrase. Refer to this documentation page for more details.
Events Detection
Comprehend Events lets you extract the event structure from a document, distilling pages of text down to easily processed data for consumption by your AI applications or graph visualization tools. This API allows you to answer who-what-when-where questions over large document sets, at scale and without prior NLP experience. Use Comprehend Events to extract granular details about real-world events and associated entities expressed in unstructured text. Refer to this documentation page for more details.
Language Detection
The Language Detection API automatically identifies text written in over 100 languages and returns the dominant language with a confidence score indicating how strongly that language dominates. Refer to this documentation page for more details.
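The response can list more than one candidate language, so a caller typically picks the highest-scoring one. A minimal sketch, assuming `client` is a boto3 Comprehend client and using an illustrative helper name:

```python
def dominant_language(client, text):
    """Return (language_code, score) for the highest-scoring detected language.

    `client` is assumed to be a boto3 Comprehend client.
    """
    response = client.detect_dominant_language(Text=text)
    best = max(response["Languages"], key=lambda lang: lang["Score"])
    return best["LanguageCode"], best["Score"]
```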
Syntax Analysis
The Amazon Comprehend Syntax API enables customers to analyze text using tokenization and part-of-speech (PoS) tagging, identifying word boundaries and labels such as nouns and adjectives within the text. Refer to this documentation page for more details.
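For instance, the part-of-speech tags can be used to pull out all words of one class. The sketch below assumes `client` is a boto3 Comprehend client; the helper name is illustrative.

```python
def words_with_tag(client, text, tag="NOUN", language_code="en"):
    """Return the tokens in `text` whose part-of-speech tag equals `tag`.

    `client` is assumed to be a boto3 Comprehend client.
    """
    response = client.detect_syntax(Text=text, LanguageCode=language_code)
    return [
        token["Text"]
        for token in response["SyntaxTokens"]
        if token["PartOfSpeech"]["Tag"] == tag
    ]
```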
Topic Modeling
Topic Modeling identifies relevant terms or topics from a collection of documents stored in Amazon S3. It identifies the most common topics in the collection, organizes them into groups, and then maps each document to the topics it belongs to. Refer to this documentation page for more details.
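Unlike the single-text calls, topic modeling runs as an asynchronous job over documents in S3. A hedged sketch of starting such a job is shown below; it assumes `client` is a boto3 Comprehend client, that each S3 object holds one document, and that the S3 URIs and IAM role ARN are supplied by the caller (none of them come from this page).

```python
def start_topic_job(client, input_s3_uri, output_s3_uri, role_arn, number_of_topics=10):
    """Start a topic detection job and return its job ID.

    `client` is assumed to be a boto3 Comprehend client. The URIs, role ARN,
    and number_of_topics are caller-supplied, hypothetical values.
    """
    response = client.start_topics_detection_job(
        InputDataConfig={"S3Uri": input_s3_uri, "InputFormat": "ONE_DOC_PER_FILE"},
        OutputDataConfig={"S3Uri": output_s3_uri},
        DataAccessRoleArn=role_arn,
        NumberOfTopics=number_of_topics,
    )
    return response["JobId"]
```

The job writes its results to the output location when it completes, so the returned ID is what a caller would poll on.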
Multiple language support
Amazon Comprehend can perform text analysis on German, English, Spanish, Italian, Portuguese, French, Japanese, Korean, Hindi, Arabic, Chinese (simplified), and Chinese (traditional) text. To build applications in other languages, customers can use Amazon Translate to convert the text into a language supported by Comprehend and then use Comprehend to perform text analysis. For more details on language support, see the documentation page.
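The translate-then-analyze pattern could be sketched as follows. It assumes `comprehend` and `translate` are boto3 clients for Amazon Comprehend and Amazon Translate, that the caller already knows the source language code, and that English is an acceptable pivot language. The set of codes mirrors the languages listed above and the helper name is illustrative.

```python
# Language codes corresponding to the languages listed above (assumed mapping).
SUPPORTED_LANGUAGES = {
    "de", "en", "es", "it", "pt", "fr", "ja", "ko", "hi", "ar", "zh", "zh-TW",
}

def sentiment_any_language(comprehend, translate, text, language_code):
    """Return the sentiment label, translating to English first when needed.

    `comprehend` and `translate` are assumed to be boto3 clients for the
    respective services.
    """
    if language_code not in SUPPORTED_LANGUAGES:
        translated = translate.translate_text(
            Text=text,
            SourceLanguageCode=language_code,
            TargetLanguageCode="en",
        )
        text = translated["TranslatedText"]
        language_code = "en"
    response = comprehend.detect_sentiment(Text=text, LanguageCode=language_code)
    return response["Sentiment"]
```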