AWS Machine Learning Blog

Amazon Comprehend now supports Syntax Analysis

We’re excited to announce that Amazon Comprehend now provides a Syntax API.  This enables you to tokenize text (for example, to extract word boundaries) and the corresponding part of speech (PoS) for each word.

Today, Amazon Comprehend enables analysis use cases like such as knowing whether a customer comment is negative or positive, and identifying categorized proper nouns like “Amazon” as an “Organization”. With the new Syntax API, customers can analyze text at the most detailed level and the syntactical meaning of the words themselves, thus enabling even finer-grain analysis of text documents covering a wider range of use cases.

For example, let’s say you just launched a kitchen blender product, and you want to analyze customer comments to find out what colors were mentioned the most..

You can submit the following string against the API:

“It is raining today in Seattle.”

The response will return each word, a token ID, the word itself, the offset (location of the word within the text), the part of speech tag (such as whether the word is an adjective, noun, or verb), and a confidence score (which is the service’s confidence it is correct in the part of speech tag).

Here is an example of the response:

{
    "SyntaxTokens": [
        {
            "Text": "It", 
            "EndOffset": 2, 
            "BeginOffset": 0, 
            "PartOfSpeech": {
                "Tag": "PRON", 
                "Score": 0.8389829397201538
            }, 
            "TokenId": 1
        }, 
        {
            "Text": "is", 
            "EndOffset": 5, 
            "BeginOffset": 3, 
            "PartOfSpeech": {
                "Tag": "AUX", 
                "Score": 0.9189288020133972
            }, 
            "TokenId": 2
        }, 
        {
            "Text": "raining", 
            "EndOffset": 13, 
            "BeginOffset": 6, 
            "PartOfSpeech": {
                "Tag": "VERB", 
                "Score": 0.9977611303329468
            }, 
            "TokenId": 3
        }, 
        {
            "Text": "today", 
            "EndOffset": 19, 
            "BeginOffset": 14, 
            "PartOfSpeech": {
                "Tag": "NOUN", 
                "Score": 0.9993606209754944
            }, 
            "TokenId": 4
        }, 
        {
            "Text": "in", 
            "EndOffset": 22, 
            "BeginOffset": 20, 
            "PartOfSpeech": {
                "Tag": "ADP", 
                "Score": 0.9999061822891235
            }, 
            "TokenId": 5
        }, 
        {
            "Text": "Seattle", 
            "EndOffset": 30, 
            "BeginOffset": 23, 
            "PartOfSpeech": {
                "Tag": "PROPN", 
                "Score": 0.9940338730812073
            }, 
            "TokenId": 6
        }, 
        {
            "Text": ".", 
            "EndOffset": 31, 
            "BeginOffset": 30, 
            "PartOfSpeech": {
                "Tag": "PUNCT", 
                "Score": 0.9999997615814209
            }, 
            "TokenId": 7
        }
    ]
}

The service provides synchronous requests support using the DetectSyntax API action for single documents per request or the BatchDetectSyntax API action for up to 25 documents per request.

For example, using the AWS CLI, the previous request would look like the following:

[user]$ aws comprehend detect-syntax --text "It is raining today in Seattle." --language-code en

The Syntax API and the rest of the Comprehend APIs are available in the AWS SDK here: https://aws.amazon.com/tools/

To get started or to learn more about Amazon Comprehend, visit: https://aws.amazon.com/comprehend/

 


About the Author

Binny Peh is a Sr. Product Marketing Manager for AWS machine learning solutions. In her spare time, she indulges in too much television and is an aspiring foodie. Binny’s glass is always half-full, and she believes in the power of positive thinking.