AWS Machine Learning Blog

Detect sentiment from customer reviews using Amazon Comprehend

Public content has never been more relevant to businesses. Customer reviews are used to gain insight into purchasing decisions: understanding the sentiment behind them gives businesses market awareness and the ability to address issues early.

Sentiment analysis computationally determines whether a piece of writing is positive, negative, neutral, or mixed. Amazon Comprehend is a natural language processing (NLP) text analytics service made up of a handful of APIs that allow you to detect sentiment (along with key phrases, named entities, and language) and perform topic modeling on a collection of documents. The service detects sentiment using state-of-the-art deep learning algorithms that use scoring mechanisms and attributes during the evaluation of text. The Amazon Comprehend training data set primarily consists of data found in product descriptions and consumer reviews from one of the largest natural language collections in the world: Amazon.com. We give you a fully trained model that continuously retrains against new data to keep pace with the evolution of language. Machine learning (ML) in general requires a different skill set than most data engineers and developers currently have. Amazon Comprehend closes this gap and makes NLP easy to consume using the skills developers already have.

In this blog post, we show you how to use Amazon Comprehend as part of a serverless, event-driven architecture, built with AWS services, to detect customer sentiment.

Solution Architecture Overview

Let’s take a look at product reviews on Amazon.com and use Amazon Comprehend to classify the sentiment for a given review. We will use the Amazon Echo, Amazon Echo Dot, and the Amazon Echo Show reviews as examples. We will then upload additional fake sample data, in an attempt to prevent tarnishing a brand, and simulate retrieving negative product sentiment with nuanced information such as defective, damaged, or hazardous items that are on recall. Finally, we will place the business in a position to take immediate action by using Amazon Athena to interactively query for the negative reviews and export the report.

Review Upload: The user uploads a customer review in text format to the Customer Review bucket.

Customer Review Sentiment Analysis Function: The secure review upload is used as an Amazon S3 event to trigger the Review Sentiment Analysis function that downloads the review to a temporary file, calls Amazon Comprehend to run text analytics against it, and then outputs the overall sentiment along with the positive, negative, neutral, and mixed confidence scores to a CSV file. The CSV file with the sentiment is stored in a sentiment folder of the same Customer Review bucket.

Interactive SQL Query:  Amazon Athena is used to query the review results and focus in on the negative sentiment.
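The upload-and-analyze flow described above can be sketched in Python as follows. This is a minimal sketch, not the function that the CloudFormation template deploys: the helper names (`build_csv_row`, `handle_review`), the output key layout, and the hard-coded `en` language code are assumptions for illustration. The S3 and Amazon Comprehend clients are passed in as arguments so the logic can be exercised without AWS credentials.

```python
import csv
import io
from datetime import datetime, timezone
from urllib.parse import unquote_plus

# Score labels as they appear in the DetectSentiment response.
LABELS = ("Positive", "Negative", "Neutral", "Mixed")


def build_csv_row(location, timestamp, response):
    """Flatten a DetectSentiment response into one comma-separated line."""
    scores = response["SentimentScore"]
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(
        [location, timestamp, response["Sentiment"]]
        + [scores[label] for label in LABELS]
    )
    return buffer.getvalue()


def handle_review(event, s3, comprehend):
    """Analyze every review named in an S3 event and store one CSV per review."""
    written = []
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        response = comprehend.detect_sentiment(
            Text=body.decode("utf-8"),
            LanguageCode="en",  # assumption: all reviews are in English
        )
        timestamp = datetime.now(timezone.utc).isoformat()
        row = build_csv_row(f"s3://{bucket}/{key}", timestamp, response)
        # Hypothetical naming scheme: sentiment/<review file name>.csv
        file_name = key.rsplit("/", 1)[-1].rsplit(".", 1)[0]
        out_key = f"sentiment/{file_name}.csv"
        s3.put_object(Bucket=bucket, Key=out_key, Body=row.encode("utf-8"))
        written.append(out_key)
    return written


def lambda_handler(event, context):
    # boto3 is imported lazily so the helpers above stay usable offline.
    import boto3

    return handle_review(event, boto3.client("s3"), boto3.client("comprehend"))
```

Passing the clients in keeps the AWS calls (`get_object`, `detect_sentiment`, `put_object`) in one place and makes the CSV formatting easy to check on its own.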

Step by Step Configuration

We will start off by deploying an AWS CloudFormation template to provision the necessary AWS Identity and Access Management (IAM) role and Lambda function needed in order to interact with the Amazon S3, AWS Lambda, and Amazon Comprehend APIs.

Region                   Region Code   Launch
US East (N. Virginia)    us-east-1     (Launch Stack button)

  1. In the CloudFormation console, choose the Launch Stack button (above). If interested, you can view the YAML template here.
  2. Choose Next on the Select Template page.
  3. Choose Next on the Specify Details page.
  4. On the Options page, leave all the defaults and choose Next.
  5. On the Review page, check the boxes to acknowledge that CloudFormation will create IAM resources and IAM resources with custom names.
  6. Choose Create Change Set.

Note: The CloudFormation template we’ve provided is written using the AWS Serverless Application Model (AWS SAM). AWS SAM simplifies how you define functions, APIs, and other resources for serverless applications, as well as some features of these services, such as environment variables. When you deploy a SAM template in CloudFormation, a transform step is required to convert the SAM template into standard CloudFormation, so you must choose the Create Change Set button to make the transform happen.

  7. Wait a few seconds for the change set to finish computing changes. Your screen should look as follows:
  8. Finally, choose Execute and then let CloudFormation launch resources in the background. You don’t need to wait for it to finish before proceeding to the next step.

Amazon Simple Storage Service (S3) bucket event trigger:

Now that you have your IAM role, Lambda function, and S3 bucket deployed, let’s make sure that we create an S3 event trigger for your Comprehend Sentiment Analysis function.

  1. Open the Amazon S3 console and select the new S3 bucket that begins with ‘review-sentiment’.
  2. Choose the Properties tab. Under the Advanced Settings section, choose the Events box.
  3. Choose + Add notification and configure the following:
    1. Name: SentimentAnalysis
    2. Events: All object create events
    3. Suffix: .txt
    4. Send to: Lambda Function
    5. Lambda: review-sentiment-ComprehendSentimentAnalysis-XYZ
  4. Choose Save.
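For readers who prefer to script the console steps above, the same notification can be expressed with the boto3 `put_bucket_notification_configuration` call. This is a minimal sketch: the helper names are hypothetical, and the bucket name and function ARN must be supplied by the caller. Unlike the console workflow, this call does not grant Amazon S3 permission to invoke the function, so that permission would need to be added to the Lambda function separately.

```python
def build_notification_config(function_arn):
    """Build the notification settings that mirror the console steps above."""
    return {
        "LambdaFunctionConfigurations": [
            {
                "Id": "SentimentAnalysis",
                "LambdaFunctionArn": function_arn,
                # "All object create events" in the console
                "Events": ["s3:ObjectCreated:*"],
                # Only objects ending in .txt trigger the function
                "Filter": {
                    "Key": {
                        "FilterRules": [{"Name": "suffix", "Value": ".txt"}]
                    }
                },
            }
        ]
    }


def apply_notification(s3, bucket, function_arn):
    """Attach the notification to the bucket using a boto3 S3 client."""
    s3.put_bucket_notification_configuration(
        Bucket=bucket,
        NotificationConfiguration=build_notification_config(function_arn),
    )
```

The `.txt` suffix filter matters because the function writes its CSV output back to the same bucket. Without the filter, each CSV file would itself count as a create event and could trigger the function again.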

S3 customer review upload:

For our illustrative use case, we have pulled the top customer review for the Amazon Echo, Amazon Echo Dot, and the Amazon Echo Show. We then placed each review into a text file.

  1. Download the following three reviews:
    1. review-B01DFKC2SO.txt
    2. review-B01J24C0TI.txt
    3. review-B0749WVS7J.txt

Note: Amazon customer reviews are not licensed for commercial use. You should replace this data with your own authorized data source when implementing your application.

  2. Choose your S3 bucket from the console and then choose Upload. Add each one of the review text files and choose Upload.
  3. Refresh the bucket and then verify the following output in your bucket:

    Note: This is the event-driven serverless architecture we created. The review uploaded to our S3 bucket is an event that triggers our Comprehend-SentimentAnalysis function, which in turn outputs the sentiment and sentiment confidence scores to a CSV file within the sentiment folder of your S3 bucket.
  4. Select a review and then choose Download:

    “My brother Robert who has been bed ridden and paralyzed with Multiple Sclerosis from his neck down for more than 30 years now has a new friend named Alexa! He was in tears with happiness when Alexa played 70’s music, played Jeopardy, answered all his questions and wakes him up every morning. Thank you Amazon for giving my brother a new bedside companion.”
  5. Choose the sentiment folder and open the CSV file to view its contents.

    The Sentiment information describes the overall sentiment of the text as well as the sentiment score for each label: Positive, Negative, Neutral, and Mixed. All of these sentiment scores are returned from an MXNet deep learning model and are represented as floats between 0 and 1, where 1 is full confidence in the sentiment label. For example, this CSV shows that the Amazon Echo Dot review has a POSITIVE overall sentiment with an 82% positive sentiment score (confidence).
  6. To enrich our review data, go back to your S3 bucket and upload each text file from the following sample review data here.

Amazon Comprehend is called for each review uploaded to the S3 bucket. Because S3 can store a virtually unlimited number of reviews, we need to be able to query them and find what matters most to the business. In the next step, we will query the reviews we currently have in S3 and then filter down to the negative reviews.
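To make the score format concrete, the sketch below parses one line of a sentiment CSV into typed values and prints the confidence for the overall label. The column order is assumed to match the Athena table defined later in this post, the names `ReviewSentiment`, `parse_sentiment_line`, and `confidence` are hypothetical, and the sample line is made up for illustration rather than taken from the function's real output.

```python
from dataclasses import dataclass


@dataclass
class ReviewSentiment:
    location: str
    timestamp: str
    sentiment: str
    positive: float
    negative: float
    neutral: float
    mixed: float


def parse_sentiment_line(line):
    """Split one CSV line and convert the four scores to floats."""
    location, timestamp, sentiment, *scores = line.strip().split(",")
    return ReviewSentiment(
        location, timestamp, sentiment, *(float(score) for score in scores)
    )


def confidence(review):
    """Return the score that belongs to the review's overall label."""
    return getattr(review, review.sentiment.lower())


# Made-up sample line, following the assumed column order.
sample = "s3://example-bucket/review-1.txt,2018-01-01T00:00:00,POSITIVE,0.82,0.01,0.16,0.01"
review = parse_sentiment_line(sample)
print(f"{review.sentiment}: {confidence(review):.0%}")  # prints "POSITIVE: 82%"
```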

Interactive querying with Amazon Athena

We take this a step further by having our SQL statement sort the negative reviews so that those with the strongest negative sentiment come first. With this query, the business knows exactly where to start and can spend its cycles wisely.

In the Athena console, click Settings in the upper right-hand corner and add your S3 bucket as the query result location. Click Save.

Then, run the following statement to create the Athena table in the default database. Important: Replace <bucket_name> with the name of the S3 bucket created earlier.

CREATE EXTERNAL TABLE IF NOT EXISTS default.ReviewSentimentAnalysis (
  `ImageLocation` string,
  `Timestamp` string,
  `Sentiment` string,
  `Positive` string,
  `Negative` string,
  `Neutral` string,
  `Mixed` string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
  'serialization.format' = ',',
  'field.delim' = ','
)
LOCATION 's3://<bucket_name>/sentiment/'

After the table has been successfully created, copy the following SQL statement and paste it into the editor. Choose Run Query.

SELECT * FROM default.ReviewSentimentAnalysis WHERE sentiment='NEGATIVE'
ORDER BY negative DESC
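The query above can also be submitted programmatically, which is one way to export the report without the console. The following is a minimal sketch using the boto3 Athena calls `start_query_execution`, `get_query_execution`, and `get_query_results`. The function name `run_query`, the polling interval, and the output location passed by the caller are assumptions, and only the first page of results is read.

```python
import time

NEGATIVE_REVIEWS_QUERY = (
    "SELECT * FROM default.ReviewSentimentAnalysis "
    "WHERE sentiment='NEGATIVE' ORDER BY negative DESC"
)


def run_query(athena, query, output_location, poll_seconds=1.0):
    """Run a query with a boto3 Athena client and return rows as lists of strings."""
    started = athena.start_query_execution(
        QueryString=query,
        QueryExecutionContext={"Database": "default"},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    # Poll until Athena reports a terminal state.
    while True:
        execution = athena.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)

    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query {query_id} ended in state {state}")

    # First page only; for SELECT queries the first row holds the column names.
    result = athena.get_query_results(QueryExecutionId=query_id)
    return [
        [cell.get("VarCharValue") for cell in row["Data"]]
        for row in result["ResultSet"]["Rows"]
    ]
```

A call might look like `run_query(boto3.client("athena"), NEGATIVE_REVIEWS_QUERY, "s3://<bucket_name>/athena-results/")`, where the results prefix is a placeholder. Because the table declares the score columns as `string`, `ORDER BY negative` compares text rather than numbers; casting the column to `double` in the query is a possible variation if strict numeric ordering is needed.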

Conclusion

In summary, Amazon Comprehend gives you deep insight into customer feedback by letting you quickly identify shifting opinions and overall sentiment, which reduces the time and effort needed to understand customers. It also makes it possible to adapt immediately to the changing needs of customers.

Training NLP models is difficult and can be very expensive. There are many obstacles in the path to capturing true sentiment, such as language ambiguity through cryptic dialogue, sarcasm, and irony, as well as the symbolic expressions of emojis, which may not be analyzed in a pure text capture. All of these obstacles can make sentiment more difficult to understand and, therefore, may impact the quality of the results you receive. Even with a large data set of unstructured, sentiment-rich text and the right analytics, successfully reacting to or predicting customer needs can take a considerable amount of effort. Businesses need the skills and expertise to build efficient machine learning (ML) models, choose the algorithms used to train accurate sentiment classifiers, and then apply ML techniques to further reduce systematic inaccuracies while improving the model over time through continuous feedback loops.

Amazon Comprehend takes on the undifferentiated heavy lifting otherwise required of data scientists and allows you to easily integrate the service into your application or analytics solutions. In addition, you can query millions of reviews at a time on AWS and then present only the relevant information. There are many ways to gather textual information outside of our use case, such as performing real-time ingestion of data via Amazon Kinesis or scheduled events in Amazon CloudWatch. Furthermore, there are many other insights you can gain into your textual data once you’ve extracted and analyzed sentiment. For example, you can load new strings of your data into a data warehouse such as Amazon Redshift, view the data within a business intelligence (BI) tool such as Amazon QuickSight, or copy negative sentiment reviews into an S3 bucket that triggers Amazon Simple Notification Service (SNS) to notify your customer service team.

Doing all of this in a serverless architecture allows you to write and run code without ever thinking about servers. After all, writing business logic should be the only code you write. As always, we will continue to iterate on our models in true agile fashion. Please keep the feedback coming. Now let your imagination run free and go #BuildOnAWS!


Additional Reading

Learn how to build a social media dashboard using Amazon ML services including Amazon Comprehend and Amazon QuickSight.


About the Author

As a Solutions Architect, Todd Escalona spends his time evangelizing the AWS Cloud with his enterprise customers and within the startup community, while listening to understand their goals and working backwards from there. He defines requirements, provides architectural guidance around specific use cases, and assists in designing applications and services that are scalable, reliable, and performant. His interests spread across various technologies such as artificial intelligence, machine learning, and serverless, event-driven architectures.

 

 

Updated August, 2020 by Byron Tolson to reflect console changes.