AWS Machine Learning Blog

Amazon Comprehend now supports asynchronous processing along with larger document sizes

Amazon Comprehend is a natural language processing (NLP) service that uses machine learning to find insights and relationships in text. Starting today, you have the option to analyze a collection of documents stored in an Amazon S3 bucket using our new asynchronous job service. This is in addition to the single-document and multi-document synchronous calls to the REST API, giving you a variety of options to best fit your application’s needs.

Asynchronous operations are particularly useful for analyzing large data sets when the application doesn’t need a synchronous response from the service. For example, you can run these jobs as daily, weekly, or even monthly text analytics batch runs.

The new asynchronous operations support individual documents of up to 100 KB for entity and key phrase detection, 1 MB for language detection, and 5 KB for sentiment detection. The total size of all files in a batch must be under 5 GB, and you cannot submit more than 1 million individual files per batch.
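
As a rough illustration of these limits, the sketch below checks a batch before it is submitted. It is not part of the Amazon Comprehend API: the operation names, the `check_batch` helper, and the choice of 1 KB = 1024 bytes are all assumptions made for this example.

```python
# Pre-flight check of the limits quoted above (illustrative only).
KB = 1024          # assumption: the post does not say whether 1 KB is 1000 or 1024 bytes
MB = 1024 * KB
GB = 1024 * MB

# Per-document limits from the post, keyed by hypothetical operation names.
MAX_DOC_BYTES = {
    "entities": 100 * KB,
    "key_phrases": 100 * KB,
    "language": 1 * MB,
    "sentiment": 5 * KB,
}
MAX_BATCH_BYTES = 5 * GB      # total batch size must stay under this
MAX_BATCH_FILES = 1_000_000   # no more than this many files per batch


def check_batch(sizes, operation):
    """Return a list of problems; an empty list means the batch fits.

    `sizes` maps a file name to its size in bytes.
    """
    limit = MAX_DOC_BYTES[operation]
    problems = []
    if len(sizes) > MAX_BATCH_FILES:
        problems.append(f"{len(sizes)} files exceeds the {MAX_BATCH_FILES} file limit")
    total = sum(sizes.values())
    if total >= MAX_BATCH_BYTES:
        problems.append(f"batch total of {total} bytes is not under {MAX_BATCH_BYTES}")
    for name, size in sizes.items():
        if size > limit:
            problems.append(f"{name}: {size} bytes exceeds the {limit} byte limit")
    return problems
```

For example, `check_batch({"a.txt": 4000, "b.txt": 6000}, "sentiment")` reports only `b.txt`, because 6000 bytes is over the 5 KB sentiment limit, while the same files pass for `"entities"`.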

It’s easy to integrate natural language processing into your applications. Amazon Comprehend provides Keyphrase Extraction, Sentiment Analysis, Entity Recognition, Topic Modeling, and Language Detection APIs.

With this release, Amazon Comprehend now provides the following operations:

  • Synchronous Single-Document Processing: You call Amazon Comprehend with a single document and receive a synchronous response.
  • Synchronous Multi-Document Processing (25 per request): You call Amazon Comprehend with a collection of up to 25 documents and receive a synchronous response.
  • Asynchronous Batch Processing (5 GB job size limit; per-document limit depends on the operation, for example 100 KB for entity and key phrase detection): You use a collection of documents stored in an Amazon S3 bucket and start an asynchronous operation to analyze the documents. The results of the analysis are returned in an S3 bucket.
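
To make the 25-document limit on synchronous multi-document calls concrete, here is a minimal sketch that splits a longer list into requests of at most 25 texts. It assumes the boto3 `comprehend` client and its `batch_detect_sentiment` operation; the helper names `chunked`, `batch_sentiment`, and `make_client` are hypothetical.

```python
def chunked(items, size=25):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def batch_sentiment(client, texts, language_code="en"):
    """Return one sentiment label per text, or None where no result came back."""
    labels = [None] * len(texts)
    offset = 0
    for batch in chunked(texts):
        response = client.batch_detect_sentiment(
            TextList=batch, LanguageCode=language_code
        )
        # Each result's "Index" is relative to the batch that was sent,
        # so shift it by the number of texts already processed.
        for item in response["ResultList"]:
            labels[offset + item["Index"]] = item["Sentiment"]
        offset += len(batch)
    return labels


def make_client():
    """Create a real client; needs boto3, AWS credentials, and a region."""
    import boto3
    return boto3.client("comprehend")
```

Calling `batch_sentiment(make_client(), texts)` with 60 texts would issue three requests of 25, 25, and 10 documents. For collections much larger than this, the asynchronous batch option is likely the better fit.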

To analyze a collection of documents, you typically perform the following steps:

  1. Store the documents in an Amazon S3 bucket.
  2. Start one or more jobs to analyze the documents.
  3. Monitor the progress of an analysis job.
  4. When the job is complete, query the results of the analysis.
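
The steps above can be sketched with the boto3 `comprehend` client, using sentiment detection as the example job type. This is a minimal sketch, not a definitive implementation: it assumes the documents are already in S3 (step 1), that an IAM role allowing Amazon Comprehend to read and write the buckets exists, and that simple polling is acceptable. The helper names, the polling interval, and any S3 URIs or role ARN passed in are placeholders.

```python
import time

# Job states after which polling can stop.
TERMINAL_STATES = {"COMPLETED", "FAILED", "STOPPED"}


def start_job(client, input_uri, output_uri, role_arn, language_code="en"):
    """Step 2: start an asynchronous sentiment job and return its job ID."""
    response = client.start_sentiment_detection_job(
        InputDataConfig={"S3Uri": input_uri, "InputFormat": "ONE_DOC_PER_FILE"},
        OutputDataConfig={"S3Uri": output_uri},
        DataAccessRoleArn=role_arn,
        LanguageCode=language_code,
    )
    return response["JobId"]


def wait_for_job(client, job_id, poll_seconds=30, sleep=time.sleep):
    """Step 3: poll until the job reaches a terminal state, then return its properties."""
    while True:
        response = client.describe_sentiment_detection_job(JobId=job_id)
        props = response["SentimentDetectionJobProperties"]
        if props["JobStatus"] in TERMINAL_STATES:
            return props
        sleep(poll_seconds)


def output_location(props):
    """Step 4: return the S3 location of the results, or None if the job did not complete."""
    if props["JobStatus"] != "COMPLETED":
        return None
    return props["OutputDataConfig"]["S3Uri"]
```

With a real client created by `boto3.client("comprehend")`, you would call `start_job`, pass the returned ID to `wait_for_job`, and then download the results from the location given by `output_location`. The `sleep` parameter exists only so the polling loop can be exercised without waiting.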

Read the Amazon Comprehend documentation to learn how you can get started with asynchronous processing operations.

About the Author

Binny Peh is a Sr. Product Marketing Manager for AWS machine learning solutions. In her spare time, she indulges in too much television and is an aspiring foodie. Binny’s glass is always half-full, and she believes in the power of positive thinking.