Amazon Comprehend Now Supports Asynchronous Processing Along With Larger Document Sizes

Posted on: Jun 27, 2018

Amazon Comprehend is a natural language processing (NLP) service that uses machine learning to find insights and relationships in text. Starting today, customers can analyze a collection of documents stored in an Amazon S3 bucket using the new asynchronous job service. This is in addition to the single-document and multi-document synchronous calls to the REST API that are already available, so you can choose the option that best fits your application's needs.

Asynchronous operations are particularly useful for analyzing large data sets when the application doesn't need a real-time response from the service. For example, you can schedule text analytics batch runs daily, weekly, or even monthly. With the asynchronous option, the service now accepts documents up to 100 KB in size, removing the need to truncate large documents for Named Entity Recognition (NER) and key phrase analysis. (Note: The maximum document size for asynchronous sentiment detection is still limited to 5 KB.)
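
As a rough illustration of the size limits above, the following Python sketch checks documents against the per-document limits before they are uploaded for an asynchronous job. The helper names are hypothetical, and two details are assumptions rather than statements from this announcement: that size is measured in UTF-8 encoded bytes, and that 1 KB is read conservatively as 1,000 bytes.

```python
from typing import Iterable, List, Tuple

# Per-document limits for asynchronous jobs, taken from the announcement.
# Assumption: 1 KB is treated as 1,000 bytes (the more conservative reading).
ASYNC_LIMITS_BYTES = {
    "entities": 100_000,
    "key_phrases": 100_000,
    "sentiment": 5_000,
}


def utf8_size(text: str) -> int:
    """Return the size of the text in UTF-8 encoded bytes (assumed unit)."""
    return len(text.encode("utf-8"))


def fits_async_limit(text: str, analysis: str) -> bool:
    """Return True if the document is within the limit for this analysis type."""
    return utf8_size(text) <= ASYNC_LIMITS_BYTES[analysis]


def split_by_limit(docs: Iterable[str], analysis: str) -> Tuple[List[str], List[str]]:
    """Separate documents into those within the limit and those above it."""
    within, too_large = [], []
    for doc in docs:
        if fits_async_limit(doc, analysis):
            within.append(doc)
        else:
            too_large.append(doc)
    return within, too_large
```

With these assumed limits, a 6,000-byte document would pass the check for entity or key phrase jobs but would still need to be shortened or split for sentiment detection.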

With this release, Amazon Comprehend now provides the following operations:

  • Synchronous Single-Document Processing: You call Amazon Comprehend with a single document and receive a synchronous response.
  • Synchronous Multi-Document Processing (25 documents per request): You call Amazon Comprehend with a collection of up to 25 documents and receive a synchronous response.
  • Asynchronous Batch Processing (5 GB job size limit, 100 KB document size limit): You start an asynchronous operation to analyze a collection of documents stored in an Amazon S3 bucket. The results of the analysis are written to an S3 bucket.
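
The three operations listed above might be called along these lines from Python. This is a minimal sketch, not an official sample: it uses entity detection as the example analysis, and it assumes a client created with `boto3.client("comprehend")` is passed in. The wrapper function names, the `"en"` default language, and the one-document-per-file input format are choices made for illustration.

```python
from typing import Any, Dict, Iterator, List

# Maximum number of documents per synchronous multi-document request.
SYNC_BATCH_SIZE = 25


def chunked(docs: List[str], size: int = SYNC_BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` documents."""
    for start in range(0, len(docs), size):
        yield docs[start:start + size]


def detect_single(client: Any, text: str, language: str = "en") -> Dict[str, Any]:
    """Synchronous single-document call."""
    return client.detect_entities(Text=text, LanguageCode=language)


def detect_batches(client: Any, docs: List[str], language: str = "en") -> List[Dict[str, Any]]:
    """Synchronous multi-document calls, one request per group of up to 25."""
    return [
        client.batch_detect_entities(TextList=chunk, LanguageCode=language)
        for chunk in chunked(docs)
    ]


def start_async_job(client: Any, input_s3_uri: str, output_s3_uri: str,
                    role_arn: str, language: str = "en") -> Dict[str, Any]:
    """Start an asynchronous job over documents stored in S3.

    Assumption: each S3 object holds one document (ONE_DOC_PER_FILE).
    The role must allow Comprehend to read the input and write the output.
    """
    return client.start_entities_detection_job(
        InputDataConfig={"S3Uri": input_s3_uri, "InputFormat": "ONE_DOC_PER_FILE"},
        OutputDataConfig={"S3Uri": output_s3_uri},
        DataAccessRoleArn=role_arn,
        LanguageCode=language,
    )
```

The response to the asynchronous call includes a job ID, which can be passed to `describe_entities_detection_job` to check the job status before reading the results from the output location.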

Read the Amazon Comprehend documentation to learn how you can get started with asynchronous processing operations.