Posted On: Aug 9, 2021

Amazon Comprehend is a natural language processing (NLP) service that uses machine learning to analyze text documents and identify insights such as sentiment, entities, and topics. Today, we are updating our custom entity recognition models so that you can train models with fewer training documents. Custom entity recognition extends Amazon Comprehend by letting you identify new entity types that are not among its preset generic entity types. In addition to the entity types returned by the DetectEntities API, such as LOCATION, DATE, and PERSON, you can analyze documents and extract entities like PRODUCT_CODE, EMPLOYEE_ID, CONTRACTOR_NAME, or other business-specific entity types that you define to fit your needs. Starting today, we have reduced the minimum number of required training documents by 50%. This means you can train custom models with as few as 100 annotations per entity type across 250 documents. If you have more training documents, you can expect even better results from the updated models than before.
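Before starting a training job, it can help to check locally whether your annotations meet the minimums described above. The sketch below is a hypothetical pre-flight helper, not part of Amazon Comprehend or the AWS SDK, and it does not call the service. It assumes an annotations CSV with the columns File, Line, Begin Offset, End Offset, and Type, and it takes the thresholds (100 annotations per entity type, 250 documents) from this announcement; the service enforces its own limits, which may differ or change.

```python
import csv
import io
from collections import Counter

# Thresholds taken from this announcement (assumption: the service's
# actual limits may differ, and the service performs its own validation).
MIN_ANNOTATIONS_PER_TYPE = 100
MIN_DOCUMENTS = 250


def check_training_minimums(annotations_csv: str, one_doc_per_line: bool = True) -> dict:
    """Hypothetical pre-flight check over an annotations CSV.

    Expects the columns File, Line, Begin Offset, End Offset, Type.
    If one_doc_per_line is True, each (File, Line) pair is treated as a
    separate document; otherwise each File is one document. Only documents
    referenced by at least one annotation are counted.
    """
    reader = csv.DictReader(io.StringIO(annotations_csv))
    per_type = Counter()   # annotation count per entity type
    documents = set()      # distinct documents seen in the annotations
    for row in reader:
        per_type[row["Type"]] += 1
        key = (row["File"], row["Line"]) if one_doc_per_line else row["File"]
        documents.add(key)

    # Entity types that have fewer annotations than the stated minimum.
    below = {t: n for t, n in per_type.items() if n < MIN_ANNOTATIONS_PER_TYPE}
    return {
        "annotations_per_type": dict(per_type),
        "document_count": len(documents),
        "types_below_minimum": below,
        "enough_documents": len(documents) >= MIN_DOCUMENTS,
    }
```

For example, a file with 300 annotated documents, each containing one PRODUCT_CODE, and only 50 EMPLOYEE_ID annotations would pass the document check but report EMPLOYEE_ID as below the per-type minimum. Because documents without any annotations never appear in the CSV, the document count here is a lower bound on the size of your training set.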

The updated custom entity recognition models are available in all AWS Regions where Amazon Comprehend is available. To try the new feature, sign in to the Amazon Comprehend console for a code-free experience, or download the AWS SDK. You can also learn more about this feature in our blog post.