Posted On: Jul 25, 2023
Amazon SageMaker Canvas now supports Document Queries, a ready-to-use model powered by Amazon Textract. Document Queries lets you specify the data you want to extract from structured documents using natural language, without requiring prior knowledge of the document’s structure (tables, forms, fields, nested data). This eliminates the need to manually process and search the extracted data, saving you time and reducing human error.
SageMaker Canvas is a visual interface that enables business analysts to generate accurate ML predictions on their own, without requiring any machine learning experience or having to write a single line of code. Prior to this launch, Canvas offered ready-to-use models that enabled you to extract information such as text, tables, and forms from documents. However, to answer ad hoc questions (e.g., “What is the total revenue generated from sales in the third quarter?”), you had to search and process the extracted information yourself, which is inefficient and time-consuming. With Document Queries, you can specify the information you require by asking natural language questions (e.g., “What is the customer name?”) and receive the precise information (e.g., “John Doe”) along with its location within the document, still without writing code.
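Canvas itself requires no code, but for readers curious about the question-and-answer flow described above, the following is a minimal sketch of how the same kind of query can be made directly against Amazon Textract using the `QUERIES` feature of its `AnalyzeDocument` API through boto3. How Canvas calls Textract internally is not stated in this announcement, so treat this as an illustration rather than a description of the Canvas implementation. The function names and the sample response (including its IDs, confidence, and bounding box values) are hypothetical; only the block layout follows Textract's documented response shape.

```python
from __future__ import annotations

from typing import Any


def query_document(image_bytes: bytes, questions: list[str]) -> dict[str, Any]:
    """Send natural-language questions to Textract (needs AWS credentials)."""
    import boto3  # imported lazily so the parser below runs without AWS access

    client = boto3.client("textract")
    return client.analyze_document(
        Document={"Bytes": image_bytes},
        FeatureTypes=["QUERIES"],
        QueriesConfig={"Queries": [{"Text": q} for q in questions]},
    )


def pair_answers(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Match each QUERY block with the QUERY_RESULT blocks that answer it."""
    blocks = response.get("Blocks", [])
    by_id = {block["Id"]: block for block in blocks}
    results = []
    for block in blocks:
        if block.get("BlockType") != "QUERY":
            continue
        answers = []
        for rel in block.get("Relationships", []):
            if rel.get("Type") != "ANSWER":
                continue
            for answer_id in rel.get("Ids", []):
                answer = by_id.get(answer_id)
                if answer is None:
                    continue
                answers.append(
                    {
                        "text": answer.get("Text"),
                        "confidence": answer.get("Confidence"),
                        # Location of the answer, as page-relative ratios.
                        "bounding_box": answer.get("Geometry", {}).get("BoundingBox"),
                    }
                )
        results.append({"question": block["Query"]["Text"], "answers": answers})
    return results


# Hypothetical, hand-written response used only to exercise the parser.
SAMPLE_RESPONSE = {
    "Blocks": [
        {
            "BlockType": "QUERY",
            "Id": "q-1",
            "Query": {"Text": "What is the customer name?"},
            "Relationships": [{"Type": "ANSWER", "Ids": ["a-1"]}],
        },
        {
            "BlockType": "QUERY_RESULT",
            "Id": "a-1",
            "Text": "John Doe",
            "Confidence": 98.0,
            "Geometry": {
                "BoundingBox": {"Left": 0.1, "Top": 0.2, "Width": 0.3, "Height": 0.05}
            },
        },
    ]
}

if __name__ == "__main__":
    for item in pair_answers(SAMPLE_RESPONSE):
        print(item["question"], "->", [a["text"] for a in item["answers"]])
```

With a real document, you would pass the file's bytes and a list of questions to `query_document` and hand the result to `pair_answers`; a question with no matching answer comes back with an empty `answers` list.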
To get started, log in to Amazon SageMaker Canvas and open the new Document Queries model from the list of ready-to-use models. Upload your document, then ask natural language questions to get the answers you are looking for.
This new capability is available in all AWS Regions where SageMaker Canvas is supported today. Amazon Textract’s pricing applies. To learn more, see the product documentation.