AWS News Blog
New business metadata features in Amazon SageMaker Catalog to improve discoverability across organizations
Amazon SageMaker Catalog, which is now built in to Amazon SageMaker, can help you collect and organize your data with the accompanying business context people need to understand it. It automatically documents assets generated by AWS Glue and Amazon Redshift, and it connects directly with Amazon Quick Sight, Amazon Simple Storage Service (Amazon S3) buckets, Amazon S3 Tables, and AWS Glue Data Catalog (GDC).
With only a few clicks, you can curate data inventory assets with the required business metadata by adding or updating business names (asset and schema), descriptions (asset and schema), README content, glossary terms (asset and schema), and metadata forms. You can also create AI-generated suggestions, review and refine descriptions, and publish enriched asset metadata directly to the catalog. This helps reduce manual documentation effort, improves metadata consistency, and accelerates asset discoverability across organizations.
Starting today, you can use new capabilities in Amazon SageMaker Catalog metadata to improve business metadata and search:
- Column-level metadata forms and rich descriptions – You can create custom metadata forms to capture business-specific information directly in individual columns. Columns also support markdown-enabled rich text descriptions for comprehensive data documentation and business context.
- Enforce metadata rules for glossary terms for asset publishing – You can use metadata enforcement rules for glossary terms, meaning data producers must use approved business vocabulary when publishing assets. By standardizing metadata practices, your organization can improve compliance, enhance audit readiness, and streamline access workflows for greater efficiency and control.
These new SageMaker Catalog metadata capabilities help you enforce consistent data classification and improve discoverability across your organizational catalogs. Let’s take a closer look at each capability.
Column-level metadata forms and rich descriptions
You can now use custom metadata forms and rich text descriptions at the column level, extending existing curation capabilities for business names, descriptions, and glossary term classifications. Custom metadata form field values and rich text content are indexed in real time and become immediately discoverable through search.
To edit column-level metadata, select the schema of your catalog asset used in your project and choose the View/Edit action for each column.

As an asset owner, you can choose a column and then define custom key-value metadata forms and markdown descriptions to provide detailed column documentation.

Now data analysts in your organization can search using custom form field values and rich text content, alongside existing column names, descriptions, and glossary terms.
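To make the idea concrete, here is a minimal local sketch of how a search over column metadata could match custom form field values and rich text alongside names, descriptions, and glossary terms. It is not the SageMaker Catalog implementation or API. The `Column` class, the `search_columns` function, and the sample data are hypothetical names used only for illustration, and the matching is plain substring matching over lowercased text.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Column:
    """Hypothetical in-memory stand-in for a catalog column (not the service's data model)."""

    name: str
    description: str = ""  # markdown-enabled rich text description
    glossary_terms: list[str] = field(default_factory=list)
    form_fields: dict[str, str] = field(default_factory=dict)  # custom metadata form values

    def searchable_text(self) -> str:
        # Combine every searchable piece of column metadata into one lowercase string.
        parts = [self.name, self.description, *self.glossary_terms, *self.form_fields.values()]
        return " ".join(parts).lower()


def search_columns(columns: list[Column], query: str) -> list[str]:
    """Return the names of columns whose metadata contains every word in the query."""
    words = query.lower().split()
    return [c.name for c in columns if all(w in c.searchable_text() for w in words)]


# Illustrative sample data only.
columns = [
    Column("cust_id", "Unique **customer** identifier", ["Customer"], {"sensitivity": "PII"}),
    Column("order_total", "Order amount in USD", ["Revenue"], {"sensitivity": "internal"}),
]

print(search_columns(columns, "pii"))      # matches a custom form field value
print(search_columns(columns, "revenue"))  # matches a glossary term
```

In this sketch, a query such as `pii` finds `cust_id` only because the value lives in a custom form field, which is the kind of match the new column-level indexing is meant to enable.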
Enforce metadata rules for glossary terms for asset publishing
You can define mandatory glossary term requirements for data assets during the publishing workflow. Your data producers must now classify their assets with approved business terms from organizational glossaries before publication, promoting consistent metadata standards and improving data discoverability. The enforcement rules validate that required glossary terms are applied, preventing assets from being published without proper business context.
To enable a new metadata rule for glossary terms, choose Add in your domain units under the Domain Management section in the Govern menu.

Now you can select either Metadata forms or Glossary association as a type of requirement for the rule. When you select Glossary association, you can choose up to 5 required glossary terms per rule.

If you attempt to publish an asset without the required glossary terms, an error message appears, prompting you to add the terms that the glossary rule requires.
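The enforcement behavior described in this section can be sketched locally as follows. This is a simplified model, not the service's API: `GlossaryRule`, `PublishBlockedError`, `missing_terms`, and `publish` are hypothetical names, and the sketch assumes that every required term in a rule must be present on the asset. The only limit taken from this post is the maximum of 5 required glossary terms per rule.

```python
from __future__ import annotations

from dataclasses import dataclass

MAX_TERMS_PER_RULE = 5  # limit stated in this post


class PublishBlockedError(Exception):
    """Hypothetical error raised when required glossary terms are missing."""


@dataclass(frozen=True)
class GlossaryRule:
    """Hypothetical model of a glossary association rule."""

    name: str
    required_terms: frozenset[str]

    def __post_init__(self) -> None:
        if len(self.required_terms) > MAX_TERMS_PER_RULE:
            raise ValueError(f"Rule '{self.name}' has more than {MAX_TERMS_PER_RULE} required terms")


def missing_terms(asset_terms: set[str], rules: list[GlossaryRule]) -> set[str]:
    """Collect required terms that are not yet applied to the asset."""
    missing: set[str] = set()
    for rule in rules:
        # Assumption: all terms in a rule are required, not just one of them.
        missing |= rule.required_terms - asset_terms
    return missing


def publish(asset_name: str, asset_terms: set[str], rules: list[GlossaryRule]) -> str:
    """Publish only when every rule is satisfied; otherwise raise an error."""
    missing = missing_terms(asset_terms, rules)
    if missing:
        raise PublishBlockedError(f"{asset_name}: missing glossary terms {sorted(missing)}")
    return f"{asset_name} published"


rule = GlossaryRule("classification", frozenset({"Customer", "Revenue"}))

try:
    publish("orders", {"Customer"}, [rule])
except PublishBlockedError as err:
    print(f"Blocked: {err}")

print(publish("orders", {"Customer", "Revenue"}, [rule]))
```

Validating the term limit when the rule is created, rather than at publish time, mirrors the flow in the console, where the limit applies while you define the rule.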

Standardizing metadata and aligning data schemas with business language enhances data governance and improves search relevance, helping your organization better understand and trust published data.
You can also use these features through the AWS Command Line Interface (AWS CLI) and AWS SDKs. To learn more, visit the Amazon SageMaker Unified Studio data catalog in the Amazon SageMaker Unified Studio User Guide.
Now available
The new metadata capabilities are now available in AWS Regions where Amazon SageMaker Catalog is available.
Give it a try and send feedback to AWS re:Post for Amazon SageMaker Catalog or through your usual AWS Support contacts.
— Channy
