AWS Big Data Blog

Enhanced data discovery in Amazon SageMaker Catalog with custom metadata forms and rich text documentation

Amazon SageMaker Catalog now supports custom metadata forms and rich text descriptions at the column level, extending existing curation capabilities for business names, descriptions, and glossary term classifications.

With these new features, data stewards can define and capture business-specific metadata directly in individual columns, and authors can use markdown-enabled rich text to provide detailed documentation and business context. Both form fields and formatted descriptions are indexed in real time, making them immediately discoverable through catalog search.

Column-level context is essential for understanding and trusting data. This release helps organizations improve data discoverability, collaboration, and governance by letting metadata stewards document columns using structured and formatted information that aligns with internal standards.

In this post, we show how to enhance data discovery in SageMaker Catalog with custom metadata forms and rich text documentation at the schema level.

Key capabilities

SageMaker Catalog now offers the following key capabilities:

  • Custom metadata forms – Data stewards can now use custom metadata forms to capture organization-specific metadata fields for columns such as Business Owner, Regulatory Classification, Units of Measure, or Approved Use Case. Each field is stored as a key-value pair and indexed for search, enabling business-level queries like “find columns where sensitivity = confidential.”
  • Rich text (markdown) descriptions – Each column supports a markdown-enabled description field. Authors can format text with headings, bullet lists, and hyperlinks to add deeper business or operational context—for example, logic definitions, sample values, or data lineage references.
  • Real-time indexing for search – Custom form values and rich text content are indexed as soon as they are saved. Users can search using a metadata value, keyword, or glossary term across columns.

Solution overview

For this post, we explore a financial services use case. Our example financial services organization defines a column metadata form that includes several fields, as illustrated in the following table.

Field Example Value
Approved Use Case Financial revenue modeling
Business Owner Finance Office
Domain RF

For a dataset column named revenue, the author adds the following markdown description:

# Business Revenue

- Use for Financial Modeling
- Use only for batch use cases

When analysts search for Domain = RF, this column appears in results with complete business context.

In the following sections, we demonstrate how to use to use metadata forms for columns and add rich text descriptions that is searchable.

Prerequisites

To test this solution, you should have an Amazon SageMaker Unified Studio domain set up with a domain owner or domain unit owner privileges. You should also have an existing project to publish assets and catalog assets. For instructions to create these assets, see the Getting started guide.

In this example, we created a project named financial_analysis and a test table. To create a similar table, see Get started with Amazon S3 Tables in Amazon SageMaker Unified Studio. To ingest the sample data to SageMaker Catalog and generate business metadata, see Create an Amazon SageMaker Unified Studio data source for Amazon Redshift in the project catalog.

Create new metadata form

Complete the following steps to create a new metadata form:

  1. In SageMaker Unified Studio, go to your project.
  2. Under Project catalog in the navigation pane, choose Metadata entities.
  3. Choose Create metadata form.
  4. Provide an optional display name, a technical name, and an optional description, then choose Create metadata form.
  5. Define the form fields. In this example, we add the fields Domain, Business Owner, and Approved Use Case.
  6. For Requirement Options, select the configuration for each field. For our use case, we select Always required.
  7. Choose Create field.
  8. Turn on Enabled so the form is visible and can be used for assets.

Attach metadata form to column

Complete the following steps to attach the metadata form to a column:

  1. Under Project catalog in the navigation pane, choose Assets.
  2. Search for and select your asset (for this example, we use the asset business_finance).
  3. On the Schema tab, choose View/Edit next to the revenue field.
  4. Choose Add metadata form.
  5. Choose the form you created and choose Add.
  6. Add details for the metadata form fields

Add additional context as formatted text

Next, we enter a rich text description for each column using the markdown editor, including headings, bullet lists, links, and sample values. Complete the following steps:

  1. Choose Edit next to README for the revenue field where you added the metadata form.
  2. Enter details and choose Save.
  3. Choose Preview to view the formatted README at the column level.

Publish and verify search

Now you’re ready to publish the asset. The metadata form values and markdown descriptions become part of the catalog record and are indexed for search. You can also see the history of revisions on the History tab. Other project users can see the metadata form and rich text description for the published assets and subscribe to the data asset. You can create more data products with these assets, and they will also have the column metadata form and README.

In the catalog search UI, data users can now filter on custom form fields (for example, “Domain = RF”) or search in natural language for text that matches the column description.

Best practices

Consider the following best practices when using this feature:

  • Define metadata forms aligned with your business vocabulary (domains, owners, sensitivity levels) proactively before publishing assets at scale.
  • Make column descriptions actionable—include business definitions, value ranges, logic, update cadence, and dependencies.
  • Verify the catalog indexing is timely; publish changes proactively so search results reflect new metadata.
  • Use governance controls. You can combine column-level metadata with existing asset-level templates and approval workflows to enforce publishing standards.
  • Monitor search usage and metadata completeness; target high-value datasets for complete column-level documentation first.
  • Do not store confidential or sensitive information in your metadata forms.

Conclusion

With column-level metadata forms and rich text descriptions, SageMaker Catalog helps organizations deliver higher-quality metadata, stronger governance, and better data discovery. These features make it straightforward for teams to capture complete business context and for analysts to quickly locate and understand the data they need.

Custom metadata forms and rich text descriptions at the column level are now available in AWS Regions where SageMaker is supported.

To learn more about SageMaker, see the Amazon SageMaker User Guide. Get started with this capability, refer to the user guide.


About the Authors

Ramesh Singh

Ramesh Singh

Ramesh is a Senior Product Manager Technical (External Services) at AWS in Seattle, Washington, currently with the Amazon SageMaker team. He is passionate about building high-performance ML/AI and analytics products that enable enterprise customers to achieve their critical goals using cutting-edge technology.

Pradeep Misra

Pradeep Misra

Pradeep is a Principal Analytics and Applied AI Solutions Architect at AWS. He is passionate about solving customer challenges using data, analytics, and AI/ML. Outside of work, he likes exploring new places, trying new cuisines, and playing badminton with his family. He also likes doing science experiments, building LEGOs, and watching anime with his daughters.

Abbas Makhdum

Abbas Makhdum

Abbas is Head of Product Marketing for Amazon SageMaker Catalog at AWS, where he leads go-to-market strategy and launches for data and AI governance solutions. With deep expertise across data, AI, and analytics, Abbas has also authored a book on data and AI governance with O’Reilly. He is passionate about helping organizations unlock business value by making data and AI more accessible, transparent, and governed.

Harish Panwar

Harish Panwar

Harish is a Software Development Manager at AWS in Bangalore, India. He is leading the Catalog engineering team, which is building data and AI governance solutions. Harish is a veteran in Amazon SageMaker, with deep expertise across SageMaker AI and SageMaker Catalog. He is passionate about creating simple and intuitive AI solutions making AI accessible to everyone.