Build ML feature pipelines from custom data sources with Amazon SageMaker Feature Store

Posted on: Oct 23, 2023

Amazon SageMaker Feature Store now supports incorporating custom data sources into feature processing pipelines. You can build richer and more varied ML features by combining diverse data sources and defining the transformations to apply to them. SageMaker Feature Store then takes care of processing the data into ML features.
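To make the division of labor concrete, here is a minimal, plain-Python sketch of the pattern. You supply a data source and a transformation function, and a pipeline runner turns the source data into feature records. Everything in it is hypothetical and written only for illustration: `CustomDataSource`, `InMemoryOrdersSource`, `run_feature_pipeline`, `total_spend_per_customer`, and the dict that stands in for a feature group. It is not the SageMaker Python SDK API, which uses Spark and its own feature processor interfaces; see the SageMaker documentation for those.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Iterable, List

# Hypothetical record type: one row of raw data or of computed features.
Record = Dict[str, object]


class CustomDataSource(ABC):
    """Hypothetical interface: a custom source only needs to know how to read its data."""

    @abstractmethod
    def read_data(self) -> Iterable[Record]:
        ...


class InMemoryOrdersSource(CustomDataSource):
    """Hypothetical source standing in for a warehouse table or a stream."""

    def __init__(self, rows: List[Record]):
        self._rows = rows

    def read_data(self) -> Iterable[Record]:
        return iter(self._rows)


def run_feature_pipeline(
    source: CustomDataSource,
    transform: Callable[[Iterable[Record]], List[Record]],
    feature_group: Dict[str, Record],
    record_id: str,
) -> int:
    """Read from the source, apply the user transform, and upsert rows keyed by record_id.

    The plain dict `feature_group` is a hypothetical stand-in for a managed feature group.
    Returns the number of feature records written.
    """
    features = transform(source.read_data())
    for row in features:
        feature_group[str(row[record_id])] = row
    return len(features)


def total_spend_per_customer(rows: Iterable[Record]) -> List[Record]:
    """User-defined transformation: aggregate raw orders into one feature row per customer."""
    totals: Dict[str, float] = {}
    for row in rows:
        customer = str(row["customer_id"])
        totals[customer] = totals.get(customer, 0.0) + float(row["amount"])
    return [{"customer_id": c, "total_spend": t} for c, t in sorted(totals.items())]


# Usage: three raw orders become two customer-level feature records.
source = InMemoryOrdersSource([
    {"customer_id": "c1", "amount": 10.0},
    {"customer_id": "c2", "amount": 5.5},
    {"customer_id": "c1", "amount": 2.5},
])
feature_group: Dict[str, Record] = {}
n = run_feature_pipeline(source, total_spend_per_customer, feature_group, "customer_id")
print(n, feature_group["c1"]["total_spend"])  # 2 12.5
```

Keeping the source and the transform behind separate, narrow interfaces is what allows a managed service to own the rest of the pipeline: scheduling, execution, and writes to the feature store.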

With this launch, you can connect to streaming data sources like Amazon Kinesis and author transforms with Spark Structured Streaming, a scalable and fault-tolerant engine for real-time stream processing. You can also connect to data warehouses like Amazon Redshift, Snowflake, and Databricks for batch feature processing, and start feature processing on a schedule or with a trigger using Amazon EventBridge rules. Amazon SageMaker Feature Store creates and manages the pipelines and writes to your feature groups for use in ML model training and serving. You can track your pipeline executions, visualize lineage to trace features back to their data sources, and view feature processing code, all in one environment in Amazon SageMaker Studio.

To learn more, see the Amazon SageMaker Feature Store documentation. To get started, open SageMaker Studio from the Amazon SageMaker console.
