AWS Machine Learning Blog
Amazon SageMaker Feature Store now supports cross-account sharing, discovery, and access
Amazon SageMaker Feature Store is a fully managed, purpose-built repository to store, share, and manage features for machine learning (ML) models. Features are inputs to ML models used during training and inference. For example, in an application that recommends a music playlist, features could include song ratings, listening duration, and listener demographics. Features are used repeatedly by multiple teams, and feature quality is critical to ensure a highly accurate model. Also, when features used to train models offline in batch are made available for real-time inference, it’s hard to keep the offline (training) and online (inference) feature stores synchronized. SageMaker Feature Store provides a secure, unified store to process, standardize, and use features at scale across the ML lifecycle.
SageMaker Feature Store now makes it effortless to share, discover, and access feature groups across AWS accounts. This new capability promotes collaboration and minimizes duplicate work for teams involved in ML model and application development, particularly in enterprise environments with multiple accounts spanning different business units or functions.
With this launch, account owners can use AWS Resource Access Manager (AWS RAM) to grant other accounts access to selected feature groups. After access is granted, users in those accounts can view all of their feature groups, including the shared ones, through Amazon SageMaker Studio or the SDKs. This enables teams to discover and use features developed by other teams, fostering knowledge sharing and efficiency. Additionally, usage of shared resources can be monitored with Amazon CloudWatch and AWS CloudTrail. For a deep dive, refer to Cross account feature group discoverability and access.
In this post, we discuss why and how to build a centralized feature store with cross-account access. We show how to set it up, walk through a sample demonstration, and describe the benefits you can get by using this new capability in your organization.
Who needs a cross-account feature store
Organizations need to securely share features across teams to build accurate ML models, while preventing unauthorized access to sensitive data. SageMaker Feature Store now allows granular sharing of features across accounts via AWS RAM, enabling collaborative model development with governance.
SageMaker Feature Store provides purpose-built storage and management for ML features used during training and inference. With cross-account support, you can now selectively share features stored in one AWS account with other accounts in your organization.
For example, the analytics team may curate features such as customer profiles, transaction history, and product catalogs in a central management account. ML developers in other departments, such as marketing and fraud detection, need secure access to these features to build their models.
The following are key benefits of sharing ML features across accounts:
- Consistent and reusable features – Centralized sharing of curated features improves model accuracy by providing consistent input data to train on. Teams can discover and directly consume features created by others instead of duplicating them in each account.
- Feature group access control – You can grant access to only the specific feature groups required for an account’s use case. For example, the marketing team may only get access to the customer profile feature group needed for recommendation models.
- Collaboration across teams – Shared features allow disparate teams like fraud, marketing, and sales to collaborate on building ML models using the same reliable data instead of creating siloed features.
- Audit trail for compliance – Administrators can monitor feature usage by all accounts centrally using CloudTrail event logs. This provides an audit trail required for governance and compliance.
Delineating producers from consumers in cross-account feature stores
In the realm of machine learning, the feature store acts as a crucial bridge, connecting those who supply data with those who harness it. This dichotomy can be effectively managed using a cross-account setup for the feature store. Let’s demystify this using the following personas and a real-world analogy:
- Data and ML engineers (owners and producers) – They lay the groundwork by feeding data into the feature store
- Data scientists (consumers) – They extract and utilize this data to craft their models
Data engineers serve as architects sketching the initial blueprint. Their task is to construct and oversee efficient data pipelines. Drawing data from source systems, they mold raw data attributes into discernible features. Take “age” for instance. Although it merely represents the span between now and one’s birthdate, its interpretation might vary across an organization. Ensuring quality, uniformity, and consistency is paramount here. Their aim is to feed data into a centralized feature store, establishing it as the undisputed reference point.
ML engineers refine these foundational features, tailoring them for mature ML workflows. In the context of banking, they might deduce statistical insights from account balances, identifying trends and flow patterns. The hurdle they often face is redundancy. It’s common to see repetitive feature creation pipelines across diverse ML initiatives.
Imagine data scientists as gourmet chefs scouting a well-stocked pantry, seeking the best ingredients for their next culinary masterpiece. Their time should be invested in crafting innovative data recipes, not in reassembling the pantry. The hurdle at this juncture is discovering the right data. A user-friendly interface, equipped with efficient search tools and comprehensive feature descriptions, is indispensable.
In essence, a cross-account feature store setup meticulously segments the roles of data producers and consumers, ensuring efficiency, clarity, and innovation. Whether you’re laying the foundation or building atop it, knowing your role and tools is pivotal.
The following diagram shows two different data scientist teams, from two different AWS accounts, who share and use the same central feature store to select the best features needed to build their ML models. The central feature store is located in a different account managed by data engineers and ML engineers, where the data governance layer and data lake are usually situated.
Cross-account feature group controls
With SageMaker Feature Store, you can share feature group resources across accounts. The resource owner account shares resources with the resource consumer accounts. There are two distinct categories of permissions associated with sharing resources:
- Discoverability permissions – Discoverability means being able to see feature group names and metadata. When you grant discoverability permission, all feature group entities in the account that you share from (resource owner account) become discoverable by the accounts that you are sharing with (resource consumer accounts). For example, if you make the resource owner account discoverable by the resource consumer account, then principals of the resource consumer account can see all feature groups contained in the resource owner account. This permission is granted to resource consumer accounts by using the SageMaker catalog resource type.
- Access permissions – When you grant an access permission, you do so at the feature group resource level (not the account level). This gives you more granular control over granting access to data. The types of access permission that can be granted are read-only, read/write, and admin. For example, you can select only certain feature groups from the resource owner account to be accessible by principals of the resource consumer account, depending on your business needs. This permission is granted to resource consumer accounts by using the feature group resource type and specifying feature group entities.
The following example diagram contrasts sharing the SageMaker catalog resource type, which grants the discoverability permission, with sharing a feature group resource type entity, which grants access permissions. The SageMaker catalog contains all of your feature group entities. When granted a discoverability permission, the resource consumer account can search and discover all feature group entities within the resource owner account. A feature group entity contains your ML data. When granted an access permission, the resource consumer account can access the feature group data, with access determined by the relevant access permission.
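To make the two resource types concrete, the following Python sketch builds the ARN you would share for each. The ARN formats, in particular the catalog path sagemaker-catalog/DefaultFeatureGroupCatalog and the default aws partition, are our assumptions; confirm them against the SageMaker documentation, and prefer the FeatureGroupArn returned by DescribeFeatureGroup over a hand-built one.

```python
# Sketch: ARNs for the two shareable resource types (formats are assumptions).
PARTITION = "aws"  # assumption: commercial partition; GovCloud and China Regions differ


def catalog_arn(region: str, owner_account_id: str) -> str:
    """SageMaker catalog ARN; sharing it grants discoverability of all feature groups."""
    return (
        f"arn:{PARTITION}:sagemaker:{region}:{owner_account_id}:"
        "sagemaker-catalog/DefaultFeatureGroupCatalog"  # assumed catalog name
    )


def feature_group_arn(region: str, owner_account_id: str, name: str) -> str:
    """Feature group ARN; sharing it grants access to that group's data."""
    return f"arn:{PARTITION}:sagemaker:{region}:{owner_account_id}:feature-group/{name}"


def managed_permission_arn(permission_name: str) -> str:
    """ARN of an AWS managed RAM permission, such as the catalog search permission."""
    return f"arn:{PARTITION}:ram::aws:permission/{permission_name}"


if __name__ == "__main__":
    print(catalog_arn("us-east-1", "111122223333"))
    print(feature_group_arn("us-east-1", "111122223333", "customer-profile"))
    print(managed_permission_arn("AWSRAMPermissionSageMakerCatalogResourceSearch"))
```

Discoverability is granted by sharing the single catalog ARN, whereas access is granted per feature group ARN, which is why only the second function takes a feature group name.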
Solution overview
Complete the following steps to securely share features between accounts using SageMaker Feature Store:
- In the source (owner) account, ingest datasets and prepare normalized features. Organize related features into logical groups called feature groups.
- Create a resource share to grant cross-account access to specific feature groups. Choose a managed permission that defines the allowed actions, such as get and put, and restrict access to authorized accounts only.
- In the target (consumer) accounts, accept the AWS RAM invitation to access shared features. Review the access policy to understand permissions granted.
Developers in target accounts can now retrieve shared features using the SageMaker SDK, join with additional data, and use them to train ML models. The source account can monitor access to shared features by all accounts using CloudTrail event logs. Audit logs provide centralized visibility into feature usage.
With these steps, you can enable teams across your organization to securely use shared ML features for collaborative model development.
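As an illustration of the audit step, the following sketch groups CloudTrail events by calling account and API name. It assumes events shaped like the output of boto3's cloudtrail.lookup_events(), whose CloudTrailEvent field is a JSON string containing userIdentity.accountId. Which Feature Store calls appear (for example, record-level operations) depends on your trail configuration, so treat the counts as indicative rather than complete.

```python
# Sketch: summarize SageMaker API usage per calling account from CloudTrail events.
import json
from collections import Counter
from datetime import datetime, timedelta, timezone


def calls_by_account(events):
    """Count (accountId, eventName) pairs across CloudTrail LookupEvents items."""
    counts = Counter()
    for event in events:
        detail = json.loads(event["CloudTrailEvent"])  # full event record as JSON
        account = detail.get("userIdentity", {}).get("accountId", "unknown")
        name = detail.get("eventName", event.get("EventName"))
        counts[(account, name)] += 1
    return counts


def fetch_sagemaker_events(hours=24):
    """Look up recent SageMaker events in the current account (needs boto3 and credentials)."""
    import boto3  # imported lazily so calls_by_account() runs without boto3

    client = boto3.client("cloudtrail")
    start = datetime.now(timezone.utc) - timedelta(hours=hours)
    events = []
    for page in client.get_paginator("lookup_events").paginate(
        LookupAttributes=[
            {"AttributeKey": "EventSource", "AttributeValue": "sagemaker.amazonaws.com"}
        ],
        StartTime=start,
    ):
        events.extend(page["Events"])
    return events
```

In the owner account, calls_by_account(fetch_sagemaker_events()) gives a quick per-account view of who is touching the shared feature groups.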
Prerequisites
We assume that you have already created feature groups and ingested the corresponding features inside your owner account. For more information about getting started, refer to Get started with Amazon SageMaker Feature Store.
Grant discoverability permissions
First, we demonstrate how to share our SageMaker Feature Store catalog in the owner account. Complete the following steps:
- In the owner account of the SageMaker Feature Store catalog, open the AWS RAM console.
- Under Shared by me in the navigation pane, choose Resource shares.
- Choose Create resource share.
- Enter a resource share name and choose SageMaker Resource Catalogs as the resource type.
- Choose Next.
- For discoverability-only access, enter AWSRAMPermissionSageMakerCatalogResourceSearch for Managed permissions.
- Choose Next.
- Enter your consumer account ID and choose Add. You may add several consumer accounts.
- Choose Next and complete your resource share.
Now the shared SageMaker Feature Store catalog should show up on the Resource shares page.
You can achieve the same result by using the AWS Command Line Interface (AWS CLI) command aws ram create-resource-share, providing your AWS Region, owner account ID, and consumer account ID.
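The same catalog share can also be created from Python. The following is a minimal sketch using boto3's AWS RAM client, not the exact code from this post's notebooks; the catalog ARN format and the default share name are assumptions, and the account IDs you pass are placeholders.

```python
# Sketch: create the discoverability (catalog) resource share with boto3.


def catalog_share_request(region, owner_account_id, consumer_account_ids,
                          share_name="feature-store-catalog-share"):
    """Build keyword arguments for ram.create_resource_share()."""
    catalog = (
        f"arn:aws:sagemaker:{region}:{owner_account_id}:"
        "sagemaker-catalog/DefaultFeatureGroupCatalog"  # assumed catalog ARN format
    )
    return {
        "name": share_name,  # placeholder share name
        "resourceArns": [catalog],
        "principals": list(consumer_account_ids),  # consumer AWS account IDs
        "permissionArns": [
            "arn:aws:ram::aws:permission/AWSRAMPermissionSageMakerCatalogResourceSearch"
        ],
    }


def create_catalog_share(region, owner_account_id, consumer_account_ids):
    """Create the share from the owner account (needs boto3 and credentials)."""
    import boto3  # imported lazily so the request builder runs without boto3

    ram = boto3.client("ram", region_name=region)
    response = ram.create_resource_share(
        **catalog_share_request(region, owner_account_id, consumer_account_ids)
    )
    return response["resourceShare"]["resourceShareArn"]
```

Keeping the request builder separate from the API call makes it easy to review (or unit test) exactly what will be shared before anything is created.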
Accept the resource share invite
To accept the resource share invite, complete the following steps:
- In the target (consumer) account, open the AWS RAM console.
- Under Shared with me in the navigation pane, choose Resource shares.
- Choose the new pending resource share.
- Choose Accept resource share.
You can achieve the same result using the AWS CLI: run aws ram get-resource-share-invitations, retrieve the value of resourceShareInvitationArn from its output, and then accept the invitation with aws ram accept-resource-share-invitation.
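Scripted, the accept flow is a list-then-accept loop. This minimal sketch uses boto3's get_resource_share_invitations and accept_resource_share_invitation calls; the optional filter on the owner (sender) account ID is our own addition, so that unrelated shares are not accepted by accident.

```python
# Sketch: accept pending AWS RAM invitations in the consumer account.


def pending_invitation_arns(invitations, sender_account_id=None):
    """Return ARNs of PENDING invitations, optionally only those from one owner account."""
    return [
        inv["resourceShareInvitationArn"]
        for inv in invitations
        if inv.get("status") == "PENDING"
        and (sender_account_id is None or inv.get("senderAccountId") == sender_account_id)
    ]


def accept_pending_invitations(region, sender_account_id=None):
    """List all invitations, then accept each pending one (needs boto3 and credentials)."""
    import boto3  # imported lazily so the pure helper above runs without boto3

    ram = boto3.client("ram", region_name=region)
    invitations, token = [], None
    while True:  # follow nextToken until every page has been read
        page = ram.get_resource_share_invitations(**({"nextToken": token} if token else {}))
        invitations.extend(page.get("resourceShareInvitations", []))
        token = page.get("nextToken")
        if not token:
            break
    accepted = []
    for arn in pending_invitation_arns(invitations, sender_account_id):
        ram.accept_resource_share_invitation(resourceShareInvitationArn=arn)
        accepted.append(arn)
    return accepted
```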
The workflow is the same for sharing feature groups with another account via AWS RAM.
After you share the catalog (or feature groups) with the target account and the invitation is accepted, you can open SageMaker Feature Store in the target account and observe that the new catalog is available.
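To confirm discoverability programmatically, you can query SageMaker Search from the consumer account. This sketch assumes that your boto3 version supports the CrossAccountFilterOption parameter of Search and that each result item carries a FeatureGroup entry; verify both against the current SageMaker API reference.

```python
# Sketch: list feature groups discoverable from other accounts (consumer side).


def shared_feature_groups(search_results):
    """Extract (name, ARN) pairs from SageMaker Search result items."""
    pairs = []
    for item in search_results:
        fg = item.get("FeatureGroup", {})  # assumed result shape for Resource="FeatureGroup"
        pairs.append((fg.get("FeatureGroupName"), fg.get("FeatureGroupArn")))
    return pairs


def search_cross_account_feature_groups(region):
    """Run the search against SageMaker (needs boto3 and credentials)."""
    import boto3  # imported lazily so the parser above runs without boto3

    sm = boto3.client("sagemaker", region_name=region)
    results = []
    for page in sm.get_paginator("search").paginate(
        Resource="FeatureGroup", CrossAccountFilterOption="CrossAccount"  # assumed option
    ):
        results.extend(page["Results"])
    return shared_feature_groups(results)
```

The ARNs returned here are what the consumer later uses to address shared feature groups, since names alone refer to the caller's own account.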
Grant access permissions
With access permissions, we can grant permissions at the feature group resource level. Complete the following steps:
- In the owner account of the SageMaker Feature Store catalog, open the AWS RAM console.
- Under Shared by me in the navigation pane, choose Resource shares.
- Choose Create resource share.
- Enter a resource share name and choose SageMaker Feature Groups as the resource type.
- Select one or more feature groups to share.
- Choose Next.
- For read/write access, enter AWSRAMPermissionSageMakerFeatureGroupReadWrite for Managed permissions.
- Choose Next.
- Enter your consumer account ID and choose Add. You may add several consumer accounts.
- Choose Next and complete your resource share.
Now the new feature group resource share should show up on the Resource shares page.
You can achieve the same result by using the AWS CLI command aws ram create-resource-share, providing your Region, owner account ID, consumer account ID, and feature group name.
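A minimal boto3 sketch of the feature group share follows. Unlike the catalog share, it lists one ARN per feature group and uses a feature group permission. The ARN format is an assumption (SageMaker may normalize names inside ARNs, so prefer the FeatureGroupArn from DescribeFeatureGroup), and the default share name is a placeholder.

```python
# Sketch: share selected feature groups with a chosen access level through AWS RAM.
READ_WRITE = "AWSRAMPermissionSageMakerFeatureGroupReadWrite"


def feature_group_share_request(region, owner_account_id, feature_group_names,
                                consumer_account_ids, permission=READ_WRITE,
                                share_name="feature-group-share"):
    """Build keyword arguments for ram.create_resource_share()."""
    if not feature_group_names:
        raise ValueError("share at least one feature group")
    return {
        "name": share_name,  # placeholder share name
        "resourceArns": [
            # assumed ARN format; prefer the FeatureGroupArn from DescribeFeatureGroup
            f"arn:aws:sagemaker:{region}:{owner_account_id}:feature-group/{name}"
            for name in feature_group_names
        ],
        "principals": list(consumer_account_ids),
        "permissionArns": [f"arn:aws:ram::aws:permission/{permission}"],
    }


def share_feature_groups(region, owner_account_id, feature_group_names,
                         consumer_account_ids, permission=READ_WRITE):
    """Create the share from the owner account (needs boto3 and credentials)."""
    import boto3  # imported lazily so the request builder runs without boto3

    ram = boto3.client("ram", region_name=region)
    response = ram.create_resource_share(
        **feature_group_share_request(
            region, owner_account_id, feature_group_names, consumer_account_ids, permission
        )
    )
    return response["resourceShare"]["resourceShareArn"]
```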
There are three types of access that you can grant to feature groups:
- AWSRAMPermissionSageMakerFeatureGroupReadOnly – The read-only privilege allows resource consumer accounts to read records in the shared feature groups and view details and metadata
- AWSRAMPermissionSageMakerFeatureGroupReadWrite – The read/write privilege allows resource consumer accounts to write records to, and delete records from, the shared feature groups, in addition to read permissions
- AWSRAMPermissionSagemakerFeatureGroupAdmin – The admin privilege allows the resource consumer accounts to update the description and parameters of features within the shared feature groups and update the configuration of the shared feature groups, in addition to read/write permissions
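Because each level is a superset of the previous one, a simple rule is to grant the narrowest level that covers what the consumer needs. The helper below is a hypothetical illustration of that least-privilege choice, not part of any AWS SDK; the permission names are copied from the list above.

```python
# Sketch: pick the least-privileged managed permission for a consumer's needs.
READ_ONLY = "AWSRAMPermissionSageMakerFeatureGroupReadOnly"
READ_WRITE = "AWSRAMPermissionSageMakerFeatureGroupReadWrite"
ADMIN = "AWSRAMPermissionSagemakerFeatureGroupAdmin"  # casing as listed; verify in the console


def least_privilege_permission(write_records=False, update_metadata_or_config=False):
    """Return the narrowest permission covering the requested capabilities."""
    if update_metadata_or_config:
        return ADMIN  # adds description, parameter, and configuration updates
    if write_records:
        return READ_WRITE  # adds writing and deleting records
    return READ_ONLY  # reading records and viewing details and metadata
```

For example, a marketing team that only trains on customer profiles would get least_privilege_permission(), that is, the read-only permission.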
Accept the resource share invite
To accept the resource share invite, complete the following steps:
- In the target (consumer) account, open the AWS RAM console.
- Under Shared with me in the navigation pane, choose Resource shares.
- Choose the new pending resource share.
- Choose Accept resource share.
The process of accepting the resource share using the AWS CLI is the same as for the previous discoverability section, with the get-resource-share-invitations and accept-resource-share-invitation commands.
Sample notebooks showcasing this new capability
Two notebooks were added to the SageMaker Feature Store Workshop GitHub repository in the folder 09-module-security/09-03-cross-account-access:
- m9_03_nb1_cross-account-admin.ipynb – This needs to be launched on your admin or owner AWS account
- m9_03_nb2_cross-account-consumer.ipynb – This needs to be launched on your consumer AWS account
The first notebook shows how to create the discoverability resource share for existing feature groups in the admin (owner) account and share it with a consumer account programmatically, using the AWS RAM API create_resource_share(). It also shows how to grant access permissions to existing feature groups in the owner account and share them with a consumer account using AWS RAM. You need to provide your consumer AWS account ID before running the notebook.
The second notebook accepts the AWS RAM invitations sent from the owner account for discovering and accessing cross-account feature groups. It then shows how to discover the feature groups in the owner account and list them from the consumer account. It also shows how to access cross-account feature groups with read/write permissions and perform the following operations from the consumer account: describe(), get_record(), ingest(), and delete_record().
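For readers who prefer the low-level clients, the following sketch approximates the consumer-side operations with boto3 rather than the SageMaker Python SDK calls used in the notebook: describe_feature_group, put_record (a single-record stand-in for ingest()), get_record, and delete_record. It assumes that these APIs accept the shared feature group's ARN as FeatureGroupName for cross-account access and that the consumer holds read/write permission; the helper names are hypothetical.

```python
# Sketch: consumer-side calls on a shared feature group, addressed by its ARN.
from datetime import datetime, timezone


def to_record(features):
    """Convert {feature_name: value} into the runtime API's Record list."""
    return [{"FeatureName": name, "ValueAsString": str(value)} for name, value in features.items()]


def from_record(record):
    """Convert a Record list back into {feature_name: value_as_string}."""
    return {item["FeatureName"]: item["ValueAsString"] for item in record}


def event_time_now():
    """UTC timestamp in yyyy-MM-ddTHH:mm:ssZ form, used as EventTime for deletes."""
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def exercise_shared_group(region, feature_group_arn, features, record_id):
    """Describe, write, read, then delete one record (needs boto3, credentials, read/write share).

    `features` must include the group's record identifier and event time features.
    """
    import boto3  # imported lazily so the helpers above run without boto3

    sm = boto3.client("sagemaker", region_name=region)
    runtime = boto3.client("sagemaker-featurestore-runtime", region_name=region)
    description = sm.describe_feature_group(FeatureGroupName=feature_group_arn)
    runtime.put_record(FeatureGroupName=feature_group_arn, Record=to_record(features))
    fetched = runtime.get_record(
        FeatureGroupName=feature_group_arn, RecordIdentifierValueAsString=str(record_id)
    )
    runtime.delete_record(
        FeatureGroupName=feature_group_arn,
        RecordIdentifierValueAsString=str(record_id),
        EventTime=event_time_now(),
    )
    return description["FeatureGroupName"], from_record(fetched.get("Record", []))
```

Addressing the group by ARN rather than by name matters here, because a bare name generally refers to a feature group in the caller's own account.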
Conclusion
The SageMaker Feature Store cross-account capability offers several compelling benefits. First, it facilitates seamless collaboration by enabling sharing of feature groups across multiple AWS accounts. This enhances data accessibility and utilization, allowing teams in different accounts to use shared features for their ML workflows.
Additionally, the cross-account capability enhances data governance and security. With controlled access and permissions through AWS RAM, organizations can maintain a centralized feature store while ensuring that each account has tailored access levels. This not only streamlines data management, but also strengthens security measures by limiting access to authorized users.
Furthermore, the ability to share feature groups across accounts simplifies the process of building and deploying ML models in a collaborative environment. It fosters a more integrated and efficient workflow, reducing redundancy in data storage and facilitating the creation of robust models with shared, high-quality features. Overall, the Feature Store’s cross-account capability optimizes collaboration, governance, and efficiency in ML development across diverse AWS accounts. Give it a try, and let us know what you think in the comments.
About the Authors
Ioan Catana is a Senior Artificial Intelligence and Machine Learning Specialist Solutions Architect at AWS. He helps customers develop and scale their ML solutions in the AWS Cloud. Ioan has over 20 years of experience, mostly in software architecture design and cloud engineering.
Philipp Kaindl is a Senior Artificial Intelligence and Machine Learning Solutions Architect at AWS. With a background in data science and mechanical engineering, his focus is on empowering customers to create lasting business impact with the help of AI. Outside of work, Philipp enjoys tinkering with 3D printers, sailing, and hiking.
Dhaval Shah is a Senior Solutions Architect at AWS, specializing in machine learning. With a strong focus on digital native businesses, he empowers customers to use AWS and drive their business growth. As an ML enthusiast, Dhaval is driven by his passion for creating impactful solutions that bring positive change. In his leisure time, he indulges in his love for travel and cherishes quality moments with his family.
Mizanur Rahman is a Senior Software Engineer for Amazon SageMaker Feature Store with over 10 years of hands-on experience specializing in AI and ML. With a strong foundation in both theory and practical applications, he holds a Ph.D. in Fraud Detection using Machine Learning, reflecting his dedication to advancing the field. His expertise spans a broad spectrum, encompassing scalable architectures, distributed computing, big data analytics, microservices, and cloud infrastructure for organizations.