AWS Big Data Blog

Accelerate data governance with custom subscription workflows in Amazon SageMaker

Amazon SageMaker provides a single data and AI development environment to discover and build with your data. This unified platform integrates functionality from existing AWS Analytics and Artificial Intelligence and Machine Learning (AI/ML) services, including Amazon EMR, AWS Glue, Amazon Athena, Amazon Redshift, and Amazon Bedrock.

Organizations need to efficiently manage data assets while maintaining governance controls in their data marketplaces. Although manual approval workflows remain important for sensitive datasets and production systems, there’s an increasing need for automated approval processes for less sensitive datasets. In this post, we show you how to automate subscription request approvals within SageMaker, accelerating data access for data consumers.

Prerequisites

For this walkthrough, you must have the following prerequisites:

  • An AWS account – If you don’t have an account, you can create one. The account should have permission to do the following:
    • Create and manage SageMaker domains
    • Create and manage IAM roles
    • Create and invoke Lambda functions
  • SageMaker domain – For instructions to create a domain, refer to Create an Amazon SageMaker Unified Studio domain – quick setup.
  • A demo project – Create a demo project in your SageMaker domain. For instructions, see Create a project. For this example, we choose All capabilities in the project profile section.
  • SageMaker domain ID, project ID, and project role ARN – These will be used in later steps to provide permissions for existing datasets and resources, and automatic subscription approval code. To retrieve this information, go to the Project details tab on the project details page on the SageMaker console.
  • AWS CLI installed – You must have the AWS Command Line Interface (AWS CLI) version 2.11 or later.
  • Python installed – You must have Python version 3.8 or later.
  • IAM permissions – Sign in as a user with administrative access.
  • Lambda permissions – Configure the appropriate IAM permissions for the Lambda execution role. The following code is a sample policy used for testing this solution. Before implementing this IAM policy in your environment, replace the placeholders (<<Domain-ARN>>, <<Lambda ARN>>, <<SNS-ARN>>, ${AWS::Region}, and ${AWS::AccountId}) with the values for your specific resources, AWS Region, and account ID. Adjust the permissions based on the principle of least privilege. To learn more about creating Lambda execution roles, refer to Defining Lambda function permissions with an execution role.
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "datazone:ListSubscriptionRequests",
                    "datazone:AcceptSubscriptionRequest",
                    "datazone:GetSubscriptionRequestDetails",
                    "datazone:GetDomain",
                    "datazone:ListProjects"
                ],
                "Resource": "<<Domain-ARN>>"
            },
            {
                "Effect": "Allow",
                "Action": "sts:AssumeRole",
                "Resource": "<<Domain-ARN>>",
                "Condition": {
                    "StringEquals": {
                        "aws:PrincipalArn": "<<Lambda ARN>>"
                    }
                }
            },
            {
                "Effect": "Allow",
                "Action": "sns:Publish",
                "Resource": "<<SNS-ARN>>"
            },
            {
                "Effect": "Allow",
                "Action": [
                    "logs:CreateLogGroup",
                    "logs:CreateLogStream",
                    "logs:PutLogEvents"
                ],
                "Resource": [
                    "arn:aws:logs:${AWS::Region}:${AWS::AccountId}:log-group:/aws/lambda/*",
                    "arn:aws:logs:${AWS::Region}:${AWS::AccountId}:log-group:/aws/lambda/*:*"
                ]
            }
        ]
    }
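The log-group ARNs in the sample policy contain Region and account ID placeholders that must be replaced before the policy is usable as plain IAM JSON. The following is a minimal sketch of one way to fill them in; the helper names render_log_resources and build_logs_statement are hypothetical, and the Region and account ID in the example are made-up values.

```python
import json

# Log-group ARN templates copied from the sample policy above.
LOG_RESOURCE_TEMPLATES = [
    "arn:aws:logs:${AWS::Region}:${AWS::AccountId}:log-group:/aws/lambda/*",
    "arn:aws:logs:${AWS::Region}:${AWS::AccountId}:log-group:/aws/lambda/*:*",
]


def render_log_resources(region, account_id):
    """Replace the Region and account ID placeholders with concrete values."""
    return [
        template.replace("${AWS::Region}", region).replace("${AWS::AccountId}", account_id)
        for template in LOG_RESOURCE_TEMPLATES
    ]


def build_logs_statement(region, account_id):
    """Build the CloudWatch Logs statement of the policy as a Python dict."""
    return {
        "Effect": "Allow",
        "Action": ["logs:CreateLogGroup", "logs:CreateLogStream", "logs:PutLogEvents"],
        "Resource": render_log_resources(region, account_id),
    }


if __name__ == "__main__":
    # Example values only; substitute your own Region and 12-digit account ID.
    print(json.dumps(build_logs_statement("us-east-1", "111122223333"), indent=4))
```

The same substitution approach could be applied to the other placeholders in the policy.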

Solution overview

Before diving into the custom workflow solution, it helps to understand the subscription and approval workflow in Amazon SageMaker. After an asset is published to the SageMaker catalog, data consumers can discover it and request access by submitting a subscription request with a business justification and intended use case. The request enters a pending state, and the data producer or asset owner is notified to review it. The data producer evaluates the request based on governance policies, consumer credentials, and business context, and can accept, reject, or request additional information from the data consumer.

Upon acceptance (the AcceptSubscriptionRequest action), SageMaker starts a subscription fulfillment process that provisions access to the asset for the data consumer. SageMaker integrates deeply with AWS Lake Formation to manage fine-grained permissions. When a subscription is approved, SageMaker automatically calls Lake Formation APIs to grant specific database, table, and column-level permissions to the subscriber’s IAM role. Lake Formation acts as the central permission engine, translating subscription approvals into actual data access rights without manual intervention. The system also provisions and updates resource-based policies on data sources. After provisioning is complete, the data consumer can access the subscribed data through query engines such as Athena, Amazon Redshift, or Amazon EMR, with Lake Formation enforcing permissions at query time.

By default, subscription requests to a published asset require manual approval by a data owner. However, Amazon SageMaker supports automatic approval of subscription requests at the asset level: when publishing a data asset, you can choose to not require subscription approval. In this case, all incoming subscription requests to that asset are automatically approved. Let’s first outline the step-by-step process for enabling automatic approval at the asset level.

Configure automatic approval at asset level

To configure automatic approval, data producers can follow the steps below.

  1. Log in to the SageMaker Unified Studio portal as the data producer.
  2. Choose Assets and select the asset that you want to configure for automatic approval.
  3. On the asset details page, locate Edit Subscription settings in the right pane.
  4. Choose Edit next to Subscription Required.
    1. Select Not Required in the dialog box.
    2. Confirm your selection.

Customize SageMaker’s subscription workflow

Although the manual approval workflow remains essential for production environments and sensitive data, organizations often want to streamline and automate approvals for lower-risk environments and non-sensitive datasets. To achieve this project-level automation, we can extend SageMaker’s native approval workflow with a custom event-driven solution. The solution uses a serverless architecture that combines AWS Lambda, Amazon EventBridge rules, and Amazon Simple Notification Service (Amazon SNS) to create an automated approval workflow. This customization allows organizations to maintain governance while reducing administrative overhead and accelerating the development cycle in non-critical environments. The event-driven approach processes approval requests as they arrive, maintains audit trails, and can be configured to apply different approval rules based on project characteristics and data sensitivity levels.

The custom workflow consists of the following steps:

  1. The data consumer submits a subscription request for a published data asset.
  2. SageMaker detects the request and generates a subscription event, which is automatically sent to EventBridge.
  3. EventBridge triggers the designated Lambda function.
  4. The Lambda function sends an AcceptSubscriptionRequest API call to SageMaker.
  5. The function also sends a notification through Amazon SNS.
  6. AWS Lake Formation processes the approved subscription and updates the relevant access control lists (ACLs) and permission sets.
  7. Lake Formation grants access permissions to the data consumer’s project AWS Identity and Access Management (IAM) role.
  8. The data consumer now has authorized access to the requested data asset and can begin working with the subscribed data.
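To make steps 2 through 4 more concrete, the following sketch shows how a Lambda function might pull the domain ID and request ID out of the incoming EventBridge event. The event layout used here (detail.metadata.domain and detail.metadata.id) is an assumption, as are the function name extract_request_reference and the sample identifiers; inspect a real event from your account before relying on it.

```python
def extract_request_reference(event):
    """Return (domain_id, request_id) from an event, or None if it does not match.

    Assumed layout (verify against a real event from your account):
    event["detail"]["metadata"]["domain"] and event["detail"]["metadata"]["id"].
    """
    if event.get("source") != "aws.datazone":
        return None
    if event.get("detail-type") != "Subscription Request Created":
        return None
    metadata = event.get("detail", {}).get("metadata", {})
    domain_id = metadata.get("domain")
    request_id = metadata.get("id")
    if not domain_id or not request_id:
        return None
    return domain_id, request_id


# Hypothetical sample event with made-up identifiers, for illustration only.
sample_event = {
    "source": "aws.datazone",
    "detail-type": "Subscription Request Created",
    "detail": {"metadata": {"domain": "dzd_example", "id": "req_example"}},
}

if __name__ == "__main__":
    print(extract_request_reference(sample_event))
```

Returning None for anything unexpected lets the caller skip events it does not recognize instead of failing on a missing key.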

The following diagram illustrates the high-level architecture of the solution.

Key benefits

This solution uses AWS Lambda and Amazon EventBridge to automate SageMaker subscription request approvals, delivering the following benefits for organizations and end-users:

  • Scalability – Automatically handles high volumes of subscription requests
  • Cost-efficiency – Pay-as-you-go approach with no idle resource costs
  • Minimal maintenance – Serverless components require no infrastructure management
  • Flexible triggering – Supports event-driven, scheduled, and manual invocation modes
  • Audit compliance – Comprehensive logging and traceability through AWS CloudTrail

Step-by-step procedure

This section outlines the detailed process for implementing a custom subscription request approval workflow in Amazon SageMaker.

Create Lambda function

Complete the following steps to create your Lambda function:

  1. On the Lambda console, choose Functions in the navigation pane.
  2. Choose Create function.
  3. Select Author from scratch.
  4. For Function name, enter a name for the function.
  5. For Runtime, choose your runtime (for this post, we use Python version 3.9 or later).
  6. Choose Create function.
  7. On the Lambda function page, choose the Configuration tab and then choose Permissions.
  8. Note the execution role to use when configuring the SageMaker project.

Create SNS topic

For this solution, we create an SNS topic. Complete the following steps to create the SNS topic for automatic approvals:

  1. On the Amazon SNS console, choose Topics in the navigation pane.
  2. Choose Create topic.
  3. For Type, select Standard.
  4. For Name, enter a name for the topic.
  5. Choose Create topic.
  6. On the SNS topic details page, note the SNS topic Amazon Resource Name (ARN) to use later in the Lambda function.
  7. On the Subscriptions tab, choose Create subscription.
  8. For Protocol, choose Email.
  9. For Endpoint, enter the email address of the data consumers.
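The console steps above can also be scripted. The following is a minimal sketch using the boto3 SNS client calls create_topic and subscribe; the function name create_topic_with_email_subscription, the topic name, and the email address are hypothetical. The client is passed in as a parameter so the function can be exercised without AWS credentials.

```python
def create_topic_with_email_subscription(sns_client, topic_name, email_address):
    """Create a standard SNS topic and subscribe one email endpoint to it."""
    # create_topic returns the ARN of the new (or already existing) topic.
    topic_arn = sns_client.create_topic(Name=topic_name)["TopicArn"]
    # The recipient must confirm the subscription from the email they receive.
    sns_client.subscribe(TopicArn=topic_arn, Protocol="email", Endpoint=email_address)
    return topic_arn


# Usage (requires boto3 and AWS credentials; names are examples only):
#   import boto3
#   arn = create_topic_with_email_subscription(
#       boto3.client("sns"), "subscription-auto-approvals", "consumer@example.com"
#   )
```

Email subscriptions stay pending until the recipient confirms them, so notifications are not delivered before that confirmation.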

Create EventBridge rule

Complete the following steps to create an EventBridge rule to capture subscription request events:

  1. On the EventBridge console, choose Rules in the navigation pane.
  2. Choose Create rule.
  3. For Name, enter a name for the rule.
  4. For Rule type, select Rule with event pattern.
    This option enables the automatic subscription approval workflow to be triggered when a subscription request is initiated. Alternatively, you can select Schedule to schedule the rule to trigger on a regular basis. Refer to Creating a rule that runs on a schedule in Amazon EventBridge to learn more.
  5. Choose Next.
  6. For Event source, select AWS events or EventBridge partner events.
  7. For Creation method, select Use pattern form.
  8. For Event source, select AWS services.
  9. For AWS service, select DataZone.
  10. For Event type, select Subscription Request Created.
  11. Configure your target to route events to both the Lambda function and SNS topic.
  12. Choose Next.
  13. For this post, skip configuring tags and choose Next.
  14. Review the settings and choose Create rule.
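As a rough equivalent of the console steps above, the following sketch creates the rule with the boto3 EventBridge client calls put_rule and put_targets. The event pattern mirrors the DataZone service and Subscription Request Created event type selected in the console; the function names, rule name, and target ID are hypothetical, and only the Lambda target is shown.

```python
import json


def build_event_pattern():
    """Event pattern matching subscription request creation events."""
    return {
        "source": ["aws.datazone"],
        "detail-type": ["Subscription Request Created"],
    }


def create_subscription_rule(events_client, rule_name, lambda_arn):
    """Create the rule and attach the Lambda function as its target."""
    rule_arn = events_client.put_rule(
        Name=rule_name,
        EventPattern=json.dumps(build_event_pattern()),
        State="ENABLED",
    )["RuleArn"]
    events_client.put_targets(
        Rule=rule_name,
        # "auto-approve-lambda" is an arbitrary target ID chosen for this sketch.
        Targets=[{"Id": "auto-approve-lambda", "Arn": lambda_arn}],
    )
    return rule_arn


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   create_subscription_rule(boto3.client("events"), "subscription-auto-approve", "<lambda-arn>")
```

When a rule is created through the API instead of the console, the Lambda function also needs a resource-based permission that allows events.amazonaws.com to invoke it (for example, through the Lambda add_permission call).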

Configure automation workflow

Complete the following steps to configure the automation workflow:

  1. On the Lambda console, go to the function you created.
  2. Configure the EventBridge rule to trigger the Lambda function.
  3. Configure the destination as an SNS topic for event notification.
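The destination in step 3 can also be set programmatically. The following is a minimal sketch using the boto3 Lambda client call put_function_event_invoke_config; the function names here are hypothetical, and using an on-success destination is an assumption about how you want notifications routed.

```python
def build_destination_config(sns_topic_arn):
    """Destination settings that send successful invocation records to SNS."""
    return {"OnSuccess": {"Destination": sns_topic_arn}}


def configure_success_destination(lambda_client, function_name, sns_topic_arn):
    """Attach the SNS topic as the on-success destination of the function."""
    return lambda_client.put_function_event_invoke_config(
        FunctionName=function_name,
        DestinationConfig=build_destination_config(sns_topic_arn),
    )


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   configure_success_destination(boto3.client("lambda"), "<function-name>", "<sns-topic-arn>")
```

Destinations apply to asynchronous invocations, which is how EventBridge invokes Lambda, and the execution role needs sns:Publish on the topic.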

Configure code in Lambda function

Complete the following steps to configure your Lambda function:

  1. On the Lambda console, go to the function you created.
  2. Add the following code to your function. Provide the domain ID, project ID, and SNS topic ARN that you noted earlier.
    import boto3
    import json
    import logging
    import os
    from botocore.exceptions import ClientError
    
    # Configure logging
    logger = logging.getLogger()
    logger.setLevel(logging.INFO)
    
    def lambda_handler(event, context):
        """Lambda function to auto-approve subscription requests in Amazon SageMaker"""
        try:
            # Initialize clients
            datazone_client = boto3.client('datazone')
            sns_client = boto3.client('sns')
            
            # Get configuration from environment variables or use hardcoded values
            domain_id = os.environ.get('DOMAIN_ID', '<domain_id>')
            project_id = os.environ.get('PROJECT_ID', '<project_id>')
            sns_topic_arn = os.environ.get('SNS_TOPIC_ARN', '<sns_topic_arn>')
            
            # Get pending subscription requests
            pending_requests = get_pending_requests(datazone_client, domain_id, project_id)
            
            if not pending_requests:
                logger.info("No pending subscription requests found")
                return
            
            # Process requests
            for request in pending_requests:
                approve_request(datazone_client, sns_client, domain_id, request, sns_topic_arn)
                
        except Exception as e:
            logger.error(f"Error: {str(e)}")
    
    def get_pending_requests(client, domain_id, project_id):
        """Get all pending subscription requests"""
        requests = []
        next_token = None
        
        try:
            while True:
                params = {
                    'domainIdentifier': domain_id,
                    'status': 'PENDING',
                    'approverProjectId': project_id
                }
                
                if next_token:
                    params['nextToken'] = next_token
                
                response = client.list_subscription_requests(**params)
                
                if 'items' in response:
                    requests.extend(response['items'])
                
                next_token = response.get('nextToken')
                if not next_token:
                    break
                    
            logger.info(f"Found {len(requests)} pending requests")
            return requests
            
        except ClientError as e:
            logger.error(f"Error listing requests: {e}")
            return []
    
    def approve_request(datazone_client, sns_client, domain_id, request, sns_topic_arn):
        """Approve a subscription request and send notification"""
        request_id = request.get('id')
        if not request_id:
            return
            
        try:
            # Approve the request
            datazone_client.accept_subscription_request(
                domainIdentifier=domain_id,
                identifier=request_id,
                decisionComment="Subscription request is auto-approved by Lambda"
            )
            
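            # Send notification. The asset name is read from the first subscribed
            # listing when the response includes one.
            listings = request.get('subscribedListings') or [{}]
            asset_name = listings[0].get('name', 'Unknown asset')
            
            message = (
                f"Your subscription request for {asset_name} has been auto-approved by Lambda. "
                "You can now access this asset."
            )
            
            sns_client.publish(
                TopicArn=sns_topic_arn,
                Subject="Subscription request is auto-approved by Lambda",
                Message=message
            )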
            
            logger.info(f"Approved request {request_id} for {asset_name}")
            
        except Exception as e:
            logger.error(f"Error processing request {request_id}: {e}")
  3. Choose Test to test the Lambda function code. To learn more about testing Lambda code, refer to Testing Lambda functions in the console.
  4. Choose Deploy to deploy the code.
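If you want to apply different approval rules to different assets, one option is to check each pending request against an allow list before approving it. The following sketch illustrates the idea; the function should_auto_approve and the allow list are hypothetical, and the subscribedListings and name fields are assumptions about the shape of each request item that you should verify against the ListSubscriptionRequests response in your environment.

```python
def should_auto_approve(request, allowed_listing_names):
    """Return True only when every listing in the request is on the allow list."""
    listings = request.get("subscribedListings") or []
    if not listings:
        # Without listing details, fall back to manual review.
        return False
    return all(listing.get("name") in allowed_listing_names for listing in listings)


# Hypothetical allow list of low-sensitivity assets and made-up request items.
ALLOWED = {"public_weather_data", "sample_sales_data"}
low_risk = {"id": "req-1", "subscribedListings": [{"name": "sample_sales_data"}]}
sensitive = {"id": "req-2", "subscribedListings": [{"name": "customer_pii"}]}

if __name__ == "__main__":
    for item in (low_risk, sensitive):
        print(item["id"], should_auto_approve(item, ALLOWED))
```

Requests that fail the check stay pending, so the data owner can still review them manually.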

Configure Lambda and project execution roles in SageMaker

Complete the following steps:

  1. In SageMaker Unified Studio, go to your publishing project.
  2. Choose Members in the navigation pane.
  3. Choose Add members.
  4. Add the Lambda execution role and project execution roles as Contributor.

Test the solution

Complete the following steps to test the solution:

  1. In SageMaker Unified Studio, navigate to the data catalog and choose Subscribe on the configured asset to initiate a subscription request.
  2. Choose Subscription requests in the navigation pane to view the outgoing requests and choose the Approved tab to verify automatic approval.
  3. Choose View subscription to confirm the approver appears as the Lambda execution role with “Subscription request is auto-approved by Lambda” as the reason.
  4. On the CloudTrail console, choose Event history to view the event you created and review the automated approval audit trail.
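The audit check in step 4 can also be run from code. The following is a minimal sketch using the boto3 CloudTrail client call lookup_events, filtered by the AcceptSubscriptionRequest event name; the function name find_auto_approvals and the summary fields it returns are choices made for this illustration, and it assumes the approval calls appear in your CloudTrail event history.

```python
def find_auto_approvals(cloudtrail_client, max_results=20):
    """Return a short summary of recent AcceptSubscriptionRequest events."""
    response = cloudtrail_client.lookup_events(
        LookupAttributes=[
            {"AttributeKey": "EventName", "AttributeValue": "AcceptSubscriptionRequest"}
        ],
        MaxResults=max_results,
    )
    return [
        {
            "time": record.get("EventTime"),
            "user": record.get("Username"),
            "event_id": record.get("EventId"),
        }
        for record in response.get("Events", [])
    ]


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   for entry in find_auto_approvals(boto3.client("cloudtrail")):
#       print(entry)
```

For automated approvals, the user field would be expected to reflect the Lambda execution role session rather than a human approver.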

Clean up

To avoid incurring future charges, clean up the resources you created during this walkthrough. The following steps use the AWS Management Console, but you can also use the AWS CLI.

  1. Delete the SageMaker domain. To use the AWS CLI, run the following commands:
    aws sagemaker delete-project --project-name <project-name>
    aws datazone delete-domain --identifier <domain_identifier>
  2. Delete the SNS topics. To use the AWS CLI, run the following command:
    aws sns delete-topic --topic-arn <topic-arn>
  3. Delete the Lambda function. To use the AWS CLI, run the following command:
    aws lambda delete-function --function-name <Lambda function name>

Conclusion

Combining an event-driven architecture with SageMaker creates an automated, cost-effective solution for data governance challenges. This serverless approach automatically handles data access requests while maintaining compliance, so organizations can scale efficiently as their data grows. The solution discussed in this post can help data teams access insights faster with minimal operational costs, making it an excellent choice for businesses that need quick, compliant data access while keeping their systems lean and efficient.

To learn more, visit the Amazon SageMaker Unified Studio page.


About the authors

Nira Jaiswal

Nira is a Principal Data Solutions Architect at AWS. Nira works with strategic customers to architect and deploy innovative data and analytics solutions. She excels at designing scalable, cloud-based platforms that help organizations maximize the value of their data investments. Nira is passionate about combining analytics, AI/ML, and storytelling to transform complex information into actionable insights that deliver measurable business value.

Ajit Tandale

Ajit is a Senior Solutions Architect at AWS, specializing in data and analytics. He partners with strategic customers to architect secure, scalable data systems using AWS services and open-source technologies. His expertise includes designing data lakes, implementing data pipelines, and optimizing big data processing workflows to help organizations modernize their data architecture. Outside of work, he’s an avid reader and science fiction movie enthusiast.