AWS Big Data Blog

Automate deployment of data and AI applications with Amazon SageMaker Unified Studio CI/CD CLI

Organizations building data and AI applications in Amazon SageMaker Unified Studio combine multiple AWS services, including AWS Glue, Amazon Athena, Amazon Managed Workflows for Apache Airflow (Amazon MWAA), Amazon SageMaker AI, and Amazon Quick Sight, into single applications. Promoting these applications from development to test and production stages requires substituting service-specific configurations for each stage and provisioning resources in the correct order.

Data teams understand which services their applications need but lack continuous integration and continuous delivery (CI/CD) expertise, while DevOps teams understand deployment automation but must learn each AWS service’s provisioning requirements.

The CI/CD CLI for Amazon SageMaker Unified Studio (aws-smus-cicd-cli) is an open source command line tool that automates deployment of multi-service data and AI applications across pipeline stages. Data teams define their application once in a YAML manifest, DevOps teams deploy with a single command, and the CLI handles configuration substitution, dependency ordering, and resource provisioning automatically. For details, see the CI/CD CLI documentation.

In this post, we walk through how the CI/CD CLI works, show you how to deploy a real application across environments, and demonstrate how it fits into your existing CI/CD workflows.

Customer spotlight

Bureau Veritas, a global leader in testing, inspection, and certification, operates across multiple SageMaker Unified Studio environments to support its data and AI teams. With their data and DevOps teams working on different parts of the application lifecycle, Bureau Veritas needed a controlled way to promote workloads from development through test to production while preserving clear ownership boundaries between the two teams.

“We need to promote data and AI applications across SageMaker Unified Studio environments in a controlled way that respects the boundaries between our data teams and our DevOps teams. The CI/CD CLI does exactly that — a single manifest from the data team, a single deploy command from DevOps, and full control over what goes to production.”

— Gilles Kempf, Architecture Manager, Bureau Veritas

How the CI/CD CLI works

The CI/CD CLI introduces a clean separation of concerns between data teams and DevOps teams.

Data teams define what to deploy in a declarative YAML manifest (manifest.yaml). The manifest describes the application’s resources, including AWS Glue extract, transform, and load (ETL) jobs, Athena queries, Airflow directed acyclic graphs (DAGs), Quick Sight dashboards, and SageMaker training jobs, along with stage-specific configurations for each environment.

DevOps teams define how and when to deploy using their existing CI/CD systems. They retain full control over their deployment methodology. They choose whether to promote content through git branches, a bundle artifact repository, or both, and they decide the shape of the pipeline, including which stages to include (dev, staging, pre-prod, prod) and which manual approvals or security gates are required. They run aws-smus-cicd-cli deploy inside GitHub Actions, Jenkins, or GitLab CI workflows without needing to understand which AWS services the application uses or how SageMaker Unified Studio projects are structured. The CLI is a utility for AWS analytics service deployment, not a CI/CD methodology. Your team’s existing conventions for branches, approvals, and pipeline shape stay exactly as they are.

The CLI is the abstraction layer between the two. It reads the manifest, substitutes stage-specific configurations (S3 paths, AWS Identity and Access Management (IAM) roles, account IDs, and connection strings), provisions resources in dependency order, and handles all AWS service interactions.

The following diagram illustrates this separation:

SageMaker CI/CD

Key concepts

Application manifest

Each stage maps to a dedicated SageMaker Unified Studio project. This one-stage-to-one-project mapping is the foundation of CI/CD isolation: each project has its own domain, IAM boundaries, connections, and data, so changes in dev can never affect prod. For stronger isolation, projects can span different AWS accounts and AWS Regions, for example, dev in a sandbox account and prod in a production account in a different Region. Because each stage is a real SageMaker Unified Studio project, teams can open it in the console at any time to observe workflows, inspect resources, and troubleshoot deployments. Project membership is managed per project, so you control exactly who has access to each stage, for example, developers in dev and a release team in prod.

The manifest file is the single source of truth for your application. It declares:

  • Content: application code from git repositories, data files from S3, Quick Sight dashboards, and workflow definitions.
  • Stages: environment-specific project mappings (dev, test, prod, etc.), each isolated as described earlier.
  • Configuration: stage-specific settings that are substituted automatically at deploy time.

Here is an example manifest for an analytics application with AWS Glue ETL and Quick Sight:
applicationName: SalesAnalyticsDashboard

content: 
  storage: 
    - name: etl-code 
      include: ["*.py"] 
    - name: workflows 
      include: ["*.yaml"] 
  quicksight: 
    - name: SalesDashboard 
      type: dashboard 
  workflows: 
    - workflowName: sales_etl_pipeline 
      connectionName: default.workflow_serverless 
 
stages: 
  dev: 
    domain: 
      region: us-east-1 
    project: 
      name: analytics-dev 
    deployment_configuration: 
      storage: 
        - name: etl-code 
          connectionName: default.s3_shared 
          targetDirectory: sales/bundle/etl 
        - name: workflows 
          connectionName: default.s3_shared 
          targetDirectory: sales/bundle/workflows 
 
  prod: 
    domain: 
      region: us-west-2 
    project: 
      name: analytics-prod 
    deployment_configuration: 
      storage: 
        - name: etl-code 
          connectionName: default.s3_shared 
          targetDirectory: sales/bundle/etl 
        - name: workflows 
          connectionName: default.s3_shared 
          targetDirectory: sales/bundle/workflows 
      quicksight: 
        assets: 
          - name: SalesDashboard 
            owners: 
              - arn:aws:quicksight:${AWS_REGION}:${AWS_ACCOUNT_ID}:user/default/Admin/* 

Each stage must map to a separate SageMaker Unified Studio project, providing full isolation between environments. The CLI substitutes variables like ${AWS_ACCOUNT_ID} and ${AWS_REGION} at deploy time based on the target environment.
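To make the substitution step concrete, the following is a minimal Python sketch of how ${...} placeholders could be resolved per stage. It is not the CLI's implementation: the STAGE_VARIABLES table, the resolve function, and the account IDs are hypothetical placeholders chosen for illustration, and only the owner ARN string is taken from the example manifest.

```python
from string import Template

# Hypothetical per-stage values; the account IDs are placeholders, not real accounts.
STAGE_VARIABLES = {
    "dev": {"AWS_REGION": "us-east-1", "AWS_ACCOUNT_ID": "111111111111"},
    "prod": {"AWS_REGION": "us-west-2", "AWS_ACCOUNT_ID": "222222222222"},
}


def resolve(value: str, stage: str) -> str:
    """Replace ${NAME} placeholders in value with the variables for stage.

    Template.substitute raises KeyError for an unknown placeholder, so a
    typo fails loudly instead of leaving a literal "${...}" in the output.
    """
    return Template(value).substitute(STAGE_VARIABLES[stage])


# The owner ARN from the prod stage of the example manifest.
owner = "arn:aws:quicksight:${AWS_REGION}:${AWS_ACCOUNT_ID}:user/default/Admin/*"
print(resolve(owner, "prod"))
# prints arn:aws:quicksight:us-west-2:222222222222:user/default/Admin/*
```

Because the same template string is resolved against a different variable set for each stage, the manifest itself never needs to hard-code an account or Region.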

Bundles

A bundle is an immutable, versioned archive of your application. The bundle command reads from a source stage (typically dev) and packages the application code, workflow definitions, and resolved configurations into a self-contained artifact. The deploy command then applies that artifact to one or more target stages (test or prod).

This stage-to-bundle-to-stage promotion model supports controlled rollout through quality gates:

# Package from dev 
aws-smus-cicd-cli bundle --manifest manifest.yaml 
 
# Deploy to test 
aws-smus-cicd-cli deploy --manifest app.tar.gz --targets test 
 
# Validate the test deployment 
aws-smus-cicd-cli test --manifest manifest.yaml --targets test 
 
# Promote the same bundle to prod 
aws-smus-cicd-cli deploy --manifest app.tar.gz --targets prod 

The same artifact is deployed at every stage without rebuilding, providing audit trails and reproducible deployments for regulated industries.
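If your pipeline needs its own evidence that the promoted archive is byte-for-byte the one that was tested, a checksum comparison is one option. The following Python sketch is a pipeline-side check under our own assumptions, not a CLI feature: sha256_of and assert_same_bundle are hypothetical helpers, and a temporary file stands in for app.tar.gz so the example runs anywhere.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def assert_same_bundle(path: Path, expected_digest: str) -> None:
    """Raise RuntimeError if the bundle on disk differs from the recorded digest."""
    actual = sha256_of(path)
    if actual != expected_digest:
        raise RuntimeError(f"bundle digest mismatch: {actual} != {expected_digest}")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        bundle = Path(tmp) / "app.tar.gz"  # stand-in for the real bundle
        bundle.write_bytes(b"example bundle contents")
        recorded = sha256_of(bundle)          # record after the test deployment
        assert_same_bundle(bundle, recorded)  # verify before the prod deployment
        print("bundle unchanged")
```

Recording the digest alongside the approval decision gives auditors a simple way to tie a production deployment back to the artifact that passed validation.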

SageMaker Catalog integration

The CLI manages Amazon SageMaker Catalog resources as part of the deployment process. You can define catalog assets, glossaries, glossary terms, form types, asset types, and metadata forms in your manifest. During deployment, the CLI searches for assets in the catalog, creates subscription requests for required data access, and waits for approval before proceeding. This automates the data governance workflow that teams previously handled manually.
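The wait-for-approval behavior follows a common polling pattern. The following Python sketch shows that pattern in isolation. It is not the CLI's code: wait_for_approval, the get_status callback, the status strings, and the timeout and interval defaults are all assumptions made for illustration.

```python
import time
from typing import Callable


def wait_for_approval(
    get_status: Callable[[], str],
    timeout_seconds: float = 3600.0,
    poll_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Block until get_status() returns "ACCEPTED".

    Raises PermissionError on "REJECTED" and TimeoutError once the deadline
    has passed while the request is still pending.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status == "ACCEPTED":
            return
        if status == "REJECTED":
            raise PermissionError("subscription request was rejected")
        if time.monotonic() >= deadline:
            raise TimeoutError("subscription request is still pending")
        sleep(poll_seconds)


if __name__ == "__main__":
    # Simulated status sequence; a real callback would query the catalog.
    statuses = iter(["PENDING", "PENDING", "ACCEPTED"])
    wait_for_approval(lambda: next(statuses), sleep=lambda seconds: None)
    print("approved")
```

Treating a rejection as a hard failure, rather than retrying, keeps a denied data-access request from silently stalling the pipeline.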

CLI commands

The CI/CD CLI provides commands that cover the full deployment lifecycle:

| Command | Description |
|---|---|
| describe | Validates the manifest, checks that target projects exist, and confirms the execution role has required permissions. Use --connect to validate against live AWS environments. |
| bundle | Reads from a source stage and packages application code, workflow definitions, and configurations into an immutable, versioned archive. |
| deploy | Applies bundle contents to one or more target stages. Provisions resources in dependency order. |
| test | Runs post-deployment validation to confirm services are running and ready for workloads. |
| create | Generates a starter manifest from an existing SageMaker Unified Studio project. |
| run | Triggers Airflow workflow execution on MWAA or Airflow Serverless connections. |
| monitor | Monitors workflow execution status in real time. |
| logs | Fetches and streams workflow execution logs. |
| destroy | Removes deployed resources and projects for cleanup or failure recovery. |

Walkthrough: deploying a Quick Sight dashboard with AWS Glue ETL

In this section, we walk through deploying an analytics application that uses AWS Glue for ETL, Athena for queries, and Quick Sight for dashboards. This example is available in the GitHub repository.

Use case

An analytics team owns a Sales Analytics Dashboard built on AWS Glue ETL, Athena, and Quick Sight. They want to promote changes from a development environment to production with reproducible builds, automated validation, and a clear approval gate between stages, without writing custom deployment scripts or exposing data engineers to AWS provisioning details.

Solution overview

We use a sample application from the CI/CD CLI GitHub repository that includes AWS Glue ETL scripts, an Airflow workflow definition, a Quick Sight dashboard bundle, and integration tests. A single manifest.yaml describes the application and its dev and prod stages. The CLI handles the full lifecycle: bundle the app from dev, deploy it to test, run validation, and promote the same immutable artifact to prod.

Prerequisites

Before you begin, make sure you have the following:

Solution architecture

Each stage in the manifest maps to a dedicated SageMaker Unified Studio project (see the separation-of-concerns diagram in “How the CI/CD CLI works” earlier in this post). At deploy time, the CLI uploads ETL scripts and workflow definitions to the project’s S3 storage connection, provisions the Airflow workflow in MWAA Serverless, runs the workflow to create AWS Glue jobs and databases, and imports the Quick Sight dashboard. The same bundle artifact is applied to every downstream stage, ensuring dev, test, and prod stay in sync while remaining fully isolated.

Solution implementation

Step 1: Install the CLI

Install the CLI from PyPI:

pip install aws-smus-cicd-cli

Step 2: Create or customize a manifest

Clone the repository and start from the analytics example:

git clone https://github.com/aws/CICD-for-SageMakerUnifiedStudio.git
cd CICD-for-SageMakerUnifiedStudio/examples/analytic-workflow/dashboard-glue-quick

The example includes AWS Glue ETL scripts, an Airflow workflow definition, a Quick Sight dashboard bundle, and integration tests. Open manifest.yaml and update the project, domain, and deployment_configuration values under each stage so they match your own SageMaker Unified Studio projects and connection names.

Alternatively, generate a manifest from an existing project:

aws-smus-cicd-cli create --domain-id <your-domain-id> --dev-project-id <your-project-id>

Step 3: Validate your configuration

Run the describe command with --connect to verify your environment is ready. This connects to your AWS environment and validates that target projects exist, the execution role has the required permissions, and connections are reachable. Fix any issues before deploying.

aws-smus-cicd-cli describe --manifest manifest.yaml --connect
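A purely local check can catch some manifest mistakes before any AWS call is made. The following Python sketch works on a manifest that has already been loaded into a dictionary and flags two problems: stages that share a project name, and stage storage entries that do not match a name declared under content. The lint_manifest function is hypothetical, and the second rule is our assumption inferred from the example manifest. It complements, and does not replace, the describe command.

```python
from typing import Dict, List


def lint_manifest(manifest: dict) -> List[str]:
    """Return human-readable findings; an empty list means nothing was flagged."""
    problems: List[str] = []
    content_storage = manifest.get("content", {}).get("storage", [])
    declared = {item["name"] for item in content_storage}
    seen_projects: Dict[str, str] = {}  # project name -> first stage using it

    for stage, config in manifest.get("stages", {}).items():
        project = config.get("project", {}).get("name")
        if not project:
            problems.append(f"{stage}: missing project name")
        elif project in seen_projects:
            first = seen_projects[project]
            problems.append(f"{stage}: project '{project}' is already used by stage '{first}'")
        else:
            seen_projects[project] = stage

        stage_storage = config.get("deployment_configuration", {}).get("storage", [])
        for item in stage_storage:
            name = item["name"]
            if name not in declared:
                problems.append(f"{stage}: storage '{name}' is not declared under content")
    return problems


# Trimmed version of the example manifest with two deliberate mistakes in prod.
manifest = {
    "content": {"storage": [{"name": "etl-code"}, {"name": "workflows"}]},
    "stages": {
        "dev": {
            "project": {"name": "analytics-dev"},
            "deployment_configuration": {"storage": [{"name": "etl-code"}]},
        },
        "prod": {
            "project": {"name": "analytics-dev"},  # mistake: reuses the dev project
            "deployment_configuration": {"storage": [{"name": "etl-cod"}]},  # mistake: typo
        },
    },
}

for problem in lint_manifest(manifest):
    print(problem)
```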

Step 4: Deploy

Run the deployment:

aws-smus-cicd-cli deploy --targets test --manifest manifest.yaml

During deployment, the CLI:
  1. Uploads ETL scripts and workflow definitions to S3 using the project’s storage connection.
  2. Creates the Airflow workflow in MWAA Serverless.
  3. Runs the workflow, which provisions AWS Glue jobs, creates databases, and runs ETL transformations.
  4. Imports the Quick Sight dashboard and refreshes datasets with the latest data.
  5. Processes any catalog asset subscriptions defined in the manifest.
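The ordering of the steps above can be modeled as a dependency graph and resolved with a topological sort. The following Python sketch uses the standard library's graphlib for this. The step names and the edges between them are our assumptions, derived from the numbered list, and the sketch does not reflect how the CLI is implemented internally.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set

# Hypothetical step names; each key maps to the steps it must wait for.
DEPENDS_ON: Dict[str, Set[str]] = {
    "import_dashboard": {"run_workflow"},
    "process_catalog_subscriptions": {"import_dashboard"},
    "run_workflow": {"create_workflow"},
    "create_workflow": {"upload_to_s3"},
    "upload_to_s3": set(),
}


def deployment_order(depends_on: Dict[str, Set[str]]) -> List[str]:
    """Return the steps so that each one appears after everything it depends on."""
    try:
        return list(TopologicalSorter(depends_on).static_order())
    except CycleError as error:
        # args[1] holds the list of nodes that form the detected cycle.
        raise ValueError(f"circular dependency: {error.args[1]}") from error


for step in deployment_order(DEPENDS_ON):
    print(step)
# prints upload_to_s3, create_workflow, run_workflow, import_dashboard,
# process_catalog_subscriptions (one per line)
```

Declaring dependencies rather than a fixed sequence means a new resource type only needs to state what it waits for, and a circular declaration is rejected before anything is provisioned.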

Step 5: Validate

Run post-deployment validation to confirm services are running and ready for workloads:

aws-smus-cicd-cli test --manifest manifest.yaml --targets test

Step 6: Promote to production

Promote the same bundle artifact that was validated in the test stage to production. This guarantees the exact same artifact runs in prod:

# Promote the same bundle that was validated in test to prod
aws-smus-cicd-cli deploy --manifest app.tar.gz --targets prod
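If you drive the CLI from a script rather than a hosted CI system, the promotion sequence can be expressed as an ordered list of commands that stops at the first failure. The following Python sketch uses the commands and flags shown in the Bundles section of this post. The promote function is hypothetical, and it assumes the CLI signals failure through a non-zero exit code. The example performs a dry run, so nothing is executed.

```python
import subprocess
from typing import Callable, List, Sequence

CLI = "aws-smus-cicd-cli"

# Commands and flags copied from the Bundles section of this post.
PROMOTION_STEPS: List[List[str]] = [
    [CLI, "bundle", "--manifest", "manifest.yaml"],
    [CLI, "deploy", "--manifest", "app.tar.gz", "--targets", "test"],
    [CLI, "test", "--manifest", "manifest.yaml", "--targets", "test"],
    [CLI, "deploy", "--manifest", "app.tar.gz", "--targets", "prod"],
]


def promote(
    steps: Sequence[Sequence[str]],
    run: Callable[[Sequence[str]], int] = subprocess.call,
) -> None:
    """Run each command in order and stop at the first non-zero exit code.

    Assumption: the CLI reports failure through its process exit code.
    """
    for command in steps:
        exit_code = run(command)
        if exit_code != 0:
            raise RuntimeError(f"step failed ({exit_code}): {' '.join(command)}")


if __name__ == "__main__":
    # Dry run: print each command instead of executing it.
    def dry_run(command: Sequence[str]) -> int:
        print("would run:", " ".join(command))
        return 0

    promote(PROMOTION_STEPS, run=dry_run)
```

Because the prod deployment is the last entry, a failed validation in test raises before the production command is ever reached.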

Integrating with GitHub Actions

The CLI works with existing CI/CD solutions. The GitHub repository includes reusable workflow templates that DevOps teams can adopt directly.

The following is an example of a GitHub Actions workflow that implements a full bundle-based deployment pipeline:

name: Deploy Analytics Application 
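on: 
  push: 
    branches: [main] 

# Assumption: role-to-assume authenticates through GitHub OIDC,
# which requires permission to request an ID token.
permissions:
  id-token: write
  contents: read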
 
jobs: 
  deploy-test: 
    runs-on: ubuntu-latest 
    steps: 
      - uses: actions/checkout@v4 
 
      - name: Install CLI 
        run: pip install aws-smus-cicd-cli 
 
      - name: Configure AWS credentials 
        uses: aws-actions/configure-aws-credentials@v4 
        with: 
          role-to-assume: ${{ secrets.AWS_ROLE_ARN }} 
          aws-region: us-east-1 
 
      - name: Validate 
        run: aws-smus-cicd-cli describe --manifest manifest.yaml --connect 
 
      - name: Bundle 
        run: aws-smus-cicd-cli bundle --manifest manifest.yaml 
 
      - name: Deploy to test 
        run: aws-smus-cicd-cli deploy --targets test --manifest manifest.yaml 
 
      - name: Run tests 
        run: aws-smus-cicd-cli test --manifest manifest.yaml --targets test 
 
  deploy-prod: 
    needs: deploy-test 
    runs-on: ubuntu-latest 
    environment: production 
    steps: 
      - uses: actions/checkout@v4 
 
      - name: Install CLI 
        run: pip install aws-smus-cicd-cli 
 
      - name: Configure AWS credentials 
        uses: aws-actions/configure-aws-credentials@v4 
        with: 
          role-to-assume: ${{ secrets.AWS_PROD_ROLE_ARN }} 
          aws-region: us-west-2 
 
      - name: Deploy to production 
        run: aws-smus-cicd-cli deploy --targets prod --manifest manifest.yaml

The CLI also works with Jenkins, GitLab CI, and Azure DevOps. See the CI/CD integration guide for additional examples.

In the next section, we cover which AWS services and workload types the CLI supports.

Supported workloads

The CLI deploys applications that span the following AWS services through Airflow workflow definitions:

  • Analytics and BI: AWS Glue ETL jobs and crawlers, Amazon Athena queries, Amazon Quick Sight dashboards, Amazon EMR jobs, Amazon Redshift queries.
  • Machine learning: SageMaker training jobs, ML model endpoints, SageMaker AI Pipelines.
  • Code and workflows: Jupyter notebooks, Python scripts, Airflow DAGs (MWAA and MWAA Serverless).
  • Data and storage: S3 data files, Git repositories, SageMaker Catalog resources (glossaries, glossary terms, form types, asset types, assets, data products, metadata forms).

The examples directory includes working applications for each of these patterns, with manifests, workflow definitions, and integration tests.

Failure recovery

If a deployment fails, the CLI stops at the point of failure and reports the error with a detailed stack trace. To recover:

  1. Run aws-smus-cicd-cli describe --connect to check which resources exist and which permissions are missing.
  2. Fix the issue and rerun aws-smus-cicd-cli deploy.
  3. For bundle-based deployments, redeploy a previous bundle version.
  4. Use aws-smus-cicd-cli destroy --targets <target> --force to clean up a failed deployment.

For detailed rollback procedures, see the Rollback Guide.
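The "redeploy a previous bundle version" step can be scripted as a simple fallback. The following Python sketch is illustrative only: deploy_or_roll_back, the deploy callback, and the bundle file names are hypothetical, and in practice the callback would wrap the deploy command and return whether it succeeded.

```python
from typing import Callable


def deploy_or_roll_back(
    current: str,
    previous: str,
    deploy: Callable[[str], bool],
) -> str:
    """Deploy current; if that fails, redeploy previous.

    Returns the bundle that ended up deployed and raises RuntimeError
    if neither bundle could be deployed.
    """
    if deploy(current):
        return current
    if deploy(previous):
        return previous
    raise RuntimeError(f"both {current} and {previous} failed to deploy")


if __name__ == "__main__":
    # Simulated deploy that rejects the newest bundle.
    def fake_deploy(bundle: str) -> bool:
        return bundle != "app-v2.tar.gz"

    active = deploy_or_roll_back("app-v2.tar.gz", "app-v1.tar.gz", fake_deploy)
    print("active bundle:", active)
```

Keeping at least one earlier bundle available in your artifact store is what makes this kind of fallback possible.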

Conclusion

In this post, you learned how the Amazon SageMaker Unified Studio CI/CD CLI gives data and DevOps teams a clean separation of concerns: data teams describe their application once in a YAML manifest, and DevOps teams deploy it with a single command through their existing CI/CD pipelines. You saw how stages map to isolated SageMaker Unified Studio projects (optionally spanning AWS accounts and Regions), how bundles provide immutable, reproducible promotion through test and production, and how the CLI integrates with GitHub Actions, Jenkins, GitLab CI, and Azure DevOps. You also walked through deploying a Glue-and-Quick-Sight analytics application from dev through to prod.

Get started

The CI/CD CLI is available at no additional cost in all AWS Regions where Amazon SageMaker Unified Studio is available. You pay only for the underlying AWS resources provisioned during deployment.

Use the following steps to try it out:

  1. Install the CLI:
    pip install aws-smus-cicd-cli
  2. Browse the example applications for analytics and ML patterns.
  3. Follow the CI/CD CLI documentation to deploy your first application in 10 minutes.
  4. Review the Admin Guide for infrastructure setup.

For feedback and bug reports, open an issue on the GitHub repository.


About the authors

Ramesh H Singh

Ramesh H Singh is a Senior Product Manager Technical (External Services) at AWS in Seattle, Washington, currently with the Amazon SageMaker team. He is passionate about building high-performance ML/AI and analytics products that help enterprise customers achieve their critical goals using cutting-edge technology.

Vasudevan Venkataramanan

Vasudevan Venkataramanan is a Senior Software Engineer on the Amazon SageMaker Unified Studio team. He is responsible for technical direction of scheduling and orchestration within SageMaker Unified Studio. Outside of his professional work, he enjoys spending time with his kid, and playing pickleball and cricket.

Amir Bar Or

Amir Bar Or is a Senior Software Engineer on the Amazon SageMaker Unified Studio team. He is responsible for technical direction of scheduling and orchestration within SageMaker Unified Studio. Outside of his professional work, he enjoys spending time with his kid, and playing pickleball and cricket.

Nikita Arbuzov

Nikita Arbuzov is a Software Engineer on the Amazon SageMaker Unified Studio team. He is responsible for building support for CI/CD features within SageMaker Unified Studio.

Saurabh Bhutyani

Saurabh Bhutyani is a Principal Analytics Specialist Solutions Architect at AWS. He is passionate about new technologies. He joined AWS in 2019 and works with customers to provide architectural guidance for running generative AI use cases, scalable analytics solutions and data mesh architectures using AWS services like Amazon Bedrock, Amazon SageMaker Unified Studio, Amazon EMR, Amazon Athena, AWS Glue, AWS Lake Formation, and Amazon DataZone.