Streamline development of data and AI applications for data engineers, analysts, scientists, and app developers
This Guidance demonstrates how to use Amazon SageMaker Unified Studio to create a unified development experience for building, deploying, executing, and monitoring end-to-end workflows across AWS data, analytics, and AI/ML services. By showcasing the capabilities of SageMaker Unified Studio, the Guidance helps you streamline your data operations, from ingestion to product deployment. It also illustrates how this integrated approach can enhance efficiency, reduce complexity, and provide comprehensive control over diverse AWS services, ultimately simplifying the management of a complex data workflow.
Please note: A disclaimer applies to this Guidance; see the Disclaimer section at the end of this page.
Architecture Diagram
Overview
This architecture diagram shows how Amazon SageMaker Unified Studio provides a unified, collaborative experience for ML and data engineers, data stewards, and generative AI developers to accelerate data applications, from exploration to production.
Step 1
Amazon SageMaker Unified Studio allows you to configure domains. Each domain provides a single, isolated portal where data teams collaborate.
Step 2
SageMaker Unified Studio offers a single sign-on (SSO) experience, allowing individual users or groups to access a particular domain. You can either use an identity provider (IdP) in AWS, such as AWS IAM Identity Center, or connect your existing IdP using SAML.
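SageMaker Unified Studio domains build on the Amazon DataZone APIs, so domain setup can also be scripted. The following is a minimal sketch using boto3; the domain name and execution role ARN are placeholder assumptions, and IAM Identity Center federation is enabled through the singleSignOn setting.

```python
import boto3

datazone = boto3.client("datazone")

# Minimal sketch: create a Unified Studio domain with IAM Identity Center
# federation enabled. The name and role ARN below are placeholders.
response = datazone.create_domain(
    name="sales-analytics",
    description="Domain for the sales forecasting teams",
    domainExecutionRole="arn:aws:iam::111122223333:role/DomainExecutionRole",
    singleSignOn={"type": "IAM_IDC", "userAssignment": "AUTOMATIC"},
)
print(response["id"], response["portalUrl"])
```

Once the domain is active, its portal URL is the entry point users land on after they authenticate.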
Step 3
Once you authenticate, SageMaker Unified Studio redirects you to your assigned domain portal. This portal offers a common space for you to collaborate on projects. In this Guidance, you will use a sales forecasting project where a data engineer, an ML engineer, and a data analyst work together.
Step 4
In SageMaker Unified Studio, you can create projects backed by GitHub repositories, enabling user collaboration. The platform provides version control, a browser-based integrated development environment (IDE), and a low-code/no-code experience for solution development, with access to native generative AI services.
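Projects can likewise be created programmatically through the DataZone APIs. Below is a minimal sketch assuming an existing domain; the domain identifier is a placeholder, and connecting a GitHub repository is then configured from the project settings in the portal.

```python
import boto3

datazone = boto3.client("datazone")

# Minimal sketch: create a shared project inside an existing domain.
# The domain identifier below is a placeholder for your own domain's ID.
project = datazone.create_project(
    domainIdentifier="dzd_exampledomainid",
    name="sales-forecasting",
    description="Shared project for the data engineer, ML engineer, and analyst",
)
print(project["id"])
```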
Step 5
Data engineers transform datasets, such as the sales dataset used for forecasting, before model development. The backend resources provide the tools and infrastructure for data warehousing, data engineering, pipelines, governance, ML, and generative AI. The data catalog, lineage tracking, and publish/subscribe capabilities let users collaborate on trusted, governed data.
Step 6
The sales dataset is stored in persistent storage. To make this data accessible, native AWS services implement a Lakehouse architecture for the project to use.
Step 7
A studio (web-based IDE) experience allows both data engineers and ML engineers to perform data transformations as well as model training and validation operations.
Step 8
SageMaker Unified Studio unifies data analytics tools, enabling you to build, deploy, and monitor data applications through a cohesive experience. The suite includes a Query Editor, Amazon SageMaker JumpStart, SageMaker Endpoints, SageMaker Workflows, and a JupyterLab IDE.
Generative AI Lakehouse
This architecture diagram shows how Amazon SageMaker Unified Studio enables a collaborative data engineering and analytics experience for sales forecasting using a Lakehouse architecture, web-based studio with generative AI, and orchestration tools in a unified portal.
Step 1
AWS IAM Identity Center manages user access and SSO to Amazon SageMaker Unified Studio for data engineers.
Step 2
SageMaker Unified Studio allows data engineers and data analysts to collaborate on the sales forecasting project.
Step 3
Store the sales dataset for the forecasting project in an Amazon Simple Storage Service (Amazon S3)-backed data Lakehouse architecture. Use AWS Glue for data cataloging and Amazon Redshift Serverless for fast data retrieval. Govern the data using AWS Lake Formation permissions and access it through the Apache Iceberg API, enabling seamless integration between the Amazon S3 and Amazon Redshift Serverless tiers.
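As an illustration, an Iceberg table for the sales dataset can be registered in the AWS Glue Data Catalog with a single Athena DDL statement. This is a sketch; the database, bucket, and output-location names are placeholders, and Lake Formation permissions on the resulting table are granted separately.

```python
import boto3

athena = boto3.client("athena")

# Minimal sketch: create an Apache Iceberg table on Amazon S3, registered
# in the AWS Glue Data Catalog. All names below are placeholders.
ddl = """
CREATE TABLE sales_forecasting.daily_sales (
    sale_date date,
    store_id  string,
    revenue   double
)
LOCATION 's3://example-lakehouse-bucket/daily_sales/'
TBLPROPERTIES ('table_type' = 'ICEBERG')
"""

athena.start_query_execution(
    QueryString=ddl,
    QueryExecutionContext={"Database": "sales_forecasting"},
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)
```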
Step 4
SageMaker Unified Studio provides a web-based interface to the data engineer, allowing them to perform the necessary transformations to the sales dataset without leaving SageMaker Unified Studio or switching consoles. You can use Amazon Q Developer to provide in-place, AI-generated coding recommendations.
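The transformation itself might look like the following notebook sketch. The S3 paths and column names are illustrative assumptions, not the Guidance's actual dataset, and reading from and writing to Amazon S3 with pandas requires the s3fs and pyarrow packages.

```python
import pandas as pd

# Minimal sketch of a notebook transformation: clean the raw sales data
# and aggregate it to one row per store per week for forecasting.
raw = pd.read_csv(
    "s3://example-lakehouse-bucket/raw/daily_sales.csv",
    parse_dates=["sale_date"],
)

weekly = (
    raw.dropna(subset=["revenue"])
       .assign(week=raw["sale_date"].dt.to_period("W").dt.start_time)
       .groupby(["store_id", "week"], as_index=False)["revenue"]
       .sum()
)
weekly.to_parquet("s3://example-lakehouse-bucket/curated/weekly_sales.parquet")
```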
Step 5
The studio IDE in SageMaker Unified Studio lets you use Amazon Athena for data exploration and Amazon Redshift for heavy data transformations. You can store the sales dataset and the intermediate datasets produced by those transformations in Amazon S3.
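For the heavier Redshift-side transformations, the Redshift Data API is one way to run SQL against a serverless workgroup from a notebook. A sketch, with placeholder workgroup, database, schema, and table names:

```python
import boto3

redshift_data = boto3.client("redshift-data")

# Minimal sketch: run a heavy aggregation inside Amazon Redshift Serverless.
# The workgroup, database, schema, and table names are placeholders.
resp = redshift_data.execute_statement(
    WorkgroupName="sales-forecasting-wg",
    Database="dev",
    Sql="""
        CREATE TABLE curated.weekly_sales AS
        SELECT store_id,
               DATE_TRUNC('week', sale_date) AS week,
               SUM(revenue) AS revenue
        FROM raw.daily_sales
        GROUP BY 1, 2
    """,
)
print(resp["Id"])  # poll describe_statement(Id=...) until it finishes
```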
Step 6
The Workflows tool automates the end-to-end data ingestion process, using Amazon Managed Workflows for Apache Airflow (Amazon MWAA) for orchestration and AWS Glue for task execution. The Query Editor in SageMaker Unified Studio provides a SQL notebook-style interface to write, run, and save queries against data sources in Amazon Redshift and the AWS Glue Data Catalog, allowing users to upload and view sample data.
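Under the hood, such a workflow is an Airflow DAG. The sketch below assumes two AWS Glue jobs you have already defined; the job names, IAM role, and schedule are placeholders.

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.operators.glue import GlueJobOperator

# Minimal sketch of an ingestion DAG run on Amazon MWAA. The Glue job
# names and IAM role below are placeholders for jobs you define yourself.
with DAG(
    dag_id="sales_ingestion",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    ingest = GlueJobOperator(
        task_id="ingest_raw_sales",
        job_name="ingest-raw-sales",
        iam_role_name="GlueJobRole",
    )
    transform = GlueJobOperator(
        task_id="transform_weekly_sales",
        job_name="transform-weekly-sales",
        iam_role_name="GlueJobRole",
    )
    ingest >> transform  # run the transformation after ingestion succeeds
```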
Collaborative model deployment
This architecture diagram shows how Amazon SageMaker Unified Studio empowers ML engineers to collaboratively develop, evaluate, and deploy sales forecasting models using Amazon SageMaker, SageMaker JumpStart, and SageMaker Workflows within a unified portal.
Step 1
IAM Identity Center manages user access and SSO to SageMaker Unified Studio for ML engineers.
Step 2
SageMaker Unified Studio allows your ML engineer to collaborate on the sales forecasting project.
Step 3
SageMaker Unified Studio provides a web-based interface for your ML engineer to train and validate the model using the dataset previously prepared and curated by the data engineer. You can use Amazon Q Developer to provide in-place, AI-generated coding recommendations.
Step 4
Use SageMaker compute to train and validate the model with the datasets prepared by your data engineers.
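A training job on SageMaker compute might be launched like this from the project notebook. This is a minimal sketch using the built-in XGBoost image; the S3 paths are placeholders for the curated training and validation data.

```python
import sagemaker
from sagemaker.estimator import Estimator

session = sagemaker.Session()
role = sagemaker.get_execution_role()

# Minimal sketch: train the forecasting model on SageMaker compute with
# the built-in XGBoost container. The S3 paths below are placeholders.
image_uri = sagemaker.image_uris.retrieve(
    framework="xgboost", region=session.boto_region_name, version="1.7-1"
)
estimator = Estimator(
    image_uri=image_uri,
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://example-lakehouse-bucket/models/",
)
estimator.set_hyperparameters(objective="reg:squarederror", num_round=100)
estimator.fit({
    "train": "s3://example-lakehouse-bucket/curated/train/",
    "validation": "s3://example-lakehouse-bucket/curated/validation/",
})
```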
Step 5
Use the SageMaker JumpStart tool to evaluate pre-trained models in SageMaker and the SageMaker Endpoints tool to deploy the final sales forecasting model for online predictions in SageMaker. Use the SageMaker Workflows tool to automate the end-to-end model training and deployment.
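Deploying a pre-trained JumpStart model for evaluation can be as short as the sketch below. The model ID is a placeholder (browse JumpStart in the portal for a suitable forecasting model), and the request payload format depends on the model you pick.

```python
from sagemaker.jumpstart.model import JumpStartModel

# Minimal sketch: stand up a pre-trained JumpStart model behind a
# SageMaker endpoint. The model ID below is a placeholder.
model = JumpStartModel(model_id="example-forecasting-model-id")
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
)

# Payload format varies by model; consult the model's JumpStart card.
response = predictor.predict({"inputs": [112.0, 118.5, 124.2]})

predictor.delete_endpoint()  # clean up after the evaluation
```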
Step 6
SageMaker stores the model artifacts in Amazon S3.
Get Started
Deploy this Guidance
Well-Architected Pillars
The AWS Well-Architected Framework helps you understand the pros and cons of the decisions you make when building systems in the cloud. The six pillars of the Framework allow you to learn architectural best practices for designing and operating reliable, secure, efficient, cost-effective, and sustainable systems. Using the AWS Well-Architected Tool, available at no charge in the AWS Management Console, you can review your workloads against these best practices by answering a set of questions for each pillar.
The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.
Operational Excellence
SageMaker Unified Studio integrates team collaboration, Git, analytics services, and AI/ML services to provide a unified data development experience. This creates a centralized operational control plane for collaborating on and executing end-to-end data ingestion, preparation, and deployment of data products. By enabling collaboration and offering a unified developer experience, SageMaker Unified Studio helps you design for operations, allowing full automation of data service integration and deployment.
Security
SageMaker Unified Studio delivers an SSO experience through deployed web domains that can be federated to IdPs such as IAM Identity Center. You can implement access control policies for users and groups, so that projects, data, and models are accessible with least-privileged permissions. By using SageMaker Unified Studio domains with federated IdP, you can create logical separation of control, defining permission guardrails for your organization. This enables lifecycle-based access management through continuous monitoring and fine-tuning of access controls.
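As one concrete example of least privilege in this architecture, AWS Lake Formation can grant a project role read access to a single table and nothing more. A sketch with placeholder role, database, and table names:

```python
import boto3

lakeformation = boto3.client("lakeformation")

# Minimal sketch: grant SELECT on one table to one project role.
# The role ARN, database, and table names below are placeholders.
lakeformation.grant_permissions(
    Principal={
        "DataLakePrincipalIdentifier": (
            "arn:aws:iam::111122223333:role/AnalystProjectRole"
        )
    },
    Resource={
        "Table": {
            "DatabaseName": "sales_forecasting",
            "Name": "weekly_sales",
        }
    },
    Permissions=["SELECT"],
)
```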
Reliability
SageMaker Unified Studio unifies data ingestion, storage, and analytics services, including Amazon S3 and Amazon Redshift, to establish a reliable control plane for your data operations. You can leverage these underlying services and tools to build fault tolerance at the service level through a unified web experience. The SageMaker Unified Studio interface simplifies the orchestration of data and analytics services, allowing easier monitoring and control of data workloads. This reduces the complexity of coordinating and governing individual services, making it more straightforward to detect failures and recover within a single web interface.
Performance Efficiency
Amazon Q Developer uses generative AI to provide code recommendations, reducing the complexity and effort of development. SageMaker offers access to pre-trained models and simplifies the process of training, validating, and deploying models for your specific use cases. By using these tools, you can accelerate development and implement code recommendations and model deployment without having to manage complex underlying AI/ML technologies.
Cost Optimization
SageMaker Unified Studio assists in selecting the right resources for your data workloads by unifying the end-to-end development process. It enables quick deployment and decommissioning of data and analytics services, helping control the costs associated with data product development. By reducing the complexity of development and deployment, SageMaker Unified Studio helps you manage services more effectively. This leads to reduced data transfer costs, improved workload performance analysis, and dynamic resource allocation.
Sustainability
The managed services underlying SageMaker Unified Studio offer on-demand scaling in addition to data access and lifecycle control. This simplified access to and control of your data facilitates continuous monitoring of usage, helping you reduce the impact of your data operations and create more efficient workloads. As a result, you can better predict and control usage, scaling with demand without overprovisioning resources for future needs.
Disclaimer
The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.
References to third-party services or organizations in this Guidance do not imply an endorsement, sponsorship, or affiliation between Amazon or AWS and the third party. Guidance from AWS is a technical starting point, and you can customize your integration with third-party services when you deploy the architecture.