Guidance for Collaborative, Unified Data and AI Development on AWS

Streamline development of data and AI applications for data engineers, analysts, scientists, and app developers

Overview

This Guidance demonstrates how to use Amazon SageMaker Unified Studio to create a unified development experience for building, deploying, executing, and monitoring end-to-end workflows across AWS data, analytics, and AI/ML services. By showcasing the capabilities of SageMaker Unified Studio, the Guidance helps you streamline your data operations, from ingestion to data product deployment. It also illustrates how this integrated approach can enhance efficiency, reduce complexity, and provide comprehensive control over diverse AWS services, simplifying the management of a complex data workflow.

How it works

Overview

This architecture diagram shows how Amazon SageMaker provides a unified, collaborative experience for ML and data engineers, data stewards, and generative AI developers to accelerate data applications, from exploration to production.

Diagram illustrating the AWS collaborative unified data and AI architecture, showing an authentication provider, Amazon SageMaker projects, portal interface, backend resources, and roles including data steward, data engineer, ML engineer, and data analyst. Backend resources highlight development, data infrastructure, and lakehouse capabilities.

Generative AI Lakehouse

This architecture diagram shows how Amazon SageMaker Unified Studio enables a collaborative data engineering and analytics experience for sales forecasting using a lakehouse architecture, a web-based studio with generative AI, and orchestration tools in a unified portal.

Architecture diagram showing a collaborative unified data and AI lakehouse solution on AWS. It illustrates data engineers and analysts accessing AWS services including IAM Identity Center, Amazon SageMaker Unified Studio, Amazon SageMaker Lakehouse, Amazon Q Developer, Amazon Redshift Serverless, Amazon S3, AWS Glue Data Catalog, and analytics services such as AWS Glue ETL, Amazon Athena, Amazon EMR, Amazon Redshift, and Amazon MWAA, for a sales forecasting project and data processing workflows.

Collaborative model deployment

This architecture diagram shows how Amazon SageMaker empowers ML engineers to collaboratively develop, evaluate, and deploy sales forecasting models using Amazon SageMaker, SageMaker JumpStart, and SageMaker Workflows within a unified portal.

Architecture diagram illustrating the AWS Collaborative Unified Data and AI development workflow. The diagram shows ML engineers accessing AWS IAM Identity Center, working through Amazon SageMaker Unified Studio, integrating database assets and coding capabilities via Amazon Q Developer, leveraging Amazon SageMaker Lakehouse (with Redshift Serverless, S3 storage, and AWS Glue Data Catalog), and utilizing ML capabilities in Amazon SageMaker for a sales forecasting project.

Deploy with confidence

Ready to deploy? Review the sample code on GitHub for detailed deployment instructions. You can deploy it as-is or customize it to fit your needs.

Well-Architected Pillars

The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.

Operational Excellence

SageMaker Unified Studio integrates team collaboration, Git, analytics services, and AI/ML services to provide a unified data development experience. This creates a centralized operational control plane for collaborating on and executing end-to-end data ingestion, preparation, and deployment of data products. By enabling collaboration and offering a unified developer experience, SageMaker Unified Studio helps you design for operations, allowing full automation of data service integration and deployment.

Read the Operational Excellence whitepaper 

Security

SageMaker Unified Studio delivers a single sign-on (SSO) experience through deployed web domains that can be federated to identity providers (IdPs) such as IAM Identity Center. You can implement access control policies for users and groups so that projects, data, and models are accessible with least-privilege permissions. By using SageMaker Unified Studio domains with a federated IdP, you can create logical separation of control and define permission guardrails for your organization. This enables lifecycle-based access management through continuous monitoring and fine-tuning of access controls.
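
As an illustration of the least-privilege idea described above, the following minimal sketch builds a read-only IAM policy document scoped to a single Amazon S3 prefix. It is not how SageMaker Unified Studio provisions project permissions. The function name `build_project_read_policy`, the bucket name, and the prefix are hypothetical and chosen only for this example. The code builds a plain dictionary and makes no AWS calls.

```python
import json


def build_project_read_policy(bucket: str, prefix: str) -> dict:
    """Return an IAM policy document granting read-only access to one S3 prefix.

    Hypothetical helper for illustration; bucket and prefix are assumptions.
    """
    # Normalize the prefix so the ARNs below never contain doubled slashes.
    prefix = prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Allow listing, but only for keys under the project prefix.
                "Sid": "ListProjectPrefix",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
            {
                # Allow reading objects under the project prefix and nothing else.
                "Sid": "ReadProjectObjects",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
        ],
    }


if __name__ == "__main__":
    # Example values only; replace with your own bucket and prefix.
    policy = build_project_read_policy(
        "example-sales-data", "projects/sales-forecasting"
    )
    print(json.dumps(policy, indent=2))
```

Scoping both the list and read statements to one prefix keeps a project's users from browsing or reading data that belongs to other projects in the same bucket. Write access, if needed, would be a separate, explicit statement.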

Read the Security whitepaper 

Reliability

SageMaker Unified Studio unifies data ingestion, storage, and analytics services, including Amazon S3 and Amazon Redshift, to establish a reliable control plane for your data operations. You can use these underlying services and tools to create fault tolerance at the service level through a unified web experience. The SageMaker Unified Studio interface simplifies the orchestration of data and analytics services, allowing easier monitoring and control of data workloads. This reduces the complexity of coordinating and governing individual services, making it more straightforward to detect failures and recover within a single web interface.

Read the Reliability whitepaper 

Performance Efficiency

Amazon Q Developer uses generative AI to provide code recommendations, reducing the complexity and effort of development. SageMaker offers access to pre-trained models and simplifies the process of training, validating, and deploying models for your specific use cases. By using these tools, you can accelerate development, applying code recommendations and deploying models without having to manage the complex underlying AI/ML technologies.

Read the Performance Efficiency whitepaper 

Cost Optimization

SageMaker Unified Studio assists in selecting the right resources for your data workloads by unifying the end-to-end development process. It enables quick deployment and decommissioning of data and analytics services, helping control the costs associated with data product development. By reducing the complexity of development and deployment, SageMaker Unified Studio helps you manage services more effectively. This leads to reduced data transfer costs, improved workload performance analysis, and dynamic resource allocation.

Read the Cost Optimization whitepaper 

Sustainability

The managed services underlying SageMaker Unified Studio offer on-demand scaling in addition to data access and lifecycle control. This easier access to and control of your data facilitates continuous monitoring of usage, helping reduce the environmental impact of data operations and create more efficient workloads. As a result, you can better predict and control usage, scaling with demand without overprovisioning resources for future needs.

Read the Sustainability whitepaper 

Disclaimer

The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.