AWS Big Data Blog
Introducing enhanced AI assistance in Amazon SageMaker Unified Studio: Agentic chat, Amazon Q Developer CLI, and MCP integration
Amazon Q Developer provides generative AI assistance within Amazon SageMaker Unified Studio for data discovery, data processing, SQL analytics, and machine learning workflows. Today, we are announcing improvements to the Amazon Q Developer chat experience in the SageMaker Unified Studio JupyterLab integrated development environment (IDE) and adding Amazon Q Developer in the command line in the JupyterLab and Code Editor IDEs. By integrating with Model Context Protocol (MCP) servers, Amazon Q Developer is aware of your SageMaker Unified Studio project resources, including data, compute, and code, and provides personalized, relevant responses for data engineering and machine learning development. You can use this improved AI assistance to set up your development environment more quickly and to handle tasks like code refactoring, file modification, and troubleshooting, while maintaining transparency into how the AI assistant is acting on your behalf.
Solution implementation
In this post, we walk through how you can use the improved Amazon Q Developer chat and the new built-in Amazon Q Developer CLI in SageMaker Unified Studio to code ETL tasks, fix code errors, and generate ML development workflows. Both interfaces use MCP to read files, run commands, and interact with AWS services directly from the IDE. You can also configure additional MCP servers to extend Amazon Q Developer’s capabilities with custom tools and integrations specific to your workflow.
Prerequisites
Before starting this tutorial, you must have the following prerequisites:
- Access to a SageMaker Unified Studio domain. If you don’t have a Unified Studio domain, you can create one using the quick setup or manual setup option.
- Access to a SageMaker Unified Studio project with the All capabilities project profile enabled, or permission to create one.
- Access to a JupyterLab or Code Editor compute space, or permission to create one. We will walk through a JupyterLab IDE example. There is no minimum instance type requirement to use the new features. In this post, we use an ml.t3.medium instance. At launch, SageMaker Distribution image 2.9 (contains Amazon Q Developer chat and Amazon Q Developer CLI) or 3.4 (contains Amazon Q Developer CLI) is required.
Uploading the dataset to an Amazon S3 bucket
- Download the Diabetes 130-US hospitals dataset. This dataset contains 10 years (1999–2008) of clinical care data from 130 US hospitals and integrated delivery networks.
- In the Data section in the middle of your project page, choose + at the top. This opens Add data on the right.
- On Add data, choose Create table.
- Select Choose file or drag and drop the diabetic_data CSV file.
- Select S3/external table and complete the information in the form.
- Select Next to upload the dataset.
Amazon Q Developer chat
Amazon Q Developer chat in SageMaker Unified Studio is an agentic AI assistant that automatically understands your project, including data, compute resources, and code, to provide highly relevant suggestions and insights. It helps you answer questions about your project, understand complex datasets, write code, and create notebooks, making it a powerful coding companion for creating ETL workflows, building ML models, or developing generative AI applications. We will walk through two user personas, data engineer and ML engineer, to show how to use Amazon Q Developer chat to do exploratory data analysis, troubleshoot code, and perform predictive analysis.
Note: Amazon Q Developer code security scanning automatically scans code as it is written in the IDE and provides recommendations for remediation and, in some cases, a code fix. This helps you proactively identify and remove security vulnerabilities, both in your existing codebase and in new code as you write it.
To launch Amazon Q Developer chat:
- Navigate to your project. Access the JupyterLab IDE. At the time of launch, Amazon Q Developer chat is only available in the JupyterLab IDE.
- Choose the icon on the left for Amazon Q Developer chat. If this is the first time opening, a message displays for you to acknowledge the AWS policies for responsible AI.
- Enter your questions in the Ask a question… field to interact with Amazon Q Developer chat.
Configure additional MCP servers
You can add additional MCP servers such as the Amazon DataZone MCP server or the AWS Data Processing MCP Server for use in Amazon Q Developer chat and the Amazon Q Developer CLI. In the following steps, we add the AWS Data Processing MCP Server, an open source tool that uses MCP to simplify analytics environment setup. The AWS Data Processing MCP Server includes access to AWS Glue job statuses, Amazon Athena query results, Amazon EMR cluster metrics, and AWS Glue Data Catalog metadata. For more information on configuring MCP servers, see MCP configuration for Q Developer in the IDE.
The following are the steps to configure additional MCP servers:
- Navigate to Amazon Q Developer chat and select the Configure MCP servers tools icon in the upper right. Alternatively, you can edit the configuration file located at /home/sagemaker-user/.aws/amazonq/agents/default.json to add an MCP server in Amazon Q Developer chat. To add an MCP server in the Amazon Q Developer CLI, edit the configuration file located at /home/sagemaker-user/.aws/amazonq/mcp.json in the terminal.
- Select the + symbol to Add new MCP server.
- Add the following information in the form:
- Select the scope: Global
- Name: Enter awsdp-mcp
- Transport: Select stdio
- Command: Enter uvx
- Arguments-optional: Enter awslabs.aws-dataprocessing-mcp-server@latest
- Choose Save.
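If you prefer to edit the configuration file directly rather than use the form, the same values can be written as a JSON entry. The following is a minimal sketch, not the documented schema: it assumes the file holds a top-level mcpServers object keyed by server name, with command and args fields, which is the layout commonly used by MCP clients. Confirm the exact format in the MCP configuration documentation before relying on it. The helper names add_mcp_server and update_config_file are hypothetical.

```python
import json
from pathlib import Path


def add_mcp_server(config: dict, name: str, command: str, args: list[str]) -> dict:
    """Return a copy of config with one stdio MCP server entry added or replaced.

    Assumes (not confirmed by this post) a top-level "mcpServers" object.
    """
    updated = dict(config)
    servers = dict(updated.get("mcpServers", {}))
    servers[name] = {"command": command, "args": list(args)}
    updated["mcpServers"] = servers
    return updated


def update_config_file(path: Path, name: str, command: str, args: list[str]) -> None:
    """Read the JSON config at path (if present), add the server, and write it back."""
    text = path.read_text() if path.exists() else ""
    existing = json.loads(text) if text.strip() else {}
    updated = add_mcp_server(existing, name, command, args)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(updated, indent=2) + "\n")


if __name__ == "__main__":
    # Values taken from the form fields listed in the steps above.
    entry = add_mcp_server(
        {}, "awsdp-mcp", "uvx", ["awslabs.aws-dataprocessing-mcp-server@latest"]
    )
    print(json.dumps(entry, indent=2))
```

To apply the entry to a real file, you could call update_config_file with the path to your configuration file. Back up the file first, because the sketch rewrites it in full.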
Data engineer
As a data engineer, you might build ETL jobs and data pipelines. Amazon Q Developer chat helps reduce setup time and improves workflow efficiency by refactoring code, implementing best practices, and troubleshooting errors. Amazon Q Developer uses AI to provide code recommendations, so its output is non-deterministic. The results you get might be different from the ones shown in the following examples. Example prompt:
After the solution is created, run the following steps:
- Go to the notebook.
- Run the created notebook and review each section:
- Data loading
- Descriptive analysis
- Correlation matrix
- Data preprocessing such as handling missing values
- Analyze importance of features
- Review the README.md file.
- You can make changes to the created files.
- You can prompt the Amazon Q Developer chat to make additional changes for you.
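To make the data preprocessing step in the generated notebook more concrete, here is a minimal sketch of a missing-value profile, one of the checks listed above. It is not the code Amazon Q Developer generates, which will vary from run to run. The sketch assumes that missing entries are encoded as an empty string or a ? character, and the three sample rows are made up for illustration. The function name missing_value_counts is hypothetical.

```python
import csv
import io

# Assumption: these markers denote a missing value in the CSV file.
MISSING_MARKERS = {"", "?"}

# Made-up rows for illustration only; not taken from the real dataset.
SAMPLE = """race,weight,readmitted
Caucasian,?,NO
?,?,<30
AfricanAmerican,[75-100),>30
"""


def missing_value_counts(csv_text: str) -> dict[str, int]:
    """Count missing entries per column in CSV text that has a header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = {name: 0 for name in (reader.fieldnames or [])}
    for row in reader:
        for column, value in row.items():
            if column is None:
                continue  # extra cells beyond the header are ignored
            if value is None or value.strip() in MISSING_MARKERS:
                counts[column] += 1
    return counts


if __name__ == "__main__":
    for column, count in missing_value_counts(SAMPLE).items():
        print(f"{column}: {count} missing")
```

A profile like this helps you decide, column by column, whether to drop, impute, or keep a field before moving on to feature importance analysis.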
Fix errors without specifying the error
You can give instructions to Amazon Q Developer chat in a conversational way. You don’t need to specify the error: Amazon Q Developer chat accesses your notebook, identifies the error, and fixes it.
- Open your notebook.
- Prompt: The notebook isn’t running, can you fix it? Amazon Q Developer chat will identify the error from the notebook.
- Review the issue and the solution. Run the notebook again.
ML engineer
As an ML engineer, you might analyze complex datasets and run ML experiments. You can ask Amazon Q Developer chat to take on an ML engineer role and build a predictive ML model on the dataset. You can also ask it to take the data engineer’s output into account. Example prompt:
After the solution is created, run the following steps:
- Run the created notebook and review each section:
- Note that the notebook is running successfully.
- Amazon Q Developer chat incorporated a feature engineering section based on the data engineer’s output.
- Four ML models (Logistic Regression, Random Forest, Gradient Boosting, and XGBoost) were identified for diabetes readmission prediction.
- Models were evaluated using a comprehensive metrics suite including accuracy, precision, recall, F1 score, and ROC AUC to help ensure balanced performance.
- Feature engineering produced critical predictors such as previous inpatient visits and medication changes, while hyperparameter tuning optimized model performance.
- The final implementation balances predictive power with clinical interpretability, enabling effective identification of high-risk patients.
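The evaluation metrics named above have short definitions that are worth keeping in mind when reading the generated results. The following is a minimal pure-Python sketch of those definitions for a binary label such as readmitted or not readmitted. It is not the notebook's code, which would more likely call an ML library. The function names and the sample labels are hypothetical.

```python
from typing import Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the chance that a positive outscores a negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Made-up labels, predictions, and scores for illustration only.
    labels = [1, 0, 1, 1, 0, 0]
    predictions = [1, 0, 0, 1, 1, 0]
    probabilities = [0.9, 0.2, 0.4, 0.8, 0.6, 0.1]
    print(classification_metrics(labels, predictions))
    print(roc_auc(labels, probabilities))
```

Reporting several metrics together matters for readmission prediction because the classes are often imbalanced, so accuracy alone can look high while recall for high-risk patients stays low.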
Amazon Q Developer CLI
The Amazon Q Developer CLI also understands your code, data, and compute resources, but is optimized for users who prefer working in the terminal. It helps you execute and automate data processing, model training, and generative AI tasks through natural language prompts.
To launch the Amazon Q Developer CLI:
- On the top menu of your SageMaker Unified Studio project page, choose Build, and under IDE & APPLICATIONS, choose JupyterLab.
- Wait for the space to be ready.
- From the Launcher tab, open a new terminal. Or navigate to File > New > Terminal.
- Enter q chat
At launch, Anthropic’s Claude Sonnet 4 in Amazon Bedrock is the default large language model (LLM). You can choose other LLMs, depending on your AWS Region. To view the available models or change the model, enter /model.
MCP tools are executable functions that MCP servers expose to the Amazon Q Developer CLI. They enable Amazon Q Developer to perform actions, process data, and interact with external systems on your behalf. To view the available tools, enter /tools.
Example prompt:
Clean up
SageMaker Unified Studio by default shuts down idle resources such as JupyterLab and Code Editor spaces after 1 hour. However, you need to delete the Amazon Simple Storage Service (Amazon S3) bucket to stop incurring additional charges. You can delete any real-time endpoints you created using the SageMaker console. For instructions, see Delete Endpoints and Resources.
Conclusion
The improved AI assistance available in JupyterLab and Code Editor IDEs in SageMaker Unified Studio helps streamline data engineering and machine learning workflows by providing answers relevant to your project files, notebooks, data, and compute. Whether you’re a data engineer building ETL pipelines, a data scientist conducting exploratory analysis, or an ML engineer developing predictive models, these features now understand what you’re working on and help you do it more efficiently. This is just the start of our agentic journey in SageMaker Unified Studio. To learn more, review the SageMaker Unified Studio User Guide. We encourage you to explore the MCP capabilities and the AWS MCP Servers repository on GitHub.
About the authors
Lauren Mullennex is a Senior GenAI/ML Specialist Solutions Architect at AWS. She has over a decade of experience in ML, DevOps, and infrastructure. She is a published author of a book on computer vision. Outside of work, you can find her traveling and hiking with her two dogs.
Siddharth Gupta is heading Generative AI within SageMaker’s Unified Experiences. His focus is on driving agentic experiences, where AI systems act autonomously on behalf of users to accomplish complex tasks. Previously, he led edge machine learning solutions at AWS. This cutting-edge work aims to revolutionize how developers and data scientists interact with AI, creating more intuitive data integrations and powerful tools for building and deploying machine learning models. An alumnus of the University of Illinois at Urbana-Champaign, he brings extensive experience from his roles at Yahoo, Glassdoor, and Twitch. You can reach out to him on LinkedIn.
Ishneet Kaur is a Software Development Manager on the Amazon SageMaker Unified Studio team. She leads the engineering team to design and build GenAI capabilities in SageMaker Unified Studio.
Mohan Gandhi is a Senior Software Engineer at AWS. He has been with AWS for the last 10 years and has worked on various AWS services like Amazon EMR, Amazon EFA, and Amazon RDS. Currently, he is focused on improving the SageMaker inference experience. In his spare time, he enjoys hiking and marathons.
Mukul Prasad is a Senior Applied Science Manager in the AWS Agentic AI organization. He leads the Data Processing Agents Science team developing DevOps agents to simplify and optimize the customer journey in using AWS Big Data processing services including Amazon EMR, AWS Glue, and Amazon SageMaker Unified Studio. Outside of work, Mukul enjoys food, travel, photography, and cricket.
Murali Narayanaswamy is a Principal Machine Learning Scientist in the Agentic AI organization in AWS working on products including Amazon Bedrock, Amazon SageMaker Unified Studio, Amazon Redshift and Amazon RDS. His research interests lie at the intersection of AI, optimization, learning and inference particularly using them to understand, model and combat noise and uncertainty in real world applications and Reinforcement Learning in practice and at scale. Broadly, he works on using ideas from online algorithms, optimization under uncertainty, control theory, game theory, artificial intelligence, graphical models and estimation theory to solve important problems at Amazon scale.
Necibe Ahat is a Senior AI/ML Specialist Solutions Architect at AWS, working with Healthcare and Life Sciences customers. Necibe helps customers to advance their generative AI and machine learning journey. She has a background in computer science with 15 years of industry experience helping customers ideate, design, build and deploy solutions at scale. She is a passionate inclusion and diversity advocate.
Vipin Mohan is a Principal Product Manager at Amazon Web Services, where he leads generative AI product strategy. He specializes in building AI/ML products, container platforms, and search technologies that serve thousands of customers. Outside of work, he mentors aspiring product managers, enjoys reading about financial investing and entrepreneurship, and loves exploring the world through the eyes of his two kids.