AWS Big Data Blog

Introducing enhanced AI assistance in Amazon SageMaker Unified Studio: Agentic chat, Amazon Q Developer CLI, and MCP integration

Amazon Q Developer provides generative AI assistance within Amazon SageMaker Unified Studio for data discovery, data processing, SQL analytics, and machine learning workflows. Today, we are announcing improvements to the Amazon Q Developer chat experience in the SageMaker Unified Studio JupyterLab integrated development environment (IDE) and adding Amazon Q Developer in the command line to the JupyterLab and Code Editor IDEs. By integrating with Model Context Protocol (MCP) servers, Amazon Q Developer is aware of your SageMaker Unified Studio project resources, including data, compute, and code, and provides personalized, relevant responses for data engineering and machine learning development. You can use this improved AI assistance to set up your development environment more quickly and to handle tasks like code refactoring, file modification, and troubleshooting, while maintaining transparency into how the AI assistant is acting on your behalf.

Solution implementation

In this post, we walk through how you can use the improved Amazon Q Developer chat and the new built-in Amazon Q Developer CLI in SageMaker Unified Studio to code ETL tasks, fix code errors, and generate ML development workflows. Both interfaces use MCP to read files, run commands, and interact with AWS services directly from the IDE. You can also configure additional MCP servers to extend Amazon Q Developer’s capabilities with custom tools and integrations specific to your workflow.

Prerequisites

Before starting this tutorial, you must have the following prerequisites:

  • Access to a SageMaker Unified Studio domain. If you don’t have a Unified Studio domain, you can create one using the quick setup or manual setup option.
  • Access to a SageMaker Unified Studio project with the All capabilities project profile enabled, or permission to create one.
  • Access to a JupyterLab or Code Editor compute space, or permission to create one. We will walk through a JupyterLab IDE example. There is no minimum instance type requirement to use the new features. In this post, we use an ml.t3.medium instance. At launch, SageMaker Distribution image 2.9 (which contains Amazon Q Developer chat and the Amazon Q Developer CLI) or 3.4 (which contains the Amazon Q Developer CLI) is required.

Uploading the dataset to an Amazon S3 bucket

  1. Download the Diabetes 130-US hospitals dataset. This dataset contains 10 years (1999–2008) of clinical care data from 130 US hospitals and integrated delivery networks.
  2. In the Data section in the middle of your project page, choose + at the top. This opens the Add data pane on the right.
  3. In Add data, choose Create table.
  4. Select Choose file or drag and drop the diabetic_data CSV file.
  5. Select S3/external table and complete the information in the form.
  6. Select Next to upload the dataset.

Amazon Q Developer chat

Amazon Q Developer chat in SageMaker Unified Studio is an agentic AI assistant that automatically understands your project, including data, compute resources, and code, to provide highly relevant suggestions and insights. It helps you answer questions about your project, understand complex datasets, write code, and create notebooks, making it a powerful coding companion for creating ETL workflows, building ML models, or developing generative AI applications. We walk through two user personas, a data engineer and an ML engineer, to show how to use Amazon Q Developer chat for exploratory data analysis, code troubleshooting, and predictive analysis.

Note: Amazon Q Developer code security scanning automatically scans code as it is written in the IDE and provides recommendations for remediation and, in some cases, a code fix. This helps you proactively identify and remove security vulnerabilities, both in your existing codebase and in new code as you write it.

To launch Amazon Q Developer chat:

  1. Navigate to your project. Access the JupyterLab IDE. At the time of launch, Amazon Q Developer chat is only available in the JupyterLab IDE.
  2. Choose the icon on the left for Amazon Q Developer chat. If this is the first time opening, a message displays for you to acknowledge the AWS policies for responsible AI.
  3. Enter your questions in the Ask a question… field to interact with Amazon Q Developer chat.

Configure additional MCP servers

You can add additional MCP servers, such as the Amazon DataZone MCP server or the AWS Data Processing MCP Server, for use in Amazon Q Developer chat and the Amazon Q Developer CLI. In the following steps, we add the AWS Data Processing MCP Server, an open source tool that uses MCP to simplify analytics environment setup. The AWS Data Processing MCP Server provides access to AWS Glue job statuses, Amazon Athena query results, Amazon EMR cluster metrics, and AWS Glue Data Catalog metadata. For more information on configuring MCP servers, see MCP configuration for Q Developer in the IDE.

The following are the steps to configure additional MCP servers:

  1. Navigate to Amazon Q Developer chat and select the Configure MCP servers tools icon in the upper right. Alternatively, to add an MCP server in Amazon Q Developer chat, you can edit the configuration file located at /home/sagemaker-user/.aws/amazonq/agents/default.json. To add an MCP server in the Amazon Q Developer CLI, open /home/sagemaker-user/.aws/amazonq/mcp.json in the terminal and edit that configuration file.
    UI for configuring additional MCP server in Amazon Q Developer chat within SageMaker Studio
  2. Select the + symbol to Add new MCP server.
  3. Add the following information in the form:
    • Scope: Select Global
    • Name: Enter awsdp-mcp
    • Transport: Select stdio
    • Command: Enter uvx
    • Arguments-optional: Enter awslabs.aws-dataprocessing-mcp-server@latest
    Configuration panel for Data Processing MCP server in Amazon Q Developer chat
  4. Choose Save.
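
If you prefer to edit the configuration file rather than use the form, the values above map roughly onto one JSON entry per server. The following Python sketch shows one way to add such an entry without overwriting servers that are already configured. The top-level `mcpServers` key and the `command`/`args` entry shape are assumptions based on the common MCP client configuration layout, and `add_mcp_server` is a hypothetical helper, not part of any AWS tooling. Compare the output with the file the IDE writes before relying on it. The demo writes to a temporary directory, not to your real configuration.

```python
import json
import tempfile
from pathlib import Path


def add_mcp_server(config_path, name, command, args):
    """Add or update one stdio MCP server entry in a JSON config file.

    Hypothetical helper: the "mcpServers" layout is an assumption.
    """
    path = Path(config_path)
    # Start from the existing file, if any, so other servers are preserved.
    text = path.read_text() if path.exists() else ""
    config = json.loads(text) if text.strip() else {}
    servers = config.setdefault("mcpServers", {})
    servers[name] = {"command": command, "args": list(args)}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config


# Demo against a throwaway file; point config_path at your own file
# only after checking the layout it already uses.
demo_dir = tempfile.mkdtemp()
demo_config = add_mcp_server(
    Path(demo_dir) / "mcp.json",
    name="awsdp-mcp",
    command="uvx",
    args=["awslabs.aws-dataprocessing-mcp-server@latest"],
)
print(json.dumps(demo_config, indent=2))
```

Merging into the existing dictionary, rather than writing a fresh file, keeps any servers you configured earlier through the UI.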

Data engineer

As a data engineer, you might build ETL jobs and data pipelines. Amazon Q Developer chat helps reduce setup time and improves workflow efficiency by refactoring code, implementing best practices, and troubleshooting errors. Because Amazon Q Developer uses generative AI to provide code recommendations, its output is non-deterministic, and the results you get might differ from the ones shown in the following examples.

Example prompt:

You are a data engineer. Your responsibility is to perform descriptive and exploratory data analysis.
* Use the diabetic_data dataset in SageMaker Lakehouse.
* Find list of connections and note down their names
* Create a notebook. Use getting_started.ipynb for best practices and as an example notebook.
* Make sure to use correct connection names in cell magic commands
* Make sure to handle missing values, perform descriptive analysis, and feature analysis.
* Create a comprehensive README.md file.
* Create a new working directory under the /src directory.

After the solution is created, complete the following steps:

  1. Go to the notebook.
  2. Run the created notebook and review each section:
    • Data loading
    • Descriptive analysis
    • Correlation matrix
    • Data preprocessing such as handling missing values
    • Feature importance analysis
  3. Review the README.md file.
  4. You can make changes to the created files.
  5. You can prompt the Amazon Q Developer chat to make additional changes for you.
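
To sanity-check the missing-value handling in a generated notebook, you can count missing entries per column yourself. The sketch below uses only the Python standard library. It assumes missing values appear as empty strings or as `?`, which you should verify against your copy of the dataset. The sample rows and column names are made up for illustration and are not taken from diabetic_data.

```python
import csv
import io

# Assumption: these markers denote missing values in the CSV.
MISSING_MARKERS = {"", "?"}


def missing_value_counts(csv_file):
    """Return ({column: missing_count}, total_rows) for an open CSV file."""
    reader = csv.DictReader(csv_file)
    columns = reader.fieldnames or []
    counts = {name: 0 for name in columns}
    total_rows = 0
    for row in reader:
        total_rows += 1
        for name in columns:
            value = row.get(name)
            # Short rows yield None for the trailing columns.
            if value is None or value.strip() in MISSING_MARKERS:
                counts[name] += 1
    return counts, total_rows


# Illustrative rows only; replace with open("diabetic_data.csv", newline="").
sample = io.StringIO(
    "race,gender,weight\n"
    "Caucasian,Female,?\n"
    "?,Male,?\n"
    "AfricanAmerican,Female,[75-100)\n"
)
counts, total_rows = missing_value_counts(sample)
for column, missing in counts.items():
    print(f"{column}: {missing} of {total_rows} rows missing")
```

Columns where most rows are missing are usually candidates for dropping rather than imputing, which is worth comparing against what the generated notebook chose to do.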

Data engineer's guided conversation with Amazon Q for exploratory data analysis with dataset insights
Comprehensive EDA notebook featuring Amazon Q generated code blocks, statistical analysis, and interactive visualizations

Fix errors without specifying the error

You can give instructions to Amazon Q Developer chat in a conversational way. You don’t need to specify the error: Amazon Q Developer chat accesses your notebook, identifies the error, and fixes it.

  1. Open your notebook.
  2. Enter the prompt “The notebook isn’t running, can you fix it?” Amazon Q Developer chat will identify the error from the notebook.
  3. Review the issue and the solution. Run the notebook again.

Amazon Q Developer chat debugging a notebook error with solution

ML engineer

As an ML engineer, you might analyze complex datasets and run ML experiments. You can ask Amazon Q Developer chat to take on an ML engineer role and build a predictive ML model on the dataset. You can also ask it to take the data engineer’s output into account.

Example prompt:

You are a machine learning engineer. Your responsibility is to perform predictive machine learning model on the data. The data engineer performed exploratory analysis. Use the output from the data engineer in your notebook. 
- Create a notebook to build a diabetes prediction model using Amazon SageMaker.
- Make sure to have model evaluation.
- Explain your choice for features and model selection.
- Create a comprehensive README.md file
- Do this in the working directory you created

After the solution is created, complete the following steps:

  1. Run the created notebook and review each section:
    • Note that the notebook is running successfully.
    • Amazon Q Developer chat incorporated a feature engineering section based on the data engineer’s output.
  2. Review the results. In our example run:
    • Four ML models (Logistic Regression, Random Forest, Gradient Boosting, and XGBoost) were identified for diabetes readmission prediction.
    • Models were evaluated using a comprehensive metrics suite including accuracy, precision, recall, F1 score, and ROC AUC to help ensure balanced performance.
    • Feature engineering produced critical predictors such as previous inpatient visits and medication changes, while hyperparameter tuning optimized model performance.
    • The final implementation balances predictive power with clinical interpretability, enabling effective identification of high-risk patients.
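
When you review the evaluation section of the generated notebook, it can help to recompute the headline metrics by hand. The following sketch derives accuracy, precision, recall, and F1 score from binary labels using the standard definitions. The function name and the toy labels are illustrative, not taken from the generated notebook. ROC AUC is omitted because it requires predicted scores rather than hard labels.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for binary labels (0/1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels for illustration: 1 = readmitted, 0 = not readmitted.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

For readmission prediction, recall is often the metric to watch, because a false negative corresponds to a high-risk patient the model failed to flag.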

Amazon Q chat interface showing ML model creation process
Interactive Amazon Q session building comprehensive ML notebook with code, visualizations, and markdown explanations

Amazon Q Developer CLI

The Amazon Q Developer CLI also understands your code, data, and compute resources, but is optimized for users who prefer working in the terminal. It helps you execute and automate data processing, model training, and generative AI tasks through natural language prompts.

To launch the Amazon Q Developer CLI:

  1. On the top menu of your SageMaker Unified Studio project page, choose Build, and under IDE & APPLICATIONS, choose JupyterLab.
  2. Wait for the space to be ready.
  3. From the Launcher tab, open a new terminal. Or navigate to File > New > Terminal.
  4. Enter q chat

Terminal window launching Amazon Q Developer CLI in SageMaker Studio

At launch, Anthropic’s Claude Sonnet 4 in Amazon Bedrock is the default large language model (LLM). You can choose other LLMs, depending on your AWS Region. To view the available models or change the model, enter /model. MCP tools are executable functions that MCP servers expose to the Amazon Q Developer CLI. They enable Amazon Q Developer to perform actions, process data, and interact with external systems on your behalf. To view the available tools, enter /tools.

Example prompt:

Explore the datasets available in the project’s data catalog and do exploratory analysis.

Terminal window showing Amazon Q Developer CLI commands and responses

Clean up

By default, SageMaker Unified Studio shuts down idle resources such as JupyterLab and Code Editor spaces after 1 hour. However, you need to delete the Amazon Simple Storage Service (Amazon S3) bucket to stop incurring additional charges. Also delete any real-time endpoints you created, which you can do using the SageMaker console. For instructions, see Delete Endpoints and Resources.
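
A bucket must be empty before Amazon S3 will delete it. The sketch below empties and then deletes a bucket through a client object that exposes the boto3 S3 client methods `list_objects_v2`, `delete_objects`, and `delete_bucket`. The helper name is hypothetical, and the `FakeS3Client` class exists only so the example runs without AWS credentials. The sketch assumes an unversioned bucket, because versioned buckets also need their object versions removed. Deletion is irreversible, so confirm the bucket name before passing a real client such as `boto3.client("s3")`.

```python
def empty_and_delete_bucket(s3_client, bucket):
    """Delete every object in an unversioned bucket, then the bucket itself.

    Returns the number of objects deleted.
    """
    deleted = 0
    while True:
        # Each call returns at most 1,000 keys, matching the delete_objects limit.
        response = s3_client.list_objects_v2(Bucket=bucket)
        contents = response.get("Contents", [])
        if not contents:
            break
        result = s3_client.delete_objects(
            Bucket=bucket,
            Delete={"Objects": [{"Key": obj["Key"]} for obj in contents]},
        )
        if result.get("Errors"):
            # Stop rather than loop forever on keys that cannot be deleted.
            raise RuntimeError(f"Could not delete some objects: {result['Errors']}")
        deleted += len(contents)
    s3_client.delete_bucket(Bucket=bucket)
    return deleted


class FakeS3Client:
    """In-memory stand-in for the three client methods used above."""

    def __init__(self, keys):
        self.keys = list(keys)
        self.bucket_deleted = False

    def list_objects_v2(self, Bucket):
        page = self.keys[:1000]
        return {"Contents": [{"Key": key} for key in page]} if page else {}

    def delete_objects(self, Bucket, Delete):
        doomed = {obj["Key"] for obj in Delete["Objects"]}
        self.keys = [key for key in self.keys if key not in doomed]
        return {}

    def delete_bucket(self, Bucket):
        if self.keys:
            raise RuntimeError("bucket is not empty")
        self.bucket_deleted = True


# Dry run against the fake client; the bucket name is a placeholder.
fake = FakeS3Client(["data/diabetic_data.csv", "src/notebook.ipynb"])
print(empty_and_delete_bucket(fake, "example-bucket"), "objects deleted")
```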

Conclusion

The improved AI assistance available in JupyterLab and Code Editor IDEs in SageMaker Unified Studio helps streamline data engineering and machine learning workflows by providing answers relevant to your project files, notebooks, data, and compute. Whether you’re a data engineer building ETL pipelines, a data scientist conducting exploratory analysis, or an ML engineer developing predictive models, these features now understand what you’re working on and help you do it more efficiently. This is just the start of our agentic journey in SageMaker Unified Studio. To learn more, review the SageMaker Unified Studio User Guide. We encourage you to explore the MCP capabilities and the AWS MCP Servers repository on GitHub.


About the authors

Lauren Mullennex is a Senior GenAI/ML Specialist Solutions Architect at AWS. She has over a decade of experience in ML, DevOps, and infrastructure. She is a published author of a book on computer vision. Outside of work, you can find her traveling and hiking with her two dogs.

Siddharth Gupta is heading Generative AI within SageMaker’s Unified Experiences. His focus is on driving agentic experiences, where AI systems act autonomously on behalf of users to accomplish complex tasks. Previously, he led edge machine learning solutions at AWS. This cutting-edge work aims to revolutionize how developers and data scientists interact with AI, creating more intuitive data integrations and powerful tools for building and deploying machine learning models. An alumnus of the University of Illinois at Urbana-Champaign, he brings extensive experience from his roles at Yahoo, Glassdoor, and Twitch. You can reach out to him on LinkedIn.

Ishneet Kaur is a Software Development Manager on the Amazon SageMaker Unified Studio team. She leads the engineering team to design and build GenAI capabilities in SageMaker Unified Studio.

Mohan Gandhi is a Senior Software Engineer at AWS. He has been with AWS for the last 10 years and has worked on various AWS services like Amazon EMR, Amazon EFA, and Amazon RDS. Currently, he is focused on improving the SageMaker inference experience. In his spare time, he enjoys hiking and marathons.

Mukul Prasad is a Senior Applied Science Manager in the AWS Agentic AI organization. He leads the Data Processing Agents Science team developing DevOps agents to simplify and optimize the customer journey in using AWS Big Data processing services including Amazon EMR, AWS Glue, and Amazon SageMaker Unified Studio. Outside of work, Mukul enjoys food, travel, photography, and cricket.

Murali Narayanaswamy is a Principal Machine Learning Scientist in the Agentic AI organization in AWS working on products including Amazon Bedrock, Amazon SageMaker Unified Studio, Amazon Redshift and Amazon RDS. His research interests lie at the intersection of AI, optimization, learning and inference particularly using them to understand, model and combat noise and uncertainty in real world applications and Reinforcement Learning in practice and at scale. Broadly, he works on using ideas from online algorithms, optimization under uncertainty, control theory, game theory, artificial intelligence, graphical models and estimation theory to solve important problems at Amazon scale.

Necibe Ahat is a Senior AI/ML Specialist Solutions Architect at AWS, working with Healthcare and Life Sciences customers. Necibe helps customers to advance their generative AI and machine learning journey. She has a background in computer science with 15 years of industry experience helping customers ideate, design, build and deploy solutions at scale. She is a passionate inclusion and diversity advocate.

Vipin Mohan is a Principal Product Manager at Amazon Web Services, where he leads generative AI product strategy. He specializes in building AI/ML products, container platforms, and search technologies that serve thousands of customers. Outside of work, he mentors aspiring product managers, enjoys reading about financial investing and entrepreneurship, and loves exploring the world through the eyes of his two kids.