AWS Machine Learning Blog
Share medical image research on Amazon SageMaker Studio Lab for free
This post is co-written with Stephen Aylward, Matt McCormick, and Brianna Major from Kitware and Justin Kirby from the Frederick National Laboratory for Cancer Research (FNLCR).
Amazon SageMaker Studio Lab provides no-cost access to a machine learning (ML) development environment to everyone with an email address. Like the fully featured Amazon SageMaker Studio, Studio Lab allows you to customize your own Conda environment and create CPU- and GPU-scalable JupyterLab version 3 notebooks, with easy access to the latest data science productivity tools and open-source libraries. Moreover, Studio Lab free accounts include a minimum of 15 GB of persistent storage, enabling you to maintain and expand your projects across multiple sessions, pick up where you left off, and even share your ongoing work and work environments with others.
A key issue faced by the medical image community is how to enable researchers to experiment and explore with these essential tools. To solve this challenge, AWS teams worked with Kitware and Frederick National Laboratory for Cancer Research (FNLCR) to bring together three major medical imaging AI resources for Studio Lab and the entire open-source JupyterLab community:
- MONAI Core, an open-source PyTorch library for medical image deep learning
- Clinical data from The Cancer Imaging Archive (TCIA), a large, open-access database of medical imaging studies funded by the National Cancer Institute
- itkWidgets, an open-source Jupyter/Python library that provides interactive, 3D medical image visualizations directly within Jupyter Notebooks
These tools and data combine to allow medical imaging AI researchers to quickly develop and thoroughly evaluate clinically ready deep learning algorithms in a comprehensive and user-friendly environment. Team members from FNLCR and Kitware collaborated to create a series of Jupyter notebooks that demonstrate common workflows to programmatically access and visualize TCIA data. These notebooks use Studio Lab to allow researchers to run the notebooks without the need to set up their own local Jupyter development environment—you can quickly explore new ideas or integrate your work into presentations, workshops, and tutorials at conferences.
The following example illustrates Studio Lab running a Jupyter notebook that downloads TCIA prostate MRI data, segments it using MONAI, and displays the results using itkWidgets.
Although you can easily carry out smaller experiments and demos with the sample notebooks presented in this post on Studio Lab for free, we recommend using Amazon SageMaker Studio when you train your own medical image models at scale. Amazon SageMaker Studio is an integrated web-based development environment (IDE) with enterprise-grade security, governance, and monitoring features from which you can access purpose-built tools to perform all ML development steps. Open-source libraries like MONAI Core and itkWidgets also run on Amazon SageMaker Studio.
Install the solution
To run the TCIA notebooks on Studio Lab, you need to register an account using your email address on the Studio Lab website. Account requests may take 1–3 days to get approved.
After that, you can follow the installation steps to get started:
- Log in to Studio Lab and start a CPU runtime.
- In a separate tab, navigate to the TCIA notebooks GitHub repo and choose a notebook in the root folder of the repository.
- Choose Open Studio Lab to open the notebook in Studio Lab.
- Back in Studio Lab, choose Copy to project.
- In the new JupyterLab pop-up that opens, choose Clone Entire Repo.
- In the next window, keep the defaults and choose Clone.
- Choose OK when prompted to confirm to build the new Conda environment (medical-image-ai). Building the Conda environment will take up to 5 minutes.
- In the terminal that opened in the step before, run the following command to install NodeJS in the studiolab Conda environment, which is required to install the ImJoy JupyterLab 3 extension next:
conda install -y -c conda-forge nodejs
We now install the ImJoy Jupyter extension using the Studio Lab Extension Manager to enable interactive visualizations. The ImJoy extension allows itkWidgets and other data-intensive processes to communicate with local and remote Jupyter environments, including Jupyter notebooks, JupyterLab, and Studio Lab.
- In the Extension Manager, search for “imjoy” and choose Install.
- Confirm to rebuild the kernel when prompted.
- Choose Save and Reload when the build is complete.
After the installation of the ImJoy extension, you will be able to see the ImJoy icon in the top menu of your notebooks.
To verify this, navigate to the file browser, choose the TCIA_Image_Visualization_with_itkWidgets notebook, and choose the medical-image-ai kernel to run it.
The ImJoy icon will be visible in the upper left corner of the notebook menu.
With these installation steps, you have successfully installed the medical-image-ai Python kernel and the ImJoy extension, which are the prerequisites for running the TCIA notebooks together with itkWidgets on Studio Lab.
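If you want a quick sanity check from inside a notebook, the following minimal sketch reports which libraries the active kernel cannot find. The import names monai, itk, and itkwidgets are assumptions made for this illustration, as is the missing_packages helper; adjust the list to match what the repository's environment file actually installs.

```python
import importlib.util

# Import names assumed for this illustration; adjust to your environment.
REQUIRED = ("monai", "itk", "itkwidgets")


def missing_packages(names=REQUIRED):
    """Return the top-level package names the active interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages()
print("Missing packages:", ", ".join(missing) if missing else "none")
```

If anything is listed as missing, confirm that the notebook is using the medical-image-ai kernel rather than the default one.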
Test the solution
We have created a set of notebooks and a tutorial that showcase the integration of these AI technologies in Studio Lab. Make sure to choose the medical-image-ai Python kernel when running the TCIA notebooks in Studio Lab.
The first SageMaker notebook shows how to download DICOM images from TCIA and visualize those images using the cinematic volume rendering capabilities of itkWidgets.
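To give a feel for what programmatic access to TCIA can look like, the following sketch only builds a series query URL and does not contact any server. The base URL, the getSeries endpoint, and its Collection, Modality, and format parameters are assumptions based on TCIA's public NBIA REST API and should be confirmed against the TCIA documentation. The series_query_url helper and the PROSTATEx collection name are chosen for illustration, and the notebook itself may use a different client.

```python
from urllib.parse import urlencode

# Assumed base URL of TCIA's public NBIA REST API; verify before relying on it.
NBIA_BASE = "https://services.cancerimagingarchive.net/nbia-api/services/v1"


def series_query_url(collection, modality=None):
    """Build (but do not send) a query listing the series in a TCIA collection."""
    params = {"Collection": collection, "format": "json"}
    if modality:
        params["Modality"] = modality
    return f"{NBIA_BASE}/getSeries?{urlencode(params)}"


# Example collection name used purely for illustration.
print(series_query_url("PROSTATEx", modality="MR"))
```

Keeping URL construction separate from the download step makes it easy to inspect a query before sending it from a notebook cell.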
The second notebook shows how the expert annotations that are available for hundreds of studies on TCIA can be downloaded as DICOM SEG and RTSTRUCT objects, visualized in 3D or as overlays on 2D slices, and used for training and evaluation of deep learning systems.
The third notebook shows how pre-trained MONAI deep learning models available on MONAI’s Model Zoo can be downloaded and used to segment TCIA (or your own) DICOM prostate MRI volumes.
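As a rough illustration of the download step, the following sketch wraps monai.bundle.download, MONAI's helper for fetching Model Zoo bundles. The bundle name prostate_mri_anatomy, the bundles target folder, and the fetch_bundle helper with its downloader parameter are assumptions made for this example and aren't taken from the notebook itself.

```python
from pathlib import Path

# Assumed Model Zoo bundle name; check the Model Zoo listing for the one you need.
BUNDLE_NAME = "prostate_mri_anatomy"


def fetch_bundle(bundle_dir="bundles", name=BUNDLE_NAME, downloader=None):
    """Download a Model Zoo bundle and return the folder it is expected to land in."""
    if downloader is None:
        # Imported lazily so this module loads even where MONAI is not installed.
        from monai.bundle import download as downloader
    target = Path(bundle_dir)
    target.mkdir(parents=True, exist_ok=True)
    downloader(name=name, bundle_dir=str(target))
    # Bundles are expected to be unpacked into <bundle_dir>/<name>.
    return target / name


# Usage (requires MONAI and network access):
# bundle_path = fetch_bundle()
```

The optional downloader argument exists only so the helper can be exercised without network access; in a notebook you would normally leave it unset.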
Choose Open Studio Lab in these and other JupyterLab notebooks to launch those notebooks in the freely available Studio Lab environment.
Clean up
After you have followed the installation steps in this post and created the medical-image-ai Conda environment, you may want to delete it to save storage space. To do so, use the following command:
conda remove --name medical-image-ai --all
You can also uninstall the ImJoy extension via the Extension Manager. Be aware that you will need to recreate the Conda environment and reinstall the ImJoy extension if you want to continue working with the TCIA notebooks in your Studio Lab account later.
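To see how much space the cleanup frees, you can compare free disk space before and after removing the environment. The following is a minimal sketch using only the Python standard library; the free_gib helper is hypothetical, and the value it reports describes the file system that contains your home directory, which may not map exactly onto the Studio Lab storage quota.

```python
import shutil
from pathlib import Path


def free_gib(path=Path.home()):
    """Return the free space, in GiB, on the file system that contains path."""
    return shutil.disk_usage(path).free / 2**30


print(f"Free space under {Path.home()}: {free_gib():.1f} GiB")
```

Run the cell once before and once after the conda remove command to estimate how much storage the environment occupied.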
Close your tab and don’t forget to choose Stop Runtime on the Studio Lab project page.
Conclusion
SageMaker Studio Lab is accessible to medical image AI research communities at no cost and can be used for medical image AI modeling and interactive medical image visualization in combination with MONAI and itkWidgets. You can use the TCIA open data and sample notebooks with Studio Lab at training events, like hackathons and workshops. With this solution, scientists and researchers can quickly experiment, collaborate, and innovate with medical image AI. If you have an AWS account and have set up a SageMaker Studio domain, you can also run these notebooks on Studio using the default Data Science Python kernel (with the ImJoy-jupyter-extension installed) while selecting from a variety of compute instance types.
Studio Lab also launched a new feature at AWS re:Invent 2022 to take the notebooks developed in Studio Lab and run them as batch jobs on a recurring schedule in your AWS accounts. Therefore, you can scale your ML experiments beyond the free compute limitations of Studio Lab and use more powerful compute instances with much bigger datasets on your AWS accounts.
If you’re interested in learning more about how AWS can help your healthcare or life sciences organization, please contact an AWS representative. For more information on MONAI and itkWidgets, please contact Kitware. New data is being added to TCIA on an ongoing basis, and you can submit suggestions and contributions through the TCIA website.
Further reading
- Now in Preview – Amazon SageMaker Studio Lab, a Free Service to Learn and Experiment with ML
- Amazon SageMaker Studio Lab continues to democratize ML with more scale and functionality
- Run notebooks as batch jobs in Amazon SageMaker Studio Lab
About the Authors
Stephen Aylward is Senior Director of Strategic Initiatives at Kitware, an Adjunct Professor of Computer Science at The University of North Carolina at Chapel Hill, and a fellow of the MICCAI Society. Dr. Aylward founded Kitware’s office in North Carolina, has been a leader of several open-source initiatives, and is now Chair of the MONAI advisory board.
Matt McCormick, PhD, is a Distinguished Engineer at Kitware, where he leads development of the Insight Toolkit (ITK), a scientific image analysis toolkit. He has been a principal investigator and a co-investigator of several research grants from the National Institutes of Health (NIH), led engagements with United States national laboratories, and led various commercial projects providing advanced software for medical devices. Dr. McCormick is a strong advocate for community-driven open-source software, open science, and reproducible research.
Brianna Major is a Research and Development Engineer at Kitware with a passion for developing open source software and tools that will benefit the medical and scientific communities.
Justin Kirby is a Technical Project Manager at the Frederick National Laboratory for Cancer Research (FNLCR). His work is focused on methods to enable data sharing while preserving patient privacy to improve reproducibility and transparency in cancer imaging research. His team founded The Cancer Imaging Archive (TCIA) in 2010, which the research community has leveraged to publish over 200 datasets related to manuscripts, grants, challenge competitions, and major NCI research initiatives. These datasets have been discussed in over 1,500 peer reviewed publications.
Gang Fu is a Healthcare Solution Architect at AWS. He holds a PhD in Pharmaceutical Science from the University of Mississippi and has over ten years of technology and biomedical research experience. He is passionate about technology and the impact it can make on healthcare.
Alex Lemm is a Business Development Manager for Medical Imaging at AWS. Alex defines and executes go-to-market strategies with imaging partners and drives solutions development to accelerate AI/ML-based medical imaging research in the cloud. He is passionate about integrating open source ML frameworks with the AWS AI/ML stack.