Janssen Pharmaceuticals Increases Machine Learning Accuracy by 21% Using Amazon SageMaker
Janssen Pharmaceuticals (Janssen), a group of pharmaceutical companies that have been a part of Johnson & Johnson since 1961, uses machine learning (ML) to better understand the patient experience of those who are on Janssen therapies. To automate the deployment workflow and create a better interface between development and production environments, Janssen data scientists used Amazon Web Services (AWS). Employing AWS services—including Amazon SageMaker, an ML service that can be used to build, train, and deploy ML models for virtually any use case—Janssen implemented an automated ML operations (MLOps) process that improved the accuracy of model predictions by 21 percent and increased the speed of feature engineering by approximately 700 percent, helping Janssen to reduce costs while increasing efficiency.
Searching for Automation in MLOps to Accelerate Research
The Janssen product portfolio covers a wide range of therapeutic areas, including immunology, infectious diseases, neuroscience, and oncology. “Giving patients the best care over the course of their complex treatment journeys is important, so we use artificial intelligence and machine learning to understand how people experience the therapy and to better meet patients’ needs through their disease journeys,” says Jenna Eun, principal data scientist at Janssen.
To accelerate the impact of ML-based solutions on the patient experience, the Janssen Business Technology Commercial Data Sciences team decided to focus on MLOps, a set of practices that aims to reliably and efficiently deploy and maintain ML models in production, increase automation, and fulfill business and technology requirements. “Our goal for MLOps is to facilitate experimentation and model performance tracking over time,” says Eun. “Easy experimentation and thorough exploration of the hyperparameter space are important for us to establish confidence in machine learning models.”
Janssen assembled a cross-functional team to create an automated MLOps process because its technology needs had to align with internal security requirements. “Because the processes we build are using healthcare data, we have rigorous security and privacy measures that we must closely follow as we develop and implement our technology solutions,” says Eun. Beginning in late 2020, the Janssen Business Technology Commercial Data Sciences team and the Johnson & Johnson Technology CloudX team worked alongside the Amazon SageMaker solutions architecture team and AWS Professional Services, a global team of experts who can help companies realize their desired business outcomes on AWS.
Increasing Speed and Accuracy of ML on AWS
Working alongside the Amazon SageMaker solutions architecture team and AWS Professional Services, the Janssen Business Technology Commercial Data Sciences team and the Johnson & Johnson Technology CloudX team automated the data preparation and feature engineering modules in less than 3 months. Feature engineering is the process of creating input variables from patient data for training supervised ML models. By automating these steps, the teams increased the speed of data preparation by approximately 600 percent and the speed of feature engineering by approximately 700 percent. Janssen accomplished this using AWS Step Functions, a low-code visual workflow service that makes it simpler to sequence the steps needed to collect, process, and normalize source data. AWS Step Functions coordinates jobs on AWS Glue, a serverless data integration service that helped the teams keep development and production environments in sync, so that ML solutions refined through experimentation could be deployed faster. “Instead of having each step of the process sequentially arranged, we could parallelize data preparation and feature engineering jobs and how they’re orchestrated by using AWS Glue and AWS Step Functions in combination,” says Eun. “That made it simple for us to seamlessly connect the development and production environments so that whatever we’re experimenting with can be quickly converted to AWS Glue jobs, which are launched by AWS Step Functions.”
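The orchestration pattern Eun describes can be illustrated with a small sketch. The case study does not publish Janssen's workflow, so the following is only one plausible shape under stated assumptions: a Python helper that builds an Amazon States Language definition with two Parallel stages, each fanning out to AWS Glue jobs through the Step Functions Glue integration. The stage names, the job names, and the choice of two stages are hypothetical.

```python
import json

# Step Functions service integration that starts an AWS Glue job and waits for it to finish.
GLUE_START_JOB_SYNC = "arn:aws:states:::glue:startJobRun.sync"


def glue_branch(job_name):
    """One branch of a Parallel state: a single Task that runs one Glue job."""
    return {
        "StartAt": job_name,
        "States": {
            job_name: {
                "Type": "Task",
                "Resource": GLUE_START_JOB_SYNC,
                "Parameters": {"JobName": job_name},
                "End": True,
            }
        },
    }


def parallel_stage(job_names, next_state=None):
    """A Parallel state whose branches run the given Glue jobs concurrently."""
    stage = {"Type": "Parallel", "Branches": [glue_branch(n) for n in job_names]}
    if next_state is None:
        stage["End"] = True
    else:
        stage["Next"] = next_state
    return stage


def build_state_machine(prep_jobs, feature_jobs):
    """Run all preparation jobs in parallel, then all feature jobs in parallel.

    Job names double as state names, so they must be unique across the
    whole definition (hypothetical names are used in the example below).
    """
    return {
        "Comment": "Illustrative sketch; not Janssen's actual workflow",
        "StartAt": "DataPreparation",
        "States": {
            "DataPreparation": parallel_stage(prep_jobs, next_state="FeatureEngineering"),
            "FeatureEngineering": parallel_stage(feature_jobs),
        },
    }


if __name__ == "__main__":
    definition = build_state_machine(
        ["prep_source_a", "prep_source_b"],          # hypothetical Glue job names
        ["features_group_a", "features_group_b"],    # hypothetical Glue job names
    )
    print(json.dumps(definition, indent=2))
```

The printed JSON could be supplied as the definition when creating a state machine. Within each stage the Glue jobs no longer wait on one another, which is the contrast with a sequential arrangement that the quote draws.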
After implementing the MLOps solution on AWS, Janssen increased the accuracy of its predictive modeling by 21 percent. “Because the data pipeline is more automated and takes less time, we can devote more time to the performance of the model,” says Eun. Hyperparameter optimization is integral to improving the accuracy of the ML model. Once the Janssen team has defined the models and the data, it uses Amazon SageMaker to tune a model automatically, trying thousands of combinations of algorithm parameters to arrive at the most accurate predictions the model can produce. That automation, combined with the Bayesian optimization algorithm in Amazon SageMaker, substantially reduces the time needed for a parameter search. “We feel more confident about the resulting ML model because we did a thorough hyperparameter search,” says Eun.
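To make the tuning step concrete, here is a minimal sketch of the configuration a Bayesian search in Amazon SageMaker takes. It builds the dictionary that would be passed as `HyperParameterTuningJobConfig` to the SageMaker `create_hyper_parameter_tuning_job` API. The case study names neither the algorithm nor the metric, so the metric name, the hyperparameter names and ranges, and the job limits below are all assumptions chosen for illustration.

```python
def build_tuning_config(max_jobs=100, max_parallel=10):
    """Return a HyperParameterTuningJobConfig dict for a Bayesian search.

    All names and ranges are hypothetical; the source does not say which
    model, metric, or hyperparameters Janssen tuned.
    """
    if max_parallel > max_jobs:
        raise ValueError("max_parallel cannot exceed max_jobs")
    return {
        # Bayesian optimization uses results of earlier training jobs to pick the next candidates.
        "Strategy": "Bayesian",
        "HyperParameterTuningJobObjective": {
            "Type": "Maximize",
            "MetricName": "validation:auc",  # assumed metric
        },
        "ResourceLimits": {
            "MaxNumberOfTrainingJobs": max_jobs,
            "MaxParallelTrainingJobs": max_parallel,
        },
        "ParameterRanges": {
            # The API expects range bounds as strings.
            "ContinuousParameterRanges": [
                {
                    "Name": "eta",  # assumed learning-rate parameter
                    "MinValue": "0.01",
                    "MaxValue": "0.3",
                    "ScalingType": "Logarithmic",
                }
            ],
            "IntegerParameterRanges": [
                {
                    "Name": "max_depth",  # assumed tree-depth parameter
                    "MinValue": "3",
                    "MaxValue": "10",
                    "ScalingType": "Linear",
                }
            ],
        },
    }
```

An actual tuning job also needs a training job definition (container image, input data, IAM role, instance type), which is omitted here. Keeping `max_parallel` well below `max_jobs` gives the Bayesian strategy more completed results to learn from between rounds.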
The Janssen team and the Johnson & Johnson Technology CloudX team were able to document this project and share it with other Johnson & Johnson teams engaged in similar ML projects. Sharing the learning helps to accelerate those projects that also need to comply with Johnson & Johnson’s security policies and to foster an MLOps culture throughout the organization. “By creating a pattern for others to follow, we demonstrated how to connect different AWS services to build an entire ML pipeline within the Johnson & Johnson environment,” says Eun. “Being able to create and increase efficiency in parts of our previous ML development and deployment process opened our eyes to the flexibility and scalability we can have.”
Improving Treatments for Patients Worldwide
Janssen’s MLOps solution makes it possible to deliver data science solutions at scale. Eun says, “As we deploy our solution in the real world and show how it can make a difference, we envision scaling to bigger geographic regions and applying it to other business use cases at Johnson & Johnson.”
About Janssen Pharmaceuticals
A Johnson & Johnson company since 1961, Janssen Pharmaceuticals is a research and development organization focused on improving patient outcomes for severe diseases across six therapeutic areas, including cardiovascular health, immunology, neuroscience, and oncology.
Benefits of AWS
- Increased speed of data preparation by approximately 600%
- Increased speed of feature engineering by approximately 700%
- Improved accuracy of ML models by 21%
- Established a standard MLOps reference architecture for other Johnson & Johnson teams
AWS Services Used
Amazon SageMaker
Amazon SageMaker helps data scientists and developers to prepare, build, train, and deploy high-quality machine learning (ML) models quickly by bringing together a broad set of capabilities purpose-built for ML.
AWS Step Functions
AWS Step Functions is a low-code, visual workflow service that developers use to build distributed applications, automate IT and business processes, and build data and machine learning pipelines using AWS services.
AWS Glue
AWS Glue is a fully managed extract, transform, and load (ETL) service that makes it easy for customers to prepare and load their data for analytics.
AWS Professional Services
The AWS Professional Services organization is a global team of experts that can help you realize your desired business outcomes when using the AWS Cloud.