AWS Machine Learning Blog

Using DeepChem with Amazon SageMaker for virtual screening

Virtual screening is a computational method used in drug and materials discovery: it searches large libraries of molecules to identify the structures most likely to have the target properties.

Virtual screening is becoming a groundbreaking tool for molecular discovery, driven by the exponential growth of available computing power and continual improvements in simulation methods.
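At its core, virtual screening scores every molecule in a library and keeps the most promising candidates. The following minimal Python sketch shows only that ranking step. The `toy_score` function, which simply counts N and O characters in a SMILES string, is a hypothetical placeholder for a trained predictive model and does not represent a meaningful chemical property.

```python
import heapq
from typing import Callable, Iterable, List, Tuple


def screen_library(
    molecules: Iterable[str],
    score: Callable[[str], float],
    top_k: int = 3,
) -> List[Tuple[float, str]]:
    """Score every molecule and return the top_k (score, SMILES) pairs, best first."""
    # heapq.nlargest keeps only top_k candidates in memory, so `molecules`
    # can be a lazy iterator over a very large library.
    return heapq.nlargest(top_k, ((score(smiles), smiles) for smiles in molecules))


def toy_score(smiles: str) -> float:
    # Hypothetical placeholder for a trained model: counts N and O characters.
    return smiles.count("N") + 0.5 * smiles.count("O")


library = ["CCO", "CCN", "NCCN", "c1ccccc1", "OCCO"]
for value, smiles in screen_library(library, toy_score, top_k=3):
    print(f"{value:.1f}  {smiles}")
```

Ties are broken by comparing the SMILES strings themselves, so OCCO ranks ahead of CCN here. In a real screen, `toy_score` would be replaced by a model's prediction.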

Deep learning is widely used in computational virtual screening, and the technology has advanced rapidly. DeepChem is one of the most popular open-source tools for making deep learning accessible in drug discovery, materials science, quantum chemistry, and biology. For more information, see Democratizing Deep-Learning for Drug Discovery, Quantum Chemistry, Materials Science and Biology on GitHub.

This post describes how to use DeepChem with Amazon SageMaker. Amazon SageMaker is a fully managed service that enables you to quickly and easily build, train, and deploy machine learning (ML) models. ML often feels harder than it should because the process to build, train, and deploy models into production is complicated and slow. Amazon SageMaker removes that complexity.

Installing DeepChem with Amazon SageMaker

To install DeepChem, set up an AWS account and create your first Amazon SageMaker notebook instance. Complete the following steps:

  1. Create an AWS account if you do not already have one.

When you sign up for AWS, your AWS account is automatically signed up for all AWS services, including Amazon SageMaker. You are charged only for the services that you use. If you are a first-time user of Amazon SageMaker, see How Amazon SageMaker Works.

  2. On the Amazon SageMaker console, choose Notebook instances.
  3. Choose Create notebook instance.
  4. Under Notebook instance settings, for Notebook instance name, enter a name for your instance. This post uses the name deepchem.
  5. For Notebook instance type, choose your preferred type. This post uses ml.t3.xlarge.
  6. For Elastic Inference, choose None.

For more information, see Step 2: Create an Amazon SageMaker Notebook Instance.

When you finish the preceding steps to create a notebook instance, Amazon SageMaker launches an ML compute instance. The notebook instance has a preconfigured Jupyter notebook server and a set of Anaconda libraries.
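If you prefer to script the console steps, the following sketch uses the AWS SDK for Python (Boto3) to create the same notebook instance and wait until it reaches InService. It assumes Boto3 is installed and AWS credentials are configured. The IAM role ARN is a hypothetical placeholder: CreateNotebookInstance requires a role that SageMaker can assume, which the console lets you choose or create. The helper function names are also hypothetical.

```python
# Hypothetical placeholder: replace with an IAM role that SageMaker can assume.
EXAMPLE_ROLE_ARN = "arn:aws:iam::123456789012:role/ExampleSageMakerRole"


def notebook_instance_request(name="deepchem", instance_type="ml.t3.xlarge",
                              role_arn=EXAMPLE_ROLE_ARN):
    """Build CreateNotebookInstance parameters matching the console steps."""
    # Omitting AcceleratorTypes corresponds to choosing None for Elastic Inference.
    return {
        "NotebookInstanceName": name,
        "InstanceType": instance_type,
        "RoleArn": role_arn,
    }


def create_and_wait(request, region_name=None):
    """Create the notebook instance, wait for InService, and return its status."""
    import boto3  # imported here so building the request does not require Boto3

    client = boto3.client("sagemaker", region_name=region_name)
    client.create_notebook_instance(**request)
    name = request["NotebookInstanceName"]
    # The waiter polls DescribeNotebookInstance until the status is InService.
    client.get_waiter("notebook_instance_in_service").wait(NotebookInstanceName=name)
    response = client.describe_notebook_instance(NotebookInstanceName=name)
    return response["NotebookInstanceStatus"]


# Usage (creates a billable resource):
# create_and_wait(notebook_instance_request())
```

Keeping the request builder separate from the API call makes it easy to review the parameters before anything is created in your account.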

  7. When the notebook instance status is InService, choose Open Jupyter.

The Jupyter notebook server page appears.

  8. From the New drop-down menu, choose Terminal.

  9. Update Conda by running the following command in your terminal:
    conda update -n base -c defaults conda -y

You are now ready to install DeepChem.

  10. Run the following commands in your terminal:
    git clone https://github.com/deepchem/deepchem.git
    cd deepchem
    bash scripts/install_deepchem_conda.sh deepchem
    conda init bash
    source /home/ec2-user/.bashrc
    conda activate deepchem
    python setup.py install
  11. Create and check your DeepChem notebook kernel with the following commands:
    # create a Notebook kernel
    conda install jupyter
    # check your Notebook kernel
    jupyter kernelspec list


  12. Close the terminal and the Jupyter home window.

It can take a few minutes for Jupyter to add the new kernel to its New drop-down menu.

Testing DeepChem installation

To test your DeepChem installation, complete the following steps:

  1. On the Amazon SageMaker console, choose Open Jupyter.
  2. On the Jupyter dashboard, from the New drop-down menu, choose conda_deepchem.
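Optionally, you can confirm that the notebook is running on the DeepChem conda environment rather than another kernel. This sketch assumes the conda convention that named environments live in an envs directory under the Anaconda root (on a SageMaker notebook instance, typically /home/ec2-user/anaconda3/envs/deepchem); the function name is hypothetical.

```python
import sys
from pathlib import Path


def running_in_conda_env(expected_env="deepchem", prefix=None):
    """Return True if the interpreter prefix looks like <conda root>/envs/<expected_env>."""
    # A kernel started from a conda environment runs that environment's Python,
    # so sys.prefix points at the environment directory.
    env_dir = Path(prefix if prefix is not None else sys.prefix)
    return env_dir.name == expected_env and env_dir.parent.name == "envs"


print(sys.prefix)
print(running_in_conda_env())
```

If this prints False, switch the notebook to the conda_deepchem kernel from the Kernel menu before continuing.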

To check that DeepChem is installed correctly, run the following code in a notebook cell:

# Check that DeepChem can be imported and print its version
import deepchem
print(deepchem.__version__)
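To go a step beyond a bare import, the following sketch reports whether each package used in this post is importable and which version is installed. It relies on importlib.metadata (Python 3.8 or later) and assumes each distribution name matches its module name, which holds for deepchem and numpy but not for every package. `package_report` is a hypothetical helper name.

```python
import importlib.util
from importlib import metadata


def package_report(names):
    """Map each module name to its installed version, "unknown", or None if missing."""
    report = {}
    for name in names:
        if importlib.util.find_spec(name) is None:
            report[name] = None  # module cannot be found on the import path
            continue
        try:
            # Assumes the distribution name matches the module name.
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "unknown"  # importable, but no distribution metadata
    return report


print(package_report(["deepchem", "numpy", "matplotlib", "rdkit"]))
```

A None entry points to a package that the installation script did not install into the active environment.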

Running DeepChem

Now you can use DeepChem with the Jupyter Notebook App on an Amazon SageMaker notebook instance. The following example Python code generates uncertainty estimates; for more information, see Uncertainty in Deep Learning. The code loads a dataset, creates a model, trains the model on the training set, and predicts outputs for the test set. See the following code:

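import deepchem as dc
import numpy as np
import matplotlib.pyplot as plot
# Load the SAMPL (FreeSolv) hydration free energy dataset from MoleculeNet,
# split into training, validation, and test sets
tasks, datasets, transformers = dc.molnet.load_sampl()
train_dataset, valid_dataset, test_dataset = datasets
# 1024 input features = the ECFP fingerprint length load_sampl uses by default;
# uncertainty=True lets the model estimate uncertainty in its predictions
model = dc.models.MultitaskRegressor(len(tasks), 1024, uncertainty=True)
model.fit(train_dataset, nb_epoch=200)
# y_pred: predicted values; y_std: estimated standard deviations
y_pred, y_std = model.predict_uncertainty(test_dataset)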

You see the following output, and you can inspect the predicted values and their estimated standard deviations in y_pred and y_std, respectively:

Loading dataset from disk.
Loading dataset from disk.
Loading dataset from disk.
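To build intuition for what a standard deviation like y_std represents, the following NumPy sketch combines two sources of uncertainty, in the spirit of the approach discussed in DeepChem's uncertainty tutorial: the spread of predictions across many forward passes with different random dropout masks, and a per-point noise variance that a model can learn to predict. It uses a toy linear model and a hypothetical constant noise variance; it is not DeepChem's implementation, and the function names are hypothetical.

```python
import numpy as np


def mc_dropout_samples(x, weights, drop_prob, n_masks, rng):
    """Evaluate a toy linear model once per random input-dropout mask.

    Returns an array of shape (n_masks, n_points).
    """
    samples = []
    for _ in range(n_masks):
        keep = rng.random(x.shape) >= drop_prob
        # Inverted dropout: rescale kept inputs so the expected output is unchanged.
        x_dropped = np.where(keep, x / (1.0 - drop_prob), 0.0)
        samples.append(x_dropped @ weights)
    return np.stack(samples)


def combine_uncertainty(sample_preds, sample_vars):
    """Combine per-mask predictions and predicted noise variances into mean and std."""
    mean = sample_preds.mean(axis=0)
    epistemic = sample_preds.var(axis=0)  # disagreement between dropout masks
    aleatoric = sample_vars.mean(axis=0)  # average predicted noise variance
    return mean, np.sqrt(epistemic + aleatoric)


rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))
weights = np.array([0.5, -1.0, 2.0])
samples = mc_dropout_samples(x, weights, drop_prob=0.2, n_masks=100, rng=rng)
# Hypothetical constant noise variance standing in for a model's variance output
noise_var = np.full_like(samples, 0.1)
y_mean, y_std = combine_uncertainty(samples, noise_var)
print(y_mean, y_std)
```

Because the two variances are added, the reported standard deviation can never fall below the predicted noise level, however much the dropout passes agree.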

The second example plots 10 noisy data points and their best-fit (least-squares) regression line:

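# 10 evenly spaced x values; y follows a linear trend plus uniform random noise
x = np.linspace(0, 5, 10)
y = 0.15*x + np.random.random(10)
plot.scatter(x, y)
# Least-squares fit of a degree-1 polynomial (a straight line)
fit = np.polyfit(x, y, 1)
# Draw the fitted line from x = -1 to x = 6
line_x = np.linspace(-1, 6, 2)
plot.plot(line_x, np.poly1d(fit)(line_x))
plot.show()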

The result is a scatter plot of the 10 points with the fitted line drawn across them.
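np.polyfit(x, y, 1) solves an ordinary least-squares problem. The following sketch computes the same line from the closed-form solution (slope = covariance of x and y divided by variance of x) and measures how far the points scatter around it; that remaining scatter comes from the random noise added to y, which no choice of line can remove. It uses a seeded random generator for reproducibility, and the helper names are hypothetical.

```python
import numpy as np


def fit_line(x, y):
    """Ordinary least-squares line y ~ slope * x + intercept (closed form)."""
    x_mean, y_mean = x.mean(), y.mean()
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    intercept = y_mean - slope * x_mean
    return slope, intercept


def residual_std(x, y, slope, intercept):
    """Scatter of the points around the fitted line (n - 2 degrees of freedom)."""
    residuals = y - (slope * x + intercept)
    return np.sqrt(np.sum(residuals ** 2) / (len(x) - 2))


rng = np.random.default_rng(0)
x = np.linspace(0, 5, 10)
y = 0.15 * x + rng.random(10)
slope, intercept = fit_line(x, y)
# np.polyfit(x, y, 1) returns [slope, intercept] for the same problem
print(slope, intercept, np.polyfit(x, y, 1))
print(residual_std(x, y, slope, intercept))
```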

Conclusion

This post demonstrated how to install DeepChem on Amazon SageMaker and verify the installation by running example code from the DeepChem GitHub repository. You can run your own code or experiment with further DeepChem tutorials; for more information, see Tutorials on the DeepChem GitHub repository. The tutorials demonstrate various DeepChem capabilities and can be run interactively in Jupyter (IPython) notebooks. You can also download the notebook files and open them on your Amazon SageMaker notebook instance.

If you have any questions, please leave them in the comments.


About the Author

Seongik Hong is a cloud infrastructure architect on the shared delivery teams. In his role, he draws on his experience to help people bring their ideas to life using AWS services. In his spare time, he enjoys experimenting with how cloud computing can help cheminformatics.