How to set up MATLAB parallel cloud computing on AWS for researchers
For researchers, discovering an algorithm that has the potential to enhance their results is exciting. Discovering that an algorithm is too computationally expensive to be practical is frustrating. More and more, researchers are leveraging software developed by others. Since many researchers are not programmers themselves, they must find ways to use this software in the form it was made available to them.
Many researchers use MATLAB® from MathWorks, a programming and numeric computing platform, to analyze data, develop algorithms, and create models. As a researcher, you can leverage Amazon Web Services (AWS) with MATLAB to expand available computational resources right from your desktop or laptop. In this blog post, we walk through how to integrate MathWorks Cloud Center with AWS in order to accelerate scientific computation and innovation.
How to set up MathWorks Cloud Center on AWS
Before you begin, make sure you have:
- A MATLAB® and Parallel Computing Toolbox™ license on your workstation
- A MATLAB Parallel Server™ license configured to use online licensing
- An AWS account
Walkthrough: Set up MathWorks Cloud Center for AWS
First, authorize MathWorks Cloud Center to access your AWS account to create clusters and run batch jobs.
- Log in to MathWorks Cloud Center and choose the Cloud Accounts tab at the top of the page. A dialog window may appear; if it does not, select the +Authorize button and choose the Follow guided steps… hyperlink in the dialog box.
Figure 1. Authorize AWS account on MathWorks Cloud Center.
- Navigate to the AWS Management Console and sign in to your AWS account with an identity that has permissions to create AWS Identity and Access Management (IAM) roles and policies. You need these permissions to create the role that lets MathWorks Cloud Center access your AWS account.
- Go to the AWS IAM service, select Roles, and create a new IAM role using the blue Create role button on the right. For Trusted entity type, choose AWS account, then select the Another AWS account radio button. Copy the MathWorks Account ID from the MathWorks Cloud Center browser tab and paste it into the Account ID box (Figure 3). Under Options, select the Require external ID check box, then copy the External ID from the MathWorks Cloud Center browser tab (Figure 2) and paste it into the External ID box.
Figure 2. MathWorks Cloud Center Create role in AWS page.
Figure 3. Create a new IAM role with trusted entity in the AWS IAM console.
- Click Next to go to the Add Permissions page, then create an IAM permission policy. Select the Create Policy button in the upper right of the Add Permissions page; a new browser tab opens on the Create Policy page. Select the JSON tab near the top and delete all of the default text. Copy the custom IAM access policy document provided by MathWorks and paste it into the text box on the Create policy page in the AWS Console. Optionally add tags, provide a name for the policy (for example, MathWorks_CloudCenter_Policy), then review and create the policy.
- Return to the Add Permissions browser tab and refresh the policy list (circular-arrow button). Select the policy you created, then select Next to go to the Name, Review, Create page. Provide a name for the role, such as MathWorks_CloudCenter_Role, then review and create the role.
- The page refreshes and a green banner at the top confirms that the role was created. Select the View role button, then copy the role's Amazon Resource Name (ARN) from the Summary section. Return to the MathWorks Cloud Center browser tab and paste the role ARN into the text box in Step C. Select the blue Next button; MathWorks Cloud Center is now authorized to access your AWS account.
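When you select Another AWS account together with Require external ID, the IAM console attaches a trust policy to the new role. The Python sketch below builds an equivalent trust policy document so you can see exactly what the two pasted values control: the account ID becomes the trusted principal, and the external ID becomes a condition on sts:AssumeRole. The function name and the placeholder IDs are assumptions for illustration; the MathWorks permission policy from the previous steps is a separate document and is not reproduced here.

```python
import json


def build_trust_policy(trusted_account_id: str, external_id: str) -> dict:
    """Return an IAM trust policy letting `trusted_account_id` assume the role,
    but only when it presents `external_id` (the "Require external ID" option)."""
    if not (len(trusted_account_id) == 12 and trusted_account_id.isdigit()):
        raise ValueError("AWS account IDs are 12-digit strings")
    if not external_id:
        raise ValueError("the external ID must be non-empty")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # ":root" means the whole trusted account, not its root user only.
                "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
                "Action": "sts:AssumeRole",
                # The role can be assumed only if the caller passes this external ID.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder values: substitute the MathWorks Account ID and External ID
    # shown on the MathWorks Cloud Center "Create role in AWS" page.
    print(json.dumps(build_trust_policy("111122223333", "example-external-id"), indent=2))
```

Comparing this output with the Trust relationships tab of the role you created is a quick way to confirm that both values were pasted into the right boxes.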
You can also add an AWS IAM role to MathWorks Cloud Center from the User Preference page. Once your AWS credentials have been updated successfully, your AWS account ID should be listed on the Accounts page.
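A common slip is pasting the policy ARN instead of the role ARN. The hypothetical helper below (not part of MathWorks Cloud Center) parses an IAM role ARN and returns the account ID it contains, which should match the account ID listed on the Accounts page. It assumes the standard role ARN layout, arn:partition:iam::account-id:role/optional-path/role-name.

```python
import re

# arn:<partition>:iam::<12-digit account ID>:role/<optional path/>role-name
ROLE_ARN = re.compile(
    r"^arn:(?P<partition>aws[a-zA-Z-]*):iam::(?P<account>\d{12}):role/(?P<name>[\w+=,.@/-]+)$"
)


def parse_role_arn(arn: str) -> tuple[str, str]:
    """Return (account_id, role_name), or raise ValueError if `arn` is not an IAM role ARN."""
    match = ROLE_ARN.match(arn.strip())
    if match is None:
        raise ValueError(f"not an IAM role ARN: {arn!r}")
    # A role can have a path (role/some/path/Name); the role name is the last segment.
    return match["account"], match["name"].rsplit("/", 1)[-1]


if __name__ == "__main__":
    print(parse_role_arn("arn:aws:iam::111122223333:role/MathWorks_CloudCenter_Role"))
```

A policy ARN such as arn:aws:iam::111122223333:policy/MathWorks_CloudCenter_Policy is rejected, which is the mistake this check is meant to catch.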
Create a MATLAB Parallel Server cluster on AWS
- Select the Cloud Resources tab near the top of the MathWorks Cloud Center page, then select +Create next to MATLAB Parallel Server (not MATLAB). This page also lists any existing clusters.
- Provide a name for the cluster (for example, aws).
- Select the MATLAB version that matches your desktop MATLAB installation (e.g. MATLAB R2022a).
- Keep the default option to automatically terminate the cluster when it is idle.
- Under Cluster Configuration, leave the Worker Machine Type (m5.8xlarge) and Headnode Machine Type (c5d.xlarge) at their defaults for this demo.
- Workers per Machine defaults to the number of instance cores (for example, 16 for an instance with 16 cores).
- Check Allow cluster to auto-resize and set the upper limit of Workers in Cluster to 128. Make sure that your Amazon Elastic Compute Cloud (Amazon EC2) On-Demand instance quota, which is measured in vCPUs, is sufficient for the selected instance types; if it is not, request a quota increase through AWS.
Figure 4. Create a cluster under MathWorks Cloud Center Cloud Resources section.
- Under Cluster Shared Storage, select None for Persisted Storage for this tutorial. You may use Amazon Simple Storage Service (Amazon S3) to store large data files in your research project.
- Nothing needs to be changed under Local Machine Storage.
- Under Remote Access, select an Amazon EC2 key pair so that you can connect to the cluster over Secure Shell Protocol (SSH). You can create a new key pair in the AWS Management Console.
- Select Create Cluster. The process to launch the cloud cluster may take several minutes.
- The cluster's headnode should appear after a few minutes as it is provisioned in your account. No worker instances are created yet, because the cluster auto-resizes and adds workers only when jobs need them.
Note: Do not modify or delete any resources that MathWorks Cloud Center creates for you (for example, AWS CloudFormation stacks), and do not terminate your headnode instance from the AWS console. Create and delete these resources only through MathWorks Cloud Center.
- Once the provisioning is complete, the Status in MathWorks Cloud Center will change to Online.
Figure 5. Cluster summary page on MathWorks Cloud Center.
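Because EC2 On-Demand quotas are counted in vCPUs rather than instances, it helps to convert the Workers in Cluster limit into vCPUs before requesting an increase. The sketch below does that arithmetic for the defaults in this walkthrough: an m5.8xlarge has 16 physical cores and 32 vCPUs, so with one worker per core each worker machine hosts 16 workers, and the c5d.xlarge headnode adds 4 vCPUs. Both M5 and C5d instances count against the Running On-Demand Standard instances quota. The function name is hypothetical.

```python
import math


def vcpus_needed(max_workers: int, workers_per_machine: int,
                 worker_vcpus: int, headnode_vcpus: int) -> tuple[int, int]:
    """Return (worker_machines, total_vcpus) when the cluster is at its auto-resize limit."""
    if max_workers <= 0 or workers_per_machine <= 0:
        raise ValueError("worker counts must be positive")
    # Partially filled machines still have to be launched, so round up.
    machines = math.ceil(max_workers / workers_per_machine)
    return machines, machines * worker_vcpus + headnode_vcpus


# Walkthrough defaults: 128 workers, 16 workers per m5.8xlarge (32 vCPUs),
# plus one c5d.xlarge headnode (4 vCPUs).
machines, vcpus = vcpus_needed(128, 16, 32, 4)
print(f"{machines} worker machines, {vcpus} vCPUs of On-Demand quota")
```

For these defaults the result is 8 worker machines and 260 vCPUs, so an account whose Standard On-Demand quota is lower than that cannot reach the 128-worker limit.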
Submit a job from MATLAB desktop application
On the MATLAB Home tab, select Parallel in the Environment section, then select Discover Clusters.
Figure 6. Discover Clusters menu on a desktop MATLAB application.
Select On MathWorks Cloud Center or On Amazon EC2 instead of On your network. Your AWS cluster should appear after a few seconds. Select your cluster, click Next, clear the Set new cluster profile as default check box, and then select Finish.
To test the connection to your cloud based parallel cluster:
- On the Home tab, select Parallel in the Environment section.
- Choose Create and Manage Clusters.
- Select the cluster profile that was created through the Discover Clusters workflow (aws) from the list on the left.
- Select Test Cloud Connectivity to verify that your MATLAB client can connect to MathWorks Cloud Center.
- Select Start.
- Optionally, validate the MATLAB Parallel Server cluster by selecting Validate. This runs a series of test jobs, which also provisions worker node(s); validation takes around 15-20 minutes. You can follow its progress in the MathWorks Cloud Center browser tab or on the Amazon EC2 Instances page of the AWS Console (refresh the page to see updates).
- With Properties selected for the aws cluster profile, select Edit and set the number of computational threads to use on each worker to 1, as MathWorks recommends. If needed, multithreaded jobs can run on AWS instances that have hyper-threading enabled. Select Done.
Figure 7. Select Test Cloud Connectivity on Cluster Profile Manager of desktop MATLAB application.
The submit.m script demonstrates the benefits of scaling from MATLAB to AWS. The first part of its loop shows that you can create parallel pools larger than the number of local workers (processor cores). The second part, which uses the batch function, shows that you can submit multiple jobs at once, each with its own number of workers, independent of your local system.
submit.m uses MATLAB’s batch function to submit computation jobs to the cloud. The example is deliberately simple: it performs integer factorization with a varying number of workers, from 1 to 128. The numWorkers array lists the worker counts to evaluate; you can adjust these values, for example to larger numbers of workers, depending on your MathWorks Cloud Center configuration. The script submits multiple batch jobs to the cluster and times the integer factorization with MATLAB timer functions. Before running submit.m, make sure your MathWorks Cloud Center cluster is Online; if it is not, start the cluster first.
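One detail matters when choosing the numWorkers values: according to the MATLAB batch documentation, a job submitted with a pool of N workers occupies N + 1 cluster workers, one to run the batch function itself and N for the pool. The Python sketch below checks a candidate schedule against the Workers in Cluster limit. The schedule shown is a hypothetical example, not the actual contents of submit.m, and the function name is an assumption.

```python
def plan_batch_jobs(pool_sizes: list[int], cluster_max_workers: int) -> list[tuple[int, int, bool]]:
    """For each requested pool size N, return (N, workers_used, fits_in_cluster).

    A batch job with a pool of N workers uses N + 1 cluster workers:
    one runs the batch function and N form the parallel pool.
    """
    plan = []
    for n in pool_sizes:
        used = n + 1
        plan.append((n, used, used <= cluster_max_workers))
    return plan


# Hypothetical schedule: powers of two up to 128, plus 127, the largest
# pool that fits in a 128-worker cluster.
schedule = [2 ** k for k in range(8)] + [127]
for pool, used, fits in plan_batch_jobs(schedule, cluster_max_workers=128):
    status = "ok" if fits else "exceeds cluster limit"
    print(f"pool={pool:3d}  workers used={used:3d}  {status}")
```

Jobs that fit individually but not all at once are not rejected; they wait in the queue until enough workers are free, which is what you will see in the Job Monitor when several large jobs are submitted together.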
You can use the MATLAB Job Monitor to track your jobs; it shows whether each job on your AWS cluster is pending, queued, running, or finished. You can also check job states programmatically. In the MathWorks Cloud Center portal, you can watch the cluster add compute resources as your computations are offloaded to it.
Figure 8. MATLAB Job monitor on desktop MATLAB application.
Figure 9. Cluster details on my clusters page of MathWorks Cloud Center.
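Checking job states programmatically amounts to a polling loop: read every job's state and stop once none is pending, queued, or running. In MATLAB the states come from the State property of each job in the cluster object's Jobs list; the Python sketch below hides that behind a hypothetical fetch_states callback so the loop logic can be shown and tested on its own. The poll interval and poll limit are assumed defaults.

```python
import time
from collections import Counter
from typing import Callable

# Job states that mean "not done yet", as shown in the MATLAB Job Monitor.
ACTIVE_STATES = frozenset({"pending", "queued", "running"})


def wait_for_jobs(fetch_states: Callable[[], dict[int, str]],
                  poll_seconds: float = 30.0,
                  max_polls: int = 120,
                  sleep: Callable[[float], None] = time.sleep) -> Counter:
    """Poll job states until no job is active, then return a count of jobs per state."""
    for _ in range(max_polls):
        states = fetch_states()  # {job_id: state}
        counts = Counter(states.values())
        print(", ".join(f"{state}: {n}" for state, n in sorted(counts.items())))
        if not ACTIVE_STATES.intersection(states.values()):
            return counts
        sleep(poll_seconds)
    raise TimeoutError(f"jobs still active after {max_polls} polls")
```

Injecting `sleep` keeps the loop testable without waiting; with a real cluster, a poll interval of tens of seconds avoids hammering the scheduler while workers are still being provisioned.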
You can retrieve results from your jobs with the retrieve.m script. It checks your AWS cluster for jobs whose state is ‘finished’, retrieves their outputs, displays them in the MATLAB Command Window, and removes each finished job from the cluster’s queue. Because it collects results as jobs finish, the results can appear in a different order from the one in which the jobs were submitted.
Figure 10. MATLAB Job monitor on desktop MATLAB application.
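The "collect results as they finish" behavior of retrieve.m is a general pattern. The sketch below shows an analogous version using Python's standard concurrent.futures module instead of MATLAB's cluster API: every factorization job is submitted up front, and as_completed yields each one as soon as it finishes, regardless of submission order. The trial-division factor function stands in for MATLAB's factor, and threads are used only to illustrate the ordering; they do not speed up CPU-bound Python code.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def factor(n: int) -> list[int]:
    """Prime factors of n by trial division (a stand-in for MATLAB's factor)."""
    factors, d = [], 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors


def run_job(job_id: int, n: int) -> tuple[int, list[int], float]:
    """Factor n and report how long it took, like a timed batch job."""
    start = time.perf_counter()
    result = factor(n)
    return job_id, result, time.perf_counter() - start


def submit_and_collect(numbers: list[int], max_workers: int = 4) -> list[tuple[int, list[int], float]]:
    """Submit every job first, then collect each result as soon as it finishes."""
    collected = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run_job, i, n) for i, n in enumerate(numbers)]
        for fut in as_completed(futures):  # completion order, not submission order
            job_id, result, seconds = fut.result()
            print(f"job {job_id}: {result} ({seconds:.6f} s)")
            collected.append((job_id, result, seconds))
    return collected
```

Tagging each result with its job ID, as above, is what lets you reassemble or sort the out-of-order results afterward.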
To avoid ongoing charges for the AWS compute resources that run your MATLAB Parallel Server cluster, clean up as follows:
- Shut down and delete the cluster on the MathWorks Cloud Center Cloud Resources page.
- Delete the cloud account you authorized earlier in MathWorks Cloud Center.
- Find and delete the AWS IAM role on AWS IAM management console.
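If you prefer to script the last step, the sketch below uses the IAM client operations from the AWS SDK for Python (boto3). IAM refuses to delete a role that still has policies, so the function detaches managed policies and deletes inline policies first, then deletes the role, and can optionally delete the customer-managed policy you created. The function name is hypothetical, pagination is not handled (fine for a role with fewer than 100 policies), and it should run only after the cluster and the cloud account have been removed in MathWorks Cloud Center.

```python
def delete_role_and_policies(iam, role_name: str, delete_customer_policies: bool = False) -> list[str]:
    """Remove every policy from an IAM role, then delete the role.

    `iam` is an IAM client such as boto3.client("iam"). Returns the ARNs of
    the managed policies that were detached from the role.
    """
    attached = iam.list_attached_role_policies(RoleName=role_name)["AttachedPolicies"]
    arns = [p["PolicyArn"] for p in attached]
    # Detach managed policies and delete inline ones; delete_role fails otherwise.
    for arn in arns:
        iam.detach_role_policy(RoleName=role_name, PolicyArn=arn)
    for name in iam.list_role_policies(RoleName=role_name)["PolicyNames"]:
        iam.delete_role_policy(RoleName=role_name, PolicyName=name)
    iam.delete_role(RoleName=role_name)
    if delete_customer_policies:
        for arn in arns:
            # AWS-managed policies (arn:aws:iam::aws:policy/...) cannot be deleted.
            # delete_policy also fails if the policy is still attached elsewhere
            # or has non-default versions.
            if ":iam::aws:policy/" not in arn:
                iam.delete_policy(PolicyArn=arn)
    return arns
```

With credentials that can manage IAM, a call such as `delete_role_and_policies(boto3.client("iam"), "MathWorks_CloudCenter_Role", delete_customer_policies=True)` removes the example role and policy names used in this walkthrough; substitute the names you actually chose.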
This blog post illustrates a method for researchers who use MATLAB to leverage AWS to scale up their computations. With MathWorks Cloud Center, MATLAB users can provision AWS resources without a steep learning curve and use cloud elasticity to keep costs low. This approach was motivated by a super-resolution image processing project, and it shows how AWS can be used directly from a local system running MATLAB. The technique demonstrates how computations can be scaled up within a job, using parallel pools larger than the local machine supports, and offloaded to AWS across many jobs by batching work.
Learn more about how AWS supports researchers at the AWS for Research and Technical Computing hub.