An introduction to AWS for research IT: Getting started in the cloud
The cloud can help researchers process complex workloads, store and analyze enormous amounts of data, collaborate globally, and accelerate research and innovation. For research IT, Amazon Web Services (AWS) can help build scalable, cost-effective, and flexible environments while still maintaining the governance and guardrails for security and compliance. When best practices are followed, AWS allows for centralized management of resources and improved security and compliance of research workloads, and it can lower costs and accelerate innovation. What are some common questions from research IT customers?
Getting started: How do I deal with account creation and management?
AWS accounts can be created as standalone accounts or as a part of AWS Organizations, which allows you to create groups of AWS accounts with centralized policy management, governance and consolidated billing. With Organizations, accounts can be grouped based on different strategies such as billing, security, data management, teams or departments, research workloads, and more.
If you have an existing organization in AWS Organizations and want to add an account to it, you can either create a new AWS account from the organization's management account or invite an existing AWS account to become a member of the organization.
For account management, customers can choose to develop their own AWS Landing Zone or use AWS Control Tower to set up and manage multiple AWS accounts. The AWS Account Vending Machine (AVM) in Landing Zone or the Account Factory in AWS Control Tower can create or enroll new accounts that automatically have account baselines, network baselines, and centralized logging enabled.
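As a rough illustration of the two paths above, here is a minimal Python sketch. It assumes `org_client` behaves like `boto3.client("organizations")` created with credentials from the organization's management account; the function names, email address, and account name are hypothetical and chosen only for this example.

```python
from typing import Any


def create_member_account(org_client: Any, email: str, account_name: str) -> str:
    """Ask AWS Organizations to create a new member account.

    Account creation is asynchronous, so this returns the initial request
    state reported by the service rather than a finished account.
    """
    response = org_client.create_account(Email=email, AccountName=account_name)
    return response["CreateAccountStatus"]["State"]


def invite_existing_account(org_client: Any, account_id: str, notes: str = "") -> str:
    """Invite a standalone AWS account to join the organization.

    The owner of the invited account still has to accept the invitation.
    Returns the ID of the handshake that tracks the invitation.
    """
    request = {"Target": {"Id": account_id, "Type": "ACCOUNT"}}
    if notes:
        request["Notes"] = notes
    response = org_client.invite_account_to_organization(**request)
    return response["Handshake"]["Id"]
```

With boto3 installed and management-account credentials configured, a call could look like `create_member_account(boto3.client("organizations"), "lab-owner@example.edu", "genomics-lab")`. Passing the client in as a parameter keeps the sketch easy to try against a stub before touching a real organization.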
Figures 1 and 2 show examples of AWS multi-account structures. Figure 1 is based on an AWS Organization structure built specifically for research accounts. Figure 2 is an AWS Organization structure consisting of AWS accounts for a university/campus, which includes central IT, department accounts, and research accounts.
Figure 1 is a sample architecture for a multi-account organizational structure using Control Tower for research accounts. The organization structure consists of a core organizational unit (OU) with log archive, security/audit, and network accounts. The organization has several other OUs consisting of AWS accounts for researchers, grouped by data sensitivity, compliance, and public access. Grouping accounts under specific OUs makes it possible to centrally manage them and provide the required governance, security, and compliance. This account structure can be beneficial for customers who want to create and manage all their research workloads in different AWS accounts under one AWS Organization and manage billing and governance related to research workloads only.
Figure 2 shows a sample organization structure consisting of multiple AWS accounts in the university/campus, which includes central IT, department accounts, and research accounts. Each department can have its own OU, and there can be several research OUs similar to those shown in Figure 1. This account structure provides centralized management and governance for multiple AWS accounts across the university/campus and allows them to interact with the core OU accounts, which run services like authentication, monitoring, and networking centrally for all the AWS accounts in the organization.
Researchers benefit from the centralized governance, security, and networking that come with being part of an organization. Depending on requirements and conditions, they can also benefit from enterprise discounts, volume discounts, discounts on data egress, and more.
In scenarios where the researcher already has their own standalone AWS account, research IT can request cross account access via IAM roles to gain access to the researcher AWS account to help with configuring or deploying AWS services and resources.
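To make the cross-account pattern more concrete, here is a minimal Python sketch of the trust policy a researcher could attach to an IAM role so that research IT can assume it. The account IDs, the role name, and the helper names are placeholders invented for this example; the optional external ID is an extra condition some teams add, not a requirement stated above.

```python
import json
import re
from typing import Optional


def cross_account_trust_policy(research_it_account_id: str,
                               external_id: Optional[str] = None) -> dict:
    """Build a trust policy that lets another AWS account assume a role."""
    if not re.fullmatch(r"\d{12}", research_it_account_id):
        raise ValueError("AWS account IDs are 12 digits")
    statement = {
        "Effect": "Allow",
        # Trust the research IT account; its own IAM policies decide which
        # of its users or roles may actually call sts:AssumeRole.
        "Principal": {"AWS": f"arn:aws:iam::{research_it_account_id}:root"},
        "Action": "sts:AssumeRole",
    }
    if external_id:
        statement["Condition"] = {"StringEquals": {"sts:ExternalId": external_id}}
    return {"Version": "2012-10-17", "Statement": [statement]}


def role_arn(researcher_account_id: str, role_name: str) -> str:
    """ARN that research IT would pass to STS when assuming the role."""
    return f"arn:aws:iam::{researcher_account_id}:role/{role_name}"


if __name__ == "__main__":
    # Placeholder account IDs and role name, for illustration only.
    print(json.dumps(cross_account_trust_policy("111122223333"), indent=2))
    print(role_arn("444455556666", "ResearchITSupport"))
```

The researcher creates the role with this trust policy and attaches only the permissions research IT needs. Research IT then assumes the role by its ARN, for example with the AWS Security Token Service `AssumeRole` operation.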
Learn more about account details.
How do I take advantage of AWS compute for things like high performance computing or machine learning?
You can create compute infrastructure such as Amazon Elastic Compute Cloud (Amazon EC2) instances and containers specific to your workload within minutes. There is a wide selection of Amazon EC2 instance types, such as compute optimized, memory optimized, and accelerated computing with GPUs or field programmable gate arrays (FPGAs), which gives you the flexibility to choose based on the requirements of your use case and workflow.
Researchers can get instant access to deep learning tools by quickly deploying Amazon EC2 instances using the AWS Deep Learning AMI, which comes pre-installed with deep learning frameworks and interfaces such as TensorFlow, PyTorch, Apache MXNet, Chainer, Gluon, Horovod, and Keras.
For high performance computing (HPC) workloads and machine learning (ML) applications that require high data throughput and high network bandwidth, researchers can use AWS ParallelCluster to launch Amazon EC2 instances with Elastic Fabric Adapter (EFA), which delivers high performance across the cluster at scale.
AWS also provides container services with Amazon Elastic Container Service (Amazon ECS), Amazon Elastic Kubernetes Service (Amazon EKS), AWS Fargate, and serverless compute in the form of AWS Lambda, so you don’t need to manage infrastructure if you don’t want to.
How can I connect to the cloud and set up networking?
Deploy a virtual network boundary in AWS using Amazon Virtual Private Cloud (Amazon VPC), which hosts the instances and resources that are used to run your application or research. The VPC can be seen as a virtual data center in the cloud that can be connected to your on-premises resources in your campus data center or office, or to the internet. This virtual network closely resembles a traditional network that you'd operate in your own data center, with the benefits of using the scalable infrastructure of AWS. Amazon VPC helps provide connectivity and security for your resources.
Depending on the traffic and connectivity, a VPC can be connected to on-premises infrastructure on campus by IPSec VPN or AWS Direct Connect. Research accounts mostly use Direct Connect or research and education networks (RENs) to connect on-campus resources with their AWS resources for large data transfers.
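One practical detail when connecting a VPC to campus over VPN or Direct Connect is address planning: a VPC range that overlaps a campus range cannot be routed cleanly between the two sides. The sketch below uses only Python's standard `ipaddress` module to check for overlaps and carve a VPC range into equal subnets. The CIDR ranges and the function name are made up for illustration.

```python
import ipaddress
from typing import Iterable, List


def plan_subnets(vpc_cidr: str, subnet_prefix: int,
                 on_prem_cidrs: Iterable[str]) -> List[str]:
    """Split a VPC CIDR into equal subnets after checking campus overlap."""
    vpc = ipaddress.ip_network(vpc_cidr)
    for cidr in on_prem_cidrs:
        campus = ipaddress.ip_network(cidr)
        if vpc.overlaps(campus):
            raise ValueError(f"VPC range {vpc} overlaps campus range {campus}")
    # subnets() yields every subnet of the requested prefix length, in order.
    return [str(subnet) for subnet in vpc.subnets(new_prefix=subnet_prefix)]


if __name__ == "__main__":
    # Hypothetical ranges: campus uses 10.10.0.0/16, the VPC uses 10.20.0.0/16.
    for subnet in plan_subnets("10.20.0.0/16", 18, ["10.10.0.0/16"]):
        print(subnet)
```

Running the same check whenever a new research VPC is requested is a cheap way to catch conflicts before any network connection is ordered.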
What storage options should I consider?
Researchers use Amazon Simple Storage Service (Amazon S3) for storing data and for collaborating by sharing research data securely. Amazon S3 is a low-cost, secure, durable, and highly available object storage service with no limit on total capacity. Files can be transferred or accessed in Amazon S3 using standard Linux commands, PowerShell, AWS Command Line Interface (CLI), AWS Console, SDKs, file transfer applications like Cyberduck, and other tools.
If you are running data-intensive, computationally demanding applications that require a file system, use Amazon Elastic Block Store (Amazon EBS) or Amazon Elastic File System (Amazon EFS). These offer single-digit millisecond latency and are recommended for high performance and high throughput applications.
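As one example of the SDK route, the sketch below uploads a file and returns a time-limited download link for a collaborator. It assumes `s3_client` behaves like `boto3.client("s3")`; the function name, bucket, and key are hypothetical, and a presigned URL is only one of several ways to share data from Amazon S3.

```python
from typing import Any


def share_dataset(s3_client: Any, local_path: str, bucket: str, key: str,
                  expires_in: int = 3600) -> str:
    """Upload a local file to S3 and return a presigned download URL.

    The URL grants read access to this one object until it expires
    (`expires_in` is in seconds), without making the bucket public.
    """
    if expires_in <= 0:
        raise ValueError("expires_in must be a positive number of seconds")
    s3_client.upload_file(local_path, bucket, key)
    return s3_client.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,
    )
```

A call might look like `share_dataset(boto3.client("s3"), "results.csv", "my-lab-bucket", "shared/results.csv")`, where the bucket already exists and the caller has permission to write to it.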
How can AWS help with security and compliance?
At AWS, security is the top priority. Likewise, for many of our customers, security is the most important aspect and must be built into every layer and level of the environment. AWS enables customers to securely deploy resources with strong isolation control and provides a number of tools.
At AWS, we use a shared responsibility model. With the shared responsibility model, AWS is responsible for the security of the cloud and the customer is responsible for security in the cloud. Each security or compliance standard has responsibilities and controls that are shared between AWS and the customer.
On-demand access to AWS security and compliance attestations and artifacts, as well as select online agreements, is available through AWS Artifact in the AWS Console. Learn more about AWS compliance programs.
AWS customers in healthcare, life sciences, and research use AWS for HIPAA, HITECH, and PHI workloads because of the flexibility, security, and compliance programs offered with AWS. AWS provides resources to guide customers with best practices and architecture in terms of documentation, webinars, and solutions architects.
Learn more about AWS cloud security.
What can I do to manage my cloud costs?
In the cloud, you are charged on a utility model: you pay for what you use and are no longer charged when you stop using your resources. Estimate your expected spend, and AWS cost management tools will help you calculate, monitor, and optimize costs and set budgets based on your usage. Pricing information for all AWS services is provided in detail in the service documentation. To estimate your costs, use the AWS Pricing Calculator. Tag all your resources appropriately to help identify and allocate costs.
Take advantage of AWS research funding programs like AWS Cloud Credits for Research, AWS Machine Learning Research Awards, and Amazon Research Awards to fund research, share knowledge, and encourage innovation. Funds that are granted for a specific researcher or project need to be allocated towards the costs associated with the AWS account being used by the researcher or for the project. Sometimes grants allocated by AWS research programs are in the form of AWS Promotional Credits. The credit sharing preferences need to be adjusted so that the credits benefit the researcher or project to which they were granted. AWS customers that work in academic or research institutions are eligible for the AWS Global Data Egress Waiver. If using grants or credits, consider how the research project will be paid for once the grants or credits have been used up.
Learn more about use cases and resources for AWS Cost Management.
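To show why consistent tagging pays off, here is a small Python sketch that totals costs per tag value. The line-item layout (a `cost` string plus a `tags` dictionary), the `project` tag key, and the amounts are all invented for this example and do not reflect the actual format of any AWS billing export.

```python
from collections import defaultdict
from decimal import Decimal
from typing import Dict, Iterable, Mapping


def costs_by_tag(line_items: Iterable[Mapping], tag_key: str,
                 untagged_label: str = "(untagged)") -> Dict[str, Decimal]:
    """Sum costs per value of one tag; untagged items get their own bucket."""
    totals: Dict[str, Decimal] = defaultdict(Decimal)
    for item in line_items:
        label = item.get("tags", {}).get(tag_key, untagged_label)
        # Decimal avoids floating-point rounding when adding currency amounts.
        totals[label] += Decimal(item["cost"])
    return dict(totals)


if __name__ == "__main__":
    # Made-up line items, for illustration only.
    items = [
        {"cost": "12.50", "tags": {"project": "genomics"}},
        {"cost": "7.25", "tags": {"project": "genomics"}},
        {"cost": "3.00", "tags": {"project": "climate"}},
        {"cost": "1.10", "tags": {}},
    ]
    for project, total in sorted(costs_by_tag(items, "project").items()):
        print(project, total)
```

The untagged bucket is worth watching: any spend that lands there cannot be attributed to a researcher, grant, or project.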
What is the benefit of managed services?
AWS offers services and tools that reduce the overhead of managing and maintaining infrastructure, are highly available, and scale to your needs. To plan, schedule, and execute your batch computing workloads across AWS compute services and features, such as Amazon EC2 and Spot Instances, use AWS Batch. For ML, Amazon SageMaker comes with integrated tools, ML workflows, and Jupyter notebooks, providing the components used for ML in a single toolset so models get to production faster with much less effort and at lower cost. AWS ParallelCluster, an AWS-supported open source cluster management tool, supports a variety of job schedulers such as AWS Batch, SGE, Torque, and Slurm for easy job submissions. For quantum computing, Amazon Braket provides a development environment for you to explore and build quantum algorithms, test them on quantum circuit simulators, and run them on different quantum hardware technologies.
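For a sense of what handing work to a managed service looks like, here is a minimal Python sketch that submits one job to AWS Batch. It assumes `batch_client` behaves like `boto3.client("batch")` and that a job queue and job definition already exist; the function name and all resource names are placeholders for this example.

```python
from typing import Any, List


def submit_analysis_job(batch_client: Any, job_name: str, job_queue: str,
                        job_definition: str, command: List[str]) -> str:
    """Submit a single container job to AWS Batch and return its job ID.

    AWS Batch decides where and when the job runs based on the compute
    environments attached to the queue, so no instances are launched here.
    """
    if not command:
        raise ValueError("command must contain at least one element")
    response = batch_client.submit_job(
        jobName=job_name,
        jobQueue=job_queue,
        jobDefinition=job_definition,
        # Override the container command from the job definition for this run.
        containerOverrides={"command": command},
    )
    return response["jobId"]
```

A call could look like `submit_analysis_job(boto3.client("batch"), "align-sample-01", "research-queue", "aligner:1", ["python", "align.py", "sample-01"])`, with the queue and job definition names replaced by your own.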
Is there anywhere I can go for training specifically for research IT?
See below for resources for researchers and research IT to continue to learn about cloud computing:
Learn more about research and technical computing on AWS, and check out the no-cost online AWS training pathway for researchers and research IT and the research seminar series.
Subscribe to the AWS Public Sector Blog newsletter to get the latest in AWS tools, solutions, and innovations from the public sector delivered to your inbox, or contact us.