Accelerate medical research with Amazon OpenSearch Service
In an era of large, fast-changing datasets, researchers need a platform that lets them ingest data, search it in near real-time, and visualize it in dashboards. One way to expedite this process is to use Amazon OpenSearch Service to store the data and run near real-time searches, and the OpenSearch Service dashboard to display the results.
In the field of medical research, time is of the essence. The faster researchers can identify potential new treatments for diseases, the better. However, relying on the IT team for routine tasks such as data ingestion and building dashboards for data insights not only adds to the IT team’s workload but also slows down medical research. The back-and-forth communication can take up a significant amount of time.
In recent years, medical research has made great strides in developing new treatments and cures for a variety of diseases. However, the process of conducting research is often slow and painstaking, as it can take weeks or even months to collect and analyze data. Fortunately, there are ways to speed up the process by leveraging Amazon Web Services (AWS).
Medical research can be accelerated by using Amazon OpenSearch Service (OpenSearch Service) with integrated dashboards. This service provides researchers the ability to upload data on-demand, search for information with near real-time results, and create dashboards with in-depth data analysis to gain insights.
By analyzing large datasets, researchers can identify patterns and trends that can be used to formulate new hypotheses. This approach can help accelerate the research process by reducing the amount of time spent on manual data collection and analysis.
Figure 1 – High-level architecture of the data ingestion and analytics solution
Researchers can quickly search through vast amounts of medical data using OpenSearch Service to find the information they need. They can then use the OpenSearch Service dashboard to create visualizations and other analyses of that data to gain insights and make discoveries.
The design calls for a researcher’s portal through which researchers can import large amounts of data into OpenSearch Service whenever they need to. The portal reduces the time spent coordinating with the IT team and the resources needed to share large files with that team for data ingestion.
We divided the architecture into sections based on component functionality for clarity.
Section 1: The part of the solution that researchers interact with is the website, or researcher’s portal. Researchers can securely log in to the website, ingest data into OpenSearch Service, and conduct searches in near real-time. Section 1’s use of AWS services:
- Amazon Route 53 (Route 53) provides highly available and scalable Domain Name System (DNS), domain name registration, and health-checking web services. Route 53 connects researchers’ web requests to other components of the solutions running on AWS.
- AWS WAF protects the solution against common web exploits and bots that could compromise security, reduce the solution’s availability, or consume excessive resources.
- Amazon CloudFront provides researchers the benefit of a faster distribution of both static and dynamic web content. This includes image files as well as .html, .css, and .js files. Amazon CloudFront is responsible for delivering the contents of the researcher’s portal to researchers. Amazon CloudFront reduces the amount of time it takes for researchers to download large datasets by utilizing the AWS backbone network.
- Amazon Simple Storage Service (Amazon S3) is a service that is utilized to store application content, such as media files and static assets. This helps reduce the amount of traffic that is sent to the application web server. This service is reliable and highly available.
Section 2: The Ingest API and Search API compute infrastructure controls how researchers interact with the underlying data. This includes authenticating users, managing sessions, granting access to data stored in OpenSearch Service clusters, and granting programmatic access to ingest data into an OpenSearch Service cluster. Section 2’s use of AWS services:
- Amazon API Gateway (API Gateway) is an AWS service for creating, publishing, maintaining, monitoring, and securing HTTP APIs at scale. API Gateway secures and scales the researcher’s portal’s access to the Ingest and Search APIs (container-based APIs running on Amazon Elastic Container Service).
- Amazon Cognito provides the researcher’s portal with authentication, authorization, and user management. Researchers have the option of logging in directly with a username and password or by using a third-party service such as Facebook, Amazon, Google, or Apple.
- The combination of an Application Load Balancer (ALB) and AWS Auto Scaling enhances application responsiveness, availability, and user experience by distributing traffic across multiple targets, and scaling up or down based on the traffic pattern.
- The solution uses AWS Fargate (Fargate) with Amazon Elastic Container Service (Amazon ECS) to run containers without managing servers. The Search API and Ingest API should be configured to operate as containers. By removing the operational burden of scaling, patching, securing, and managing servers, Fargate helps the infrastructure team save time. Fargate also reduces server costs by scaling compute to match resource needs, thereby eliminating over-provisioning.
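As an illustration of what the Ingest API might do before handing data to OpenSearch Service, the following minimal sketch serializes a batch of records into the newline-delimited JSON format accepted by the OpenSearch `_bulk` endpoint. The function name, the index name, and the record fields are hypothetical and are not part of the solution described here; the sketch only builds the request body and does not send it.

```python
import json
from typing import Iterable, Mapping


def build_bulk_body(index_name: str, records: Iterable[Mapping[str, object]]) -> str:
    """Serialize records into a newline-delimited JSON body for the _bulk endpoint.

    Hypothetical helper for illustration; index_name and the record
    fields are assumptions, not names from the solution.
    """
    lines = []
    for record in records:
        # Action line: index the document that follows into index_name.
        lines.append(json.dumps({"index": {"_index": index_name}}))
        # Source line: the document itself.
        lines.append(json.dumps(dict(record)))
    if not lines:
        return ""
    # The bulk format requires the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = [{"patient_id": "p1", "marker": 1.5}, {"patient_id": "p2", "marker": 0.7}]
    print(build_bulk_body("trial-results", sample), end="")
```

Batching records this way lets a single request carry many documents, which is one common approach when researchers upload large files.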
Section 3: OpenSearch Service enables researchers to ingest, secure, search, aggregate, view, and analyze data. The OpenSearch project continues to provide a secure, high-quality search and analytics suite with a rich roadmap of new and innovative functionality. Compared to self-managed solutions, OpenSearch Service offers a range of instance sizes to meet computing needs, storage tiers to reduce analytics costs, and features to increase productivity.
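To make the search and aggregate capabilities concrete, here is a minimal sketch of a request body the Search API could send to an index's `_search` endpoint. It combines a full-text `match` query with a `terms` aggregation using standard OpenSearch query DSL. The function name and the field names (`notes`, `diagnosis_code`) are assumptions made for illustration only.

```python
from typing import Any, Dict


def build_search_body(text: str, text_field: str, group_field: str,
                      size: int = 10) -> Dict[str, Any]:
    """Build a query DSL body: full-text match plus a count per group.

    Hypothetical helper; the field names passed in are assumptions.
    group_field is assumed to be mapped as a keyword field so that it
    can be used in a terms aggregation.
    """
    return {
        # Number of matching documents to return.
        "size": size,
        # Full-text match on the chosen text field.
        "query": {"match": {text_field: text}},
        # Bucket the matching documents by the values of group_field.
        "aggs": {"by_group": {"terms": {"field": group_field, "size": 20}}},
    }


if __name__ == "__main__":
    print(build_search_body("elevated troponin", "notes", "diagnosis_code"))
```

The aggregation buckets returned for such a request are the kind of summary that a dashboard visualization can chart directly.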
Section 4: Continuous Integration/Continuous Deployment (CI/CD) is a software development practice in which code changes are automatically built, tested, and deployed to production, making the release process more efficient and reliable. Section 4’s use of AWS services:
- AWS CodePipeline (CodePipeline): When code changes, CodePipeline automates the build, test, and deployment of the researcher’s portal, Search API, and Ingest API according to the workflow.
- Amazon Elastic Container Registry (Amazon ECR): The images for the Search API and Ingest API are hosted on Amazon ECR. The straightforward integration of Amazon ECR and Amazon ECS allows teams to focus on building the applications, not the environment.
- AWS CodeBuild (CodeBuild) compiles source code, runs tests, and produces packages that are ready to deploy.
- AWS CodeDeploy (CodeDeploy) automates releases of the researcher’s portal, Ingest API, and Search API, which helps deliver new features quickly and avoid downtime during deployment.
Integration with a GitHub repository enables tracking of changes and automatic deployment, testing, and release of modifications to the researcher’s portal, Search API, and Ingest API. Another option for hosting source code is AWS CodeCommit (CodeCommit), a secure, highly scalable, fully managed source control service that hosts private Git repositories.
Built with Data Security in Mind
When working with sensitive medical information, it is standard practice to encrypt data both in transit and at rest. Encrypt data at rest with AWS Key Management Service (AWS KMS) using customer managed keys, which allow least-privilege access controls on the keys themselves. AWS KMS lets you create, manage, and control cryptographic keys across your applications and AWS services.
The solution uses Amazon Cognito for user management and authentication to secure the HTTP API in API Gateway. User accounts are stored in Amazon Cognito. The site administrator has control over who can perform OpenSearch Service searches against which indexes and who can ingest data.
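One possible way to express the administrator's control over searching and ingesting is a group-based permission check inside the APIs. The sketch below assumes the token has already been verified upstream and that its claims are available as a dictionary with the `cognito:groups` claim that Amazon Cognito adds for users who belong to groups. The group names, index names, and the `PERMISSIONS` table are hypothetical and are not taken from the solution.

```python
from typing import Mapping, Set

# Hypothetical permission table: group -> index -> allowed actions.
PERMISSIONS: Mapping[str, Mapping[str, Set[str]]] = {
    "oncology-researchers": {"oncology-trials": {"search", "ingest"}},
    "reviewers": {"oncology-trials": {"search"}},
}


def is_allowed(claims: Mapping[str, object], index_name: str, action: str,
               permissions: Mapping[str, Mapping[str, Set[str]]] = PERMISSIONS) -> bool:
    """Return True if any of the caller's groups grants action on index_name.

    Assumes claims come from an already-verified token; this function
    performs no signature or expiry validation itself.
    """
    groups = claims.get("cognito:groups", [])
    if not isinstance(groups, (list, tuple)):
        # Unexpected claim shape: deny by default.
        return False
    return any(
        action in permissions.get(group, {}).get(index_name, set())
        for group in groups
    )


if __name__ == "__main__":
    reviewer = {"sub": "example-user", "cognito:groups": ["reviewers"]}
    print(is_allowed(reviewer, "oncology-trials", "search"))
    print(is_allowed(reviewer, "oncology-trials", "ingest"))
```

Denying by default when the claim is missing or malformed keeps the check conservative, which suits sensitive medical data.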
It is recommended to launch OpenSearch Service domains into an Amazon Virtual Private Cloud (Amazon VPC). Placing an OpenSearch Service domain within an Amazon VPC enables secure communication between OpenSearch Service and other services within the Amazon VPC without the need for an internet gateway, NAT device, or VPN connection. All traffic remains securely within the AWS Cloud. The Application Load Balancer is internal and can be reached only by API Gateway through a VPC link. Services in Amazon ECS are deployed in a private subnet and configured to accept traffic from the Application Load Balancer only on specific ports.
Data insights can play a critical role in accelerating medical research. By using data to generate hypotheses and conducting near real-time experiments, researchers can save time and resources while still producing accurate results. These approaches can help speed up the process of developing new treatments and cures for a variety of diseases.
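The encryption and networking recommendations in this section can be summarized as domain settings. The following minimal sketch assembles keyword arguments in the shape accepted by the boto3 OpenSearch Service client's `create_domain` call. The helper name and all identifiers passed to it (domain name, key ID, subnet and security group IDs) are placeholders, and a real domain would need further settings, such as instance and storage configuration, that are omitted here.

```python
from typing import Any, Dict, Sequence


def build_domain_config(domain_name: str, kms_key_id: str,
                        subnet_ids: Sequence[str],
                        security_group_ids: Sequence[str]) -> Dict[str, Any]:
    """Assemble security-related settings for an OpenSearch Service domain.

    Hypothetical helper; every identifier passed in is a placeholder.
    """
    return {
        "DomainName": domain_name,
        # Encrypt data at rest with a customer managed AWS KMS key.
        "EncryptionAtRestOptions": {"Enabled": True, "KmsKeyId": kms_key_id},
        # Encrypt traffic between the nodes of the domain.
        "NodeToNodeEncryptionOptions": {"Enabled": True},
        # Require HTTPS for all requests to the domain endpoint.
        "DomainEndpointOptions": {"EnforceHTTPS": True},
        # Place the domain inside the VPC rather than on a public endpoint.
        "VPCOptions": {
            "SubnetIds": list(subnet_ids),
            "SecurityGroupIds": list(security_group_ids),
        },
    }


if __name__ == "__main__":
    config = build_domain_config(
        "research-domain", "example-key-id",
        ["subnet-example"], ["sg-example"],
    )
    print(config)
```

With valid identifiers and credentials, a dictionary like this could be passed as `boto3.client("opensearch").create_domain(**config)`; the sketch stops short of making that call.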
Amazon OpenSearch Service, combined with supporting AWS services, can help researchers accelerate their work by making it straightforward to ingest large amounts of data on-demand and query it to find the information they need. With OpenSearch Service dashboards, researchers can quickly visualize their data to find trends and relationships that would otherwise be difficult to see.