Running Dicoogle, an open source PACS solution, on AWS (part 1)
This blog is the first part of a two-part series that describes how to host a secure DICOM server on AWS. It is based on the Dicoogle open source software, which provides the functionality of a PACS (picture archiving and communication system). A PACS stores and indexes DICOM medical image files, and uses the DICOM protocol to facilitate the upload, download, and search of DICOM studies.
The first part includes a brief introduction on DICOM, DICOM functionality, additional Dicoogle functionality, networking, and security, as well as an overview of AWS services used in the solution.
The second part includes a detailed walkthrough of solution deployment on AWS and step-by-step testing.
The solution architecture consists of the following components:
- Dicoogle, as the DICOM server, which provides services to its DICOM clients.
- Dicoogle server, which stores DICOM images in an elastic file system and is hosted as a containerized application in AWS.
- Dicoogle server, which exposes a network service responsible for implementing the native DICOM DIMSE protocol (DICOM Message Service Element) and provides the services for the upload, download, and search of DICOM studies.
- Bulk upload of DICOM files independent of the DIMSE protocol, which is performed by file transfers that use native AWS services, followed by file indexing on the Dicoogle server.
- Administration of the Dicoogle application, which is performed through a browser and standard HTTPS connection.
- Encryption in transit of DICOM DIMSE operations through the use of TLS tunnels.
- Client functionality, which is provided by the dcmtk DICOM software as an example of a standard DICOM node and is usually located on the user’s premises.
DICOM is a data model organized in the sequence Patient – Study – Series – Instance. A patient has one or more studies, which may also be known as exams or procedures. A study has one or more series performed on the same modality (for example, a scanner). A series has one or more instances with each instance representing a 2D image or slice of a 3D image. Each DICOM image contains a header section that stores metadata (UID and text description) about a patient, study, series, and instance. You can find detailed information about the dictionary here.
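The Patient, Study, Series, Instance hierarchy can be pictured as nested records. The following is a minimal sketch for illustration only: the class names, field names, and identifiers are made up for this example and are not taken from the DICOM dictionary or from Dicoogle.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Instance:
    sop_instance_uid: str  # one 2D image, or one slice of a 3D image


@dataclass
class Series:
    series_instance_uid: str
    modality: str  # for example "CT" or "MR"
    instances: List[Instance] = field(default_factory=list)


@dataclass
class Study:
    study_instance_uid: str
    description: str
    series: List[Series] = field(default_factory=list)


@dataclass
class Patient:
    patient_id: str
    studies: List[Study] = field(default_factory=list)


def count_instances(patient: Patient) -> int:
    """Walk Patient -> Study -> Series -> Instance and count the leaves."""
    return sum(
        len(series.instances)
        for study in patient.studies
        for series in study.series
    )


# Made-up identifiers, for illustration only.
patient = Patient(
    patient_id="PAT-001",
    studies=[
        Study(
            study_instance_uid="1.2.3.1",
            description="Chest CT",
            series=[
                Series("1.2.3.1.1", "CT", [Instance("1.2.3.1.1.1"), Instance("1.2.3.1.1.2")]),
                Series("1.2.3.1.2", "CT", [Instance("1.2.3.1.2.1")]),
            ],
        )
    ],
)

print(count_instances(patient))
```

In a real archive this hierarchy is not stored as nested objects. It is reconstructed from the UIDs in each file's header, which is why an indexer is needed.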
DICOM software needs to build and maintain an index of this hierarchy from the individual files stored on disk. A relational database can fulfill this indexing function. Dicoogle has a plugin architecture that supports a default indexer based on Apache Lucene, as well as the possibility of an indexer based on a relational database.
DICOM has the concepts of service class user (SCU) and service class provider (SCP). Two DICOM nodes can form an association when one node plays the role of SCU and the other plays the role of SCP. The SCU invokes an operation (for example, an image transfer) and the SCP performs it. DIMSE is the DICOM protocol responsible for transferring images between two DICOM nodes.
In this blog, we’ll use three DIMSE services: C-FIND, C-STORE, and C-MOVE. You can find a detailed definition of each here.
Think of C-FIND as performing a search. In return, we get a list of images that match the query criteria. Think of C-STORE as uploading images from a DICOM client to a DICOM server and asking the server to store them. Think of C-MOVE as a DICOM client asking a DICOM server to send images, a series, or a study to a destination DICOM node. As a prerequisite for C-MOVE, the destination DICOM node needs to be configured as a destination in the DICOM server.
While DIMSE is the traditional DICOM protocol, DICOMweb is a newer protocol developed for web-based medical imaging. DICOMweb has a set of RESTful services that fulfill the functions of querying, retrieving, and storing images. DICOMweb is supported by Dicoogle and allows developers to use familiar web-based tools to interact with a DICOM server. Please note that we will not cover DICOMweb in this blog.
Users can use the dcmtk DICOM client software to send DICOM commands to the Dicoogle server using the DIMSE protocol. The client software accesses DICOM files on the user’s local filesystem and from there can do three things:
- Send those files to the Dicoogle server using a C-STORE command
- Receive and store DICOM images using a C-MOVE command
- Search the Dicoogle database using a C-FIND command.
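As a rough sketch of what these three operations look like with dcmtk, the Python helpers below only assemble command lines for the `storescu`, `findscu`, and `movescu` tools; they do not run them. The host name, port, and AE titles are hypothetical placeholders, not values from this solution, so substitute the ones from your own deployment.

```python
import shlex
from typing import List

# Hypothetical connection details (assumptions, not from the post).
HOST = "dicoogle.example.internal"
PORT = 11112
CALLING_AE = "CLIENT"     # AE title of this client
CALLED_AE = "DICOOGLE"    # AE title of the server
MOVE_DEST_AE = "STORAGE"  # AE title of the C-MOVE destination


def store_cmd(path: str) -> List[str]:
    """C-STORE: send one DICOM file to the server."""
    return ["storescu", "-aet", CALLING_AE, "-aec", CALLED_AE, HOST, str(PORT), path]


def find_cmd(patient_name: str) -> List[str]:
    """C-FIND: study-level query by patient name, asking for the study UID."""
    return [
        "findscu", "-S",
        "-aet", CALLING_AE, "-aec", CALLED_AE,
        "-k", "QueryRetrieveLevel=STUDY",
        "-k", f"PatientName={patient_name}",
        "-k", "StudyInstanceUID=",
        HOST, str(PORT),
    ]


def move_cmd(study_uid: str) -> List[str]:
    """C-MOVE: ask the server to send a study to the destination AE title."""
    return [
        "movescu", "-S",
        "-aet", CALLING_AE, "-aec", CALLED_AE,
        "-aem", MOVE_DEST_AE,
        "-k", "QueryRetrieveLevel=STUDY",
        "-k", f"StudyInstanceUID={study_uid}",
        HOST, str(PORT),
    ]


for cmd in (store_cmd("image.dcm"), find_cmd("DOE*"), move_cmd("1.2.3.1")):
    print(shlex.join(cmd))
```

Each list can be passed to `subprocess.run` on a machine where dcmtk is installed. Note that the C-MOVE command names the destination by AE title only; the server resolves that title to an address from its own configuration.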
The Dicoogle server stores individual DICOM files on the filesystem and uses an indexer to store the relationship between the files and the DICOM series and study where they belong. DICOM images received using the DIMSE protocol are indexed and stored upon receipt.
Additional Dicoogle functionality
AWS API functions perform bulk uploads of DICOM files from the client filesystem to the filesystem of the Dicoogle server. After the files are stored on the Dicoogle filesystem, the Dicoogle server has to index them before they can be accessed.
The administrator can interact directly with the Dicoogle server using its web server through HTTPS. This provides a graphical interface where the DICOM data store can be examined, and is used to control actions such as building an index of files received directly on the filesystem.
The DICOM functionality and additional Dicoogle functionality described above are illustrated in the diagram below:
Now that you know the functionality the solution provides, let’s expand and explain how we address networking in the solution.
First, let’s look at the networking aspect of the solution that fulfills DICOM functionality.
Imagine you’re a PACS service provider. You choose Dicoogle as a DICOM server (provider) and deploy it in a Virtual Private Cloud (VPC) on AWS to provide controlled network access to the server. Administrators are also given access to Dicoogle’s web console to perform administrative activities.
A customer is interested in consuming your PACS service from its on-premises environment. You can allocate a public IP for your service, expose the service to your customer, and let your customer connect to it through the public internet, as illustrated in the diagram below. Please note that this may pose security concerns, so we don’t recommend this approach.
Instead, you can offer your PACS service to your customer in a more secure and private way by using AWS PrivateLink to expose Dicoogle as a VPC endpoint service. Your customer can set up their own VPC on AWS and, from there, use AWS Direct Connect (DX) or AWS Site-to-Site VPN to connect from on-premises to AWS. Then, your customer can create a VPC endpoint in their own VPC on AWS to talk to Dicoogle.
Note that in this blog, we’re not going to create a real on-premises environment or set up DX/VPN. Instead, we’ll create a separate VPC on AWS to simulate an on-premises environment that hosts the DICOM client. We’ll use VPC peering to establish connectivity between the VPC hosting the DICOM client and the VPC hosting the VPC endpoint.
Once the networking connectivity is established, your customer can then use DICOM client software to send images over the connection to Dicoogle. This essentially performs the C-STORE operation described in the ‘DICOM Introduction’ section above. Your customer can also perform the C-FIND operation over the same connection.
The C-MOVE operation is used to transfer DICOM studies, series, or instances from one DICOM node to another. It requires that the IP address of the destination node be preconfigured in the originating node as a destination.
C-MOVE is a two-step operation. In the first step, the DICOM client node initiates a transfer by sending a C-MOVE request to the originating node (which, in this case, is Dicoogle). The request identifies both the requested entity (the study, series, or instances) and the destination it should be sent to. In the second step, the originating node (Dicoogle) sends the requested entity to the destination. Your customer can perform the first step of C-MOVE using the same connection illustrated in the diagram above.
Let’s look at how the connection in the second step can be established.
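The two steps can be mimicked with a toy in-memory model. This is only a sketch of the control flow under simplifying assumptions: the class, method, AE title, address, and file names are invented for illustration, and no DICOM networking takes place.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class OriginatingNode:
    """Toy stand-in for the originating node (Dicoogle in this post)."""

    # AE title -> (host, port) of preconfigured C-MOVE destinations.
    destinations: Dict[str, Tuple[str, int]]
    # Study UID -> file names held by this node.
    studies: Dict[str, List[str]]
    # Record of (host, port, file name) transfers, in place of real sends.
    sent: List[Tuple[str, int, str]] = field(default_factory=list)

    def handle_c_move(self, study_uid: str, destination_ae: str) -> str:
        # Step 1: the request names the entity and the destination AE title.
        if destination_ae not in self.destinations:
            return "refused: unknown move destination"
        host, port = self.destinations[destination_ae]
        # Step 2: the originating node opens its own connection to the
        # destination and sends each file (one C-STORE per instance).
        for name in self.studies.get(study_uid, []):
            self.sent.append((host, port, name))
        return "success"


# Made-up values, for illustration only.
node = OriginatingNode(
    destinations={"STORAGE": ("10.0.1.25", 11112)},
    studies={"1.2.3.1": ["slice-001.dcm", "slice-002.dcm"]},
)
ok = node.handle_c_move("1.2.3.1", "STORAGE")
refused = node.handle_c_move("1.2.3.1", "UNKNOWN")
print(ok, refused, node.sent)
```

The sketch makes the networking consequence visible: the second step is a connection that starts at the originating node, so the originating node needs its own network path to the destination.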
Assuming the destination DICOM node is in your customer’s on-premises environment, your customer can create a VPC endpoint service in their own VPC and point to the destination DICOM node. Then you can create a VPC endpoint in your VPC and configure Dicoogle to use the VPC endpoint as a destination in C-MOVE.
Note that in this blog, we’re not going to set up this reverse PrivateLink connection. Instead, we put the destination DICOM node in the VPC that simulates an on-premises environment. We then set up VPC peering between the VPC that hosts Dicoogle and the VPC that simulates an on-premises environment. From there, we configure Dicoogle to use the private IP of the destination DICOM node as the destination.
Now let’s look at networking from the perspective of fulfilling an additional Dicoogle functionality.
For bulk upload of images, you’ll need to create an Amazon S3 bucket and grant your customer access. Your customer can create an interface VPC endpoint in its VPC and upload images from a source in an on-premises environment to the S3 bucket using the interface VPC endpoint.
Note that in this blog, since we’re going to use a VPC to simulate an on-premises environment, we’ll use a gateway VPC endpoint to talk to the S3 service, as illustrated in the diagram below. For more information about interface VPC endpoints and gateway VPC endpoints for S3, please see resources here.
When bandwidth from the customer’s premises to AWS is limited, consider using the AWS Snow family as an alternative approach.
Once images are in the bucket, use the AWS DataSync service to transfer them from the S3 bucket to the elastic file system used by Dicoogle. From there, log in to Dicoogle to perform indexing. Once Dicoogle indexes the images, your customer can then perform DICOM C-FIND/C-MOVE operations.
To administer the Dicoogle application in your VPC, create two network segments (subnets): one for internet-facing components (the Application Load Balancer), and the other for private components (the Dicoogle application and the elastic file system). Administrators connect to the Application Load Balancer, which then directs traffic to the Dicoogle application.
Security is our top priority at AWS. Let’s take a look at how we address security in the solution.
Network control: In the architecture diagram below, we create three separate VPCs. The first is labeled Simulated OnPrem, and it hosts compute resources simulating the DICOM client node and destination node. The second is labeled Consumer, and it hosts a VPC endpoint. The third is labeled Provider, and it hosts Dicoogle. In the three VPCs, we put security groups around compute, elastic file system, and VPC endpoint resources to restrict traffic to intended sources and destinations. On the S3 bucket, we also define a policy that restricts access to intended sources only.
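The post does not show the bucket policy itself. One common pattern for restricting a bucket to traffic from a specific VPC endpoint is a deny statement keyed on the `aws:SourceVpce` condition key. The sketch below builds such a policy document in Python; the bucket name and endpoint ID are hypothetical, and this is not necessarily the policy the solution deploys.

```python
import json
from typing import Any, Dict


def vpce_only_policy(bucket: str, vpce_id: str) -> Dict[str, Any]:
    """Build a bucket policy that denies requests not arriving via vpce_id."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyUnlessFromVpcEndpoint",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
                "Condition": {
                    "StringNotEquals": {"aws:SourceVpce": vpce_id}
                },
            }
        ],
    }


# Hypothetical names, for illustration only.
policy = vpce_only_policy("example-dicom-upload-bucket", "vpce-0123456789abcdef0")
print(json.dumps(policy, indent=2))
```

A blanket deny like this also blocks any other principal that reaches the bucket by a different path, such as the role a transfer service uses, so a real policy would need matching exceptions.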
Access control: We use an AWS native service (Amazon Cognito) to authenticate Dicoogle administrators. The AWS Identity and Access Management (IAM) service authenticates and authorizes interactions among AWS services.
Secrets management: We use AWS native service (Secrets Manager) to store SSL certificates as secrets.
Data encryption at rest: We activate encryption at rest for the data stores on AWS, Amazon Simple Storage Service (S3) and Amazon Elastic File System (EFS). We use an AWS native service (KMS) for encryption/decryption key management.
Data encryption in transit: We use an AWS native service (AWS Certificate Manager) to issue a certificate to the internet-facing Application Load Balancer (ALB). The open source version of Dicoogle doesn’t provide out-of-the-box TLS capability. We use two additional open source tools, nginx and ghostunnel, to add TLS capability to Dicoogle. Similarly, we use ghostunnel to add TLS capability to DICOM client software where it is required (for example, dcmsend in dcmtk).
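To make the tunnel idea concrete, the sketch below assembles a matching pair of ghostunnel command lines: a client-mode tunnel that accepts plaintext DIMSE traffic on localhost and forwards it over TLS, and a server-mode tunnel that terminates TLS and forwards plaintext to the local DICOM listener. All hosts, ports, file paths, and the allowed common name are hypothetical placeholders, and the solution's actual ghostunnel configuration may differ.

```python
import shlex
from typing import List


def server_tunnel(listen: str, target: str, keystore: str,
                  cacert: str, allowed_cn: str) -> List[str]:
    """Server mode: accept TLS on `listen`, forward plaintext to `target`."""
    return [
        "ghostunnel", "server",
        "--listen", listen,
        "--target", target,
        "--keystore", keystore,
        "--cacert", cacert,
        "--allow-cn", allowed_cn,
    ]


def client_tunnel(listen: str, target: str, keystore: str,
                  cacert: str) -> List[str]:
    """Client mode: accept plaintext on `listen`, forward TLS to `target`."""
    return [
        "ghostunnel", "client",
        "--listen", listen,
        "--target", target,
        "--keystore", keystore,
        "--cacert", cacert,
    ]


# Hypothetical values, for illustration only.
server = server_tunnel("0.0.0.0:8443", "localhost:11112",
                       "/certs/server-keystore.p12", "/certs/cacert.pem",
                       "dicom-client")
client = client_tunnel("localhost:11112", "dicoogle.example.internal:8443",
                       "/certs/client-keystore.p12", "/certs/cacert.pem")

print(shlex.join(server))
print(shlex.join(client))
```

With a pair like this in place, a tool such as dcmsend would be pointed at the client tunnel's local listener instead of at the server directly, and the DIMSE traffic would cross the network inside TLS.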
AWS Services used when hosting a secure DICOM server on AWS
The solution is built upon the following AWS services.
- Amazon Virtual Private Cloud (Amazon VPC): Provides network isolation for service provider and consumer
- Elastic Load Balancing (ELB): The solution uses two types of load balancer: an Application Load Balancer (ALB) and a Network Load Balancer (NLB). The ALB distributes administrative user traffic to the Dicoogle application. The NLB distributes DIMSE traffic arriving through the interface VPC endpoint to the Dicoogle application, which is exposed as a VPC endpoint service
- Amazon Cognito: Integrates with ALB to provide identity management for authenticating Dicoogle administrators
- Amazon Elastic Compute Cloud (EC2): Serves as compute instances. The solution uses two EC2 instances. The first one (client EC2) simulates the DICOM client node that initiates the C-FIND, C-STORE, and C-MOVE requests. The second one (storage EC2) simulates the DICOM destination node that receives the requested entity as a C-MOVE destination
- AWS Fargate: Hosts Docker containers that run Dicoogle, Nginx, and Ghostunnel. The solution uses one Nginx container as a reverse proxy that terminates TLS, to facilitate encryption in transit for traffic between the ALB and Fargate. The solution uses two Ghostunnel containers. The first one runs as a reverse proxy that terminates TLS, to facilitate encryption in transit for uploading images from the client EC2 instance to Fargate. The second one runs as a forward proxy to facilitate encryption in transit for transferring images from Fargate to the storage EC2 instance
- Amazon Elastic Container Registry (ECR): Provides a repository to host docker images
- Amazon Elastic File System (EFS): Serves as a file system for Dicoogle configurations, images and indexes used by Dicoogle application
- Amazon Simple Storage Service (S3): Uses two S3 buckets; one serves as an intermediary store to receive bulk upload of DICOM images and the second stores ELB access log and VPC flow log
- AWS DataSync: Transfers image files from S3 to EFS
- Amazon Route 53: Hosts DNS records in hosted zones and performs DNS resolution
- Amazon CloudWatch: Integrates with services used in the solution to manage metrics and logs
- AWS Certificate Manager (ACM): Issues certificate used by ALB
- AWS Key Management Service (KMS): Manages keys used for encryption/decryption
- AWS Secrets Manager: Manages SSL certificates stored as secrets
- AWS Identity and Access Management (IAM): Provides identity and access control for services used in the solution
In addition, the following supplemental AWS services facilitate solution deployment.
- AWS Cloud9: Serves as an IDE to prepare deployment artifacts
- AWS CloudFormation: Orchestrates solution deployment
- AWS Lambda: Backs a custom resource during CloudFormation stack creation to get the prefix list associated with S3
Below is the architecture diagram:
In this post, I started with a brief introduction on DICOM, the DICOM functionality and the additional Dicoogle functionality the solution provides. Then, I explained the networking and security design considerations within the solution. Lastly, I illustrated the AWS services used in the solution and their respective functions, as well as an architecture diagram.
In part two of this blog series, I detail the steps needed to deploy the solution on AWS and perform testing that demonstrates how to fulfill DICOM and Dicoogle functionalities.