Dgraph on AWS: Setting up a horizontally scalable graph database

This article is a guest post from Joaquin Menchaca, an SRE at Dgraph.

Dgraph is an open source, distributed graph database, built for production environments, and written entirely in Go. Dgraph is fast, transactional, sharded, and distributed (joins, filters, sorts), consistently replicated with Raft, and provides fault tolerance with synchronous replication and horizontal scalability.

The language used to interact with Dgraph is GraphQL and our variant called GraphQL+-. This gives apps access to the benefits of GraphQL directly from the database.

Dgraph has client integrations with official clients in Go, Java, Python, JavaScript, and C#; and community-supported clients with Dart, Rust, and Elixir. Dgraph users also can use any of the tools and libraries that work with GraphQL.

To get started right away, download Dgraph and follow the quick-start guide.

Getting started with Dgraph locally on your own computer, where you can quickly model your data in Dgraph and build your app, is easy. When you’re ready to deploy this to a production environment, you’ll want to deploy Dgraph to the cloud. You can horizontally scale Dgraph across multiple machines for high availability and data sharding.

In this article, we’ll show how to set up a resilient highly available Dgraph cluster on AWS.

Dgraph architecture

Dgraph is deployed as a single binary with no external dependencies. You can use the same binary to run all of Dgraph. The official Dgraph Docker image simplifies deploying to Kubernetes, where you can set up a multi-node highly available Dgraph cluster.

Two kinds of processes are in a Dgraph cluster: Zeros (cluster managers) and Alphas (data servers). Zeros control the Dgraph cluster, store the group membership of Alphas, and manage transactions cluster-wide. Alphas store data and indexes and serve all client requests.

We will need at least one Zero and one Alpha to run Dgraph. There can be multiple Zeros and Alphas running in groups as a cluster. There is always a single group of Zeros and one or more groups of Alphas. Each group is a Raft consensus group for high availability and consistent replication.

Raft consensus groups in Dgraph showing Zero Group and Alpha Group

Dgraph is resilient to any one of these instances failing. The cluster remains available for users to read and write their data. Distributed systems can fail for a myriad of reasons, and these failures shouldn’t make our backend go awry. As long as the majority of a group remains up, then requests can proceed. More specifically, if the number of replicas is 2N + 1, up to N servers can be down without any impact on reads or writes. Groups are odd-numbered for a quorum.

Kubernetes: An ideal Dgraph companion

Although Dgraph can run on a cluster of nodes for high availability, an orchestration tool is necessary for health checking, self-healing, storage volume management, and setting up the network. The Kubernetes platform provides all of this, making it the ideal platform to host Dgraph.

Kubernetes maintains Dgraph’s resiliency as it constantly monitors Dgraph instances for readiness and liveness, and it can automatically move instances off of unhealthy worker nodes and run them on healthy ones. Kubernetes manages stateful apps like Dgraph with StatefulSets. Every Dgraph process is deployed as a Pod. In StatefulSets, Kubernetes gives each pod the same hostname identity and persistent volumes even when it moves to different worker nodes. If a worker node goes bad or if a Dgraph process gets restarted, then the pod will restart on a healthy worker node and re-sync the latest changes from replicated peers. Even when a pod restarts, Dgraph is available to process all requests on the rest of the healthy nodes.

The illustration below shows what the Dgraph Stateful Set (sts) as it relates to other components. This includes three pods and corresponding allocation of storage, called pvc (persistent volume claim) that comes from a persistent volume (pv). Alongside this, we deploy a service (svc) resource that will route traffic to one of three highly available pods.

Kubernetes components deployed

Creating a Kubernetes cluster on AWS

On AWS, Amazon Elastic Kubernetes Service (Amazon EKS) is the recommended way to run Kubernetes. Amazon EKS manages the master control plane and worker nodes and offers integration with AWS resources in security, networking, storage, monitoring, scaling, and more.

Although Kubernetes provides high availability through scaling and recovery of workloads called pods, Amazon EKS node groups also provide scaling and recovery to the worker nodes themselves. The illustration below is an example Amazon EKS cluster that has six worker nodes available to host Dgraph Alphas and Zeros, as well multiple masters managed by Amazon EKS.

Kubernetes with Amazon EKS

The simplest way to get started immediately with building an Amazon EKS cluster is to use eksctl, a command-line tool to deploy Amazon EKS using AWS CloudFormation stacks. This tool automates a lot of complexity involved with provisioning Amazon EKS. In addition to installing eksctl, we also need to install kubectl to interact with Kubernetes once it’s up.

Provision EKS cluster

We have an example eksctl configuration that can be used for this exercise; download the example from GitHub.

We can create the cluster with the command:

CONFIG="https://raw.githubusercontent.com/dgraph-io/dgraph/master/contrib/config/kubernetes/dgraph-ha/cluster.yaml"

curl --silent $CONFIG | eksctl create cluster --config-file -

This configuration creates a three-node Amazon EKS cluster in the us-east-2 region. The process takes about 20 minutes to fully provision the Kubernetes infrastructure. Once the process completes, we can test our cluster with the following command:

kubectl get all --all-namespaces

With Kubernetes up and running, we can now deploy Dgraph.

Creating a high availability Dgraph cluster on Kubernetes

Now that we have a Kubenetes cluster with Amazon EKS, we can deploy a highly available Dgraph cluster on Kubernetes.

We have an example manifest, dgraph-ha.yaml, to get us up and running with a high availability Dgraph cluster. We can run it in Kubernetes by running the following command:

MANIFEST="https://raw.githubusercontent.com/dgraph-io/dgraph/master/contrib/config/kubernetes/dgraph-ha/dgraph-ha.yaml"

kubectl apply --filename $MANIFEST

This Kubernetes manifest will create a Dgraph cluster with three Alpha pods and three Zero pods, as well as a pod for the web client Ratel. The Alpha and Zero pods are run via StatefulSets with a persistent Amazon Elastic Block Store (Amazon EBS) volume to store the data, so they can keep the same data around even if they’re restarted or rescheduled to different machines. The web client Ratel is stateless, so it is a Deployment with no extra persistent disks needed.

The manifest does not expose any of these services to the public internet for security reasons. If we would like to add endpoints, we can modify the manifest to add these changes, such as changing the service type to LoadBalancer, or adding an Ingress resource.

The following Kubernetes diagram is similar to diagrams used in the Kubernetes basic tutorial and other Kubernetes documentation. This is an overview of the components involved when deploying Dgraph on Kubernetes. The Zero and Alpha pods deployed by a StatefulSet controller, as mentioned previously, will be distributed across the cluster and have an attached disk, indicated by the purple disk icon. The Ratel pod deployed by a Deployment controller will only have a single pod in one of the tree worker nodes and does not have an attached persistent disk.

Kubernetes Cluster

Accessing the cluster (revised)

We can view services that are running with the following command:

kubectl get services

This will show something similar to the following:

results showing services running

We will need to access dgraph-alpha-public and dgraph-ratel-public. With dgraph-alpha-public, we have HTTP access on port 8080 and gRPC access port 9080, and with the dgraph-ratel-public web UI, we have HTTP access on port 8000.

We can make these accessible locally via localhost using the kubectl port-forward command. For this section, we will open up multiple terminal windows or tabs.

Port forward locally

In a new terminal, run the following command to access an Alpha pod through HTTP:

kubectl port-forward service/dgraph-alpha-public 8080:8080

In a new terminal, run the following command to access a Ratel pod through HTTP:

kubectl port-forward service/dgraph-ratel-public 8000:8000

Interacting with Ratel

We can access Ratel by typing http://localhost:8000/ in a browser. We will be presented with three options; select Latest.

For the Dgraph Server Connection, select http://localhost:8080, Connect, Continue. Now we should have a window that shows something like:

For the Dgraph Server Connection, pick http://localhost:8080, hit the Connect button and then Continue button. You should by now have a window that looks something like this:

Mutation using Ratel

If we already have mutations and queries in mind, we can run these now. This example is similar to our Getting Started Step 2: Run Mutation documentation.

Select the Console and Mutate radio button, copy and paste the text below, and then select the Run button.

{
  "set": [
    {"uid": "_:luke","name": "Luke Skywalker", "dgraph.type": "Person"},
    {"uid": "_:leia","name": "Princess Leia", "dgraph.type": "Person"},
    {"uid": "_:han","name": "Han Solo", "dgraph.type": "Person"},
    {"uid": "_:lucas","name": "George Lucas", "dgraph.type": "Person"},
    {"uid": "_:irvin","name": "Irvin Kernshner", "dgraph.type": "Person"},
    {"uid": "_:richard","name": "Richard Marquand", "dgraph.type": "Person"},
    {
      "uid": "_:sw1",
      "name": "Star Wars: Episode IV - A New Hope",
      "release_date": "1977-05-25",
      "revenue": 775000000,
      "running_time": 121,
      "starring": [{"uid": "_:luke"},{"uid": "_:leia"},{"uid": "_:han"}],
      "director": [{"uid": "_:lucas"}],
      "dgraph.type": "Film"
    },
    {
      "uid": "_:sw2",
      "name": "Star Wars: Episode V - The Empire Strikes Back",
      "release_date": "1980-05-21",
      "revenue": 534000000,
      "running_time": 124,
      "starring": [{"uid": "_:luke"},{"uid": "_:leia"},{"uid": "_:han"}],
      "director": [{"uid": "_:irvin"}],
      "dgraph.type": "Film"
    },
    {
      "uid": "_:sw3",
      "name": "Star Wars: Episode VI - Return of the Jedi",
      "release_date": "1983-05-25",
      "revenue": 572000000,
      "running_time": 131,
      "starring": [{"uid": "_:luke"},{"uid": "_:leia"},{"uid": "_:han"}],
      "director": [{"uid": "_:richard"}],
      "dgraph.type": "Film"
    },
    {
      "uid": "_:st1",
      "name": "Star Trek: The Motion Picture",
      "release_date": "1979-12-07",
      "revenue": 139000000,
      "running_time": 132,
      "dgraph.type": "Film"
    }
  ]
}

Schema using Ratel

Then we can alter the schema by selecting the Schema button on the left, and then the Bulk Edit button. In this edit box, copy and paste the text below, and then select Apply Schema.

name: string @index(term) .
 release_date: datetime @index(year) .
 revenue: float .
 running_time: int .
 
 type Person {
   name
 }
 
 type Film {
   name
   release_date
   revenue
   running_time
   starring
   director
 }

The interface should look something like:

Interface showing Edit Schema File with option to apply schema

Query using Ratel

Now we can do a query. Select Console, Query, then copy and paste the following:

{
 me(func:allofterms(name, "Star Wars")) @filter(ge(release_date, "1980")) {
   name
   release_date
   revenue
   running_time
   director {
    name
   }
   starring {
    name
   }
 }
}

After the results return, we are shown the JSON and the graphical representation of the graph. This should look something like the following:

Once the results return, we can see the JSON as well as look at the graphical representation of the graph.

We also can view the equivalent result in JSON:

Once the results return, we are shown the JSON and the graphical representation of the graph.

And that’s it. We have run our first mutation and ran a query to find the relationships between the movies actors played in and the directors of those movies.

Cleaning up

Once finished with the Dgraph cluster, we can remove it from Kubernetes with the following command:

MANIFEST="https://raw.githubusercontent.com/dgraph-io/dgraph/master/contrib/config/kubernetes/dgraph-ha/dgraph-ha.yaml"
 
# delete workloads (statefulsets, deployments, pods) and services
kubectl delete --filename $MANIFEST
# delete storage (persistent volume claims and associated volumes)
kubectl delete pvc --selector app=dgraph-alpha
kubectl delete pvc --selector app=dgraph-zero

If we are finished with the Amazon EKS cluster that we deployed with eksctl, we can remove this with:

eksctl delete cluster --name dgraph-ha-cluster --region us-east-2

Conclusion

Refer to our documentation for additional information on this deployment topic, and other Dgraph topics and links to tutorials. You can join the Dgraph community, where you can post questions and share your experiences with Dgraph.

Joaquin Menchaca

Joaquin is an SRE at Dgraph Labs. He started early in QA and transitioned to Ops. He is passionate about DevOps philosophy and advancements in cloud native infrastructure, and he brings years of experience in change configuration and container orchestration. In his spare time, Joaquin enjoys conversing in different languages, having studied 12, including French, Spanish, Japanese, and Korean.

The content and opinions in this post are those of the third-party author and AWS is not responsible for the content or accuracy of this post.

AWS Open Source Blog