Production Checklist

This guide describes important setup recommendations for a production-ready Dgraph cluster.

Note In this guide, a node refers to a Dgraph instance unless specified otherwise.

A Dgraph cluster is composed of multiple Dgraph instances (also called nodes) connected together to form a single distributed database. A Dgraph instance is either a Dgraph Zero or a Dgraph Alpha, each of which serves a different role in the cluster.

There can be one or more Dgraph clients connected to Dgraph to perform database operations (queries, mutations, alter schema, etc.). These clients connect via gRPC or HTTP. Dgraph provides official clients for Go, Java, Python, JavaScript, and C#. All of these are gRPC-based, and the JavaScript client supports both gRPC and HTTP for browser use. Community-developed Dgraph clients for other languages are also available; the full list can be found on the Clients page. You can also interact with Dgraph via curl over HTTP. Dgraph Ratel is a UI client used to visualize queries, run mutations, and manage the schema. Clients do not participate as members of the database cluster; they simply connect to one or more Dgraph Alpha instances.
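For example, a minimal DQL query can be sent to a Dgraph Alpha over HTTP with curl. The sketch below assumes Dgraph v21.03 or later with an Alpha serving HTTP on localhost:8080:

  # Run a DQL query against Dgraph Alpha's /query endpoint.
  curl -s -H "Content-Type: application/dql" \
    http://localhost:8080/query \
    -d '{ q(func: has(name), first: 5) { uid name } }'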

Cluster Requirements

A minimum of one Dgraph Zero and one Dgraph Alpha is needed for a working cluster.

There can be multiple Dgraph Zeros and Dgraph Alphas running in a single cluster.

Machine Requirements

To ensure predictable performance characteristics, Dgraph instances should not run on “burstable” or throttled machines that limit resources. This includes AWS t2-class instances.

To ensure that Dgraph can take full advantage of machine resources, we recommend deploying each Dgraph instance to a single dedicated machine. That is, for a 6-node Dgraph cluster with 3 Dgraph Zeros and 3 Dgraph Alphas, each process runs on its own machine (e.g., an EC2 instance). In the event of a machine failure, only one instance is affected, instead of multiple instances if they were running on the same machine.

If you’d like to run Dgraph with fewer machines, then the recommended configuration is to run a single Dgraph Zero and a single Dgraph Alpha per machine. In a high availability setup, that allows the cluster to lose a single machine (simultaneously losing a Dgraph Zero and a Dgraph Alpha) with continued availability of the database.

Do not run multiple Dgraph Zero or Dgraph Alpha processes on a single machine. Doing so can degrade performance due to shared resource contention and reduce availability in the event of machine failures.

Operating System

Dgraph is designed to run on Linux. As of release v21.03, Dgraph no longer supports installation on Windows or macOS. To run Dgraph on Windows and macOS, use the standalone Docker image.
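For local testing on those platforms, here is a sketch of starting the standalone image (the image tag and host directory are assumptions to adapt):

  # Run Dgraph Zero and Alpha together in a single container (testing only).
  docker run -d --name dgraph-standalone \
    -p 8080:8080 -p 9080:9080 \
    -v ~/dgraph:/dgraph \
    dgraph/standalone:latest

Port 8080 serves HTTP and 9080 serves gRPC; the volume mount persists data across container restarts.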

CPU and Memory

At the bare minimum, we recommend machines with at least 8 CPUs and 16 GiB of memory for testing.

You’ll want to ensure that your CPU and memory resources are sufficient for your production workload. A common configuration for Dgraph is 16 CPUs and 32 GiB of memory per machine. Dgraph is designed with concurrency in mind, so more cores mean quicker processing and higher request throughput.

You may find you’ll need more CPU cores and memory for your specific use case.

Disk

Dgraph instances make heavy use of disks, so storage with high IOPS is strongly recommended to ensure reliable performance. Specifically, we recommend SSDs, not HDDs.

Regarding disk IOPS, we recommend:

  • 1000 IOPS minimum
  • 3000 IOPS for medium and large datasets

Instances such as AWS c5d.4xlarge have locally attached NVMe SSDs with high IOPS. You can also use EBS volumes with provisioned IOPS (io1). If you are not running performance-critical workloads, you can choose cheaper gp2 EBS volumes. AWS also offers gp3 volumes, which provide a baseline of 3000 IOPS at any disk size.
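To verify that a volume actually delivers the required IOPS before going to production, you can run a synthetic random-I/O benchmark. The sketch below uses fio; the target path, file size, and runtime are assumptions to adapt:

  # Measure sustained 4k random-read IOPS on the data volume.
  fio --name=iops-test --filename=/data/fio-testfile \
      --ioengine=libaio --direct=1 --rw=randread --bs=4k \
      --size=1G --iodepth=32 --runtime=60 --time_based \
      --group_reporting

Compare the reported IOPS against the 1000/3000 IOPS guidance above, and delete the test file afterwards.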

Recommended disk sizes for Dgraph Zero and Dgraph Alpha:

  • Dgraph Zero: 200 GB to 300 GB. Dgraph Zero stores cluster metadata information and maintains a write-ahead log for cluster operations.
  • Dgraph Alpha: 250 GB to 750 GB. Dgraph Alpha stores database data, including the schema, indices, and the data values. It maintains a write-ahead log of changes to the database. Your cloud provider may provide better disk performance based on the volume size.

Additional recommendations:

  • The recommended Linux filesystem is ext4 (see the example after this list).
  • Avoid using shared storage such as NFS, CIFS, and CEPH storage.
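A minimal sketch for preparing a dedicated ext4 data volume, assuming a locally attached device at /dev/nvme1n1 and a /data mount point (both placeholders to adapt):

  # Format the volume as ext4 and mount it for Dgraph's data directories.
  sudo mkfs.ext4 /dev/nvme1n1
  sudo mkdir -p /data
  sudo mount -o defaults,noatime /dev/nvme1n1 /data
  # Persist the mount across reboots.
  echo '/dev/nvme1n1 /data ext4 defaults,noatime 0 2' | sudo tee -a /etc/fstab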

Firewall Rules

Dgraph instances communicate over several ports. Firewall rules should be configured appropriately for the ports documented in Ports Usage.

Internal ports must be accessible by all Zero and Alpha peers for proper cluster-internal communication. Database clients must be able to connect to Dgraph Alpha external ports either directly or through a load balancer.
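As an illustration, host-level iptables rules might restrict the internal gRPC ports to cluster peers while leaving the client-facing Alpha ports open. The default ports (Zero: 5080 internal; Alpha: 7080 internal, 8080 HTTP, 9080 gRPC) are documented in Ports Usage; the 10.0.0.0/24 peer subnet below is an assumption:

  # Allow cluster peers to reach the internal gRPC ports.
  iptables -A INPUT -p tcp -s 10.0.0.0/24 --dport 5080 -j ACCEPT
  iptables -A INPUT -p tcp -s 10.0.0.0/24 --dport 7080 -j ACCEPT
  # Allow clients to reach Dgraph Alpha's external HTTP and gRPC ports.
  iptables -A INPUT -p tcp --dport 8080 -j ACCEPT
  iptables -A INPUT -p tcp --dport 9080 -j ACCEPT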

Dgraph Zeros can be set up in a private network where communication is only with Dgraph Alphas, database administrators, internal services (such as Prometheus or Jaeger), and possibly developers (see note below). Dgraph Zero’s 6080 external port is only necessary for database administrators. For example, it can be used to inspect the cluster metadata (/state), allocate UIDs or set txn timestamps (/assign), move data shards (/moveTablet), or remove cluster nodes (/removeNode). The full docs about Zero’s administrative tasks are in More About Dgraph Zero.
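For example, an administrator with access to port 6080 can call these endpoints with curl (a sketch; the zero1 hostname, predicate name, and IDs are placeholders):

  # Inspect cluster metadata.
  curl -s http://zero1:6080/state
  # Move the "name" predicate's shard to group 2.
  curl -s "http://zero1:6080/moveTablet?tablet=name&group=2"
  # Remove the Alpha with Raft ID 3 from group 2.
  curl -s "http://zero1:6080/removeNode?id=3&group=2"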

Note Developers using Dgraph Live Loader or Dgraph Bulk Loader require access to both Dgraph Zero port 5080 and Dgraph Alpha port 9080. When using those tools, consider running them from within an environment that has network access to both ports of the cluster.

Operating System Tuning

The OS should be configured with the recommended settings to ensure that Dgraph runs properly.

File Descriptors Limit

Dgraph can use a large number of open file descriptors. Most operating systems set a default limit that is lower than what is required.

It is recommended to set the file descriptors limit to unlimited. If that is not possible, set it to at least a million (1,048,576) to account for cluster growth over time.
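A sketch of raising the limit, both for an interactive shell and for a Dgraph process managed by systemd (the unit file path is an assumption):

  # Raise the open-files limit for the current shell before starting Dgraph.
  ulimit -n 1048576
  # For a systemd-managed Dgraph service, set the limit in the unit file
  # (e.g., /etc/systemd/system/dgraph-alpha.service) instead:
  #   [Service]
  #   LimitNOFILE=1048576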

Deployment

A Dgraph instance is run as a single process from a single static binary. It does not require any additional dependencies or separate services in order to run (see the Supplementary Services section for third-party services that work alongside Dgraph). A Dgraph cluster is set up by running multiple Dgraph processes networked together.

Backup Policy

A backup policy is a predefined schedule that determines when backups of your application data are taken. A backup policy helps to ensure data recoverability in the event of accidental data deletion, data corruption, or a system outage.

For Dgraph, backups are created using the enterprise backup feature. You can also create full exports of your data and schema using the data export feature, which is open source.

We strongly recommend that you have a backup policy in place before moving your application to the production phase, and we also suggest that you have a backup policy even for pre-production apps supported by Dgraph database instances running in development, staging, QA or pre-production clusters.

We suggest that your policy include frequent full and incremental backups. Accordingly, we suggest the following backup policy for your production apps (a sketch for automating this follows the list):

  • a full backup every 24 hours
  • an incremental backup every 2 to 4 hours
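As an illustration of automating this policy, an enterprise backup can be triggered through the /admin GraphQL endpoint on a Dgraph Alpha, for example from a cron job. The S3 destination below is a placeholder, and the exact mutation fields should be verified against your Dgraph version:

  # Trigger a backup to S3 via Dgraph Alpha's /admin GraphQL endpoint.
  curl -s -H "Content-Type: application/json" http://alpha1:8080/admin -d '{
    "query": "mutation { backup(input: {destination: \"s3://s3.us-east-1.amazonaws.com/my-bucket/dgraph\"}) { response { message code } } }"
  }'

Incremental backups to the same destination happen automatically once a full backup exists there; the daily full backup can be forced with the forceFull input field.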

Terminology

  • N-node cluster: a Dgraph cluster that contains N Dgraph instances. For example, a 6-node cluster means six Dgraph instances.
  • Replication setting: the number of Dgraph Alpha replicas assigned per group, set with the --replicas configuration flag on Dgraph Zero.
  • Dgraph Alpha group: a set of Dgraph Alphas that store replicas of the same data among each other. Every Dgraph Alpha group is assigned a set of distinct predicates to store and serve.

Examples of different cluster settings:

  • No sharding:
    • 2-node cluster: 1 Zero, 1 Alpha (one group).
    • HA equivalent: x3 = 6-node cluster.
  • With 2-way sharding:
    • 3-node cluster: 1 Zero, 2 Alphas (two groups).
    • HA equivalent: x3 = 9-node cluster.

In the following examples we outline the two most common cluster configurations: a 2-node cluster and a 6-node cluster.

Basic setup: 2-node cluster

We provide sample configs for both Docker Compose and Kubernetes for a 2-node cluster. You can also run Dgraph directly on your host machines.

Configuration can be set either as command-line flags, environment variables, or in a config file (see Config).
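For instance, the same Alpha option can be expressed in all three forms. The environment-variable scheme (DGRAPH_ALPHA_<FLAG>) and the YAML key below follow the Config docs, but verify them against your Dgraph version:

  # 1. Command-line flag:
  dgraph alpha --postings=/data/p
  # 2. Environment variable:
  export DGRAPH_ALPHA_POSTINGS=/data/p
  # 3. Config file passed via --config:
  printf 'postings: /data/p\n' > alpha-config.yml
  dgraph alpha --config=alpha-config.yml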

Dgraph Zero (a complete example command follows this list):

  • The --my flag should be set to the address:port (the internal-gRPC port) that will be accessible to the Dgraph Alpha (default: localhost:5080).
  • The --raft superflag’s idx option should be set to a unique Raft ID within the Dgraph Zero group (default: 1).
  • The --wal flag should be set to the directory path to store write-ahead-log entries on disk (default: zw).
  • The --bindall flag should be set to true for machine-to-machine communication (default: true).
  • Recommended: For better issue diagnostics, set the log level verbosity to 2 with the option --v=2.
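Putting these options together, a sketch of starting the single Zero for this setup (the zero1 hostname and /data paths are assumptions):

  # Start the only Dgraph Zero in the 2-node cluster.
  dgraph zero --my=zero1:5080 --wal=/data/zw --bindall=true --v=2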

Dgraph Alpha (a complete example command follows this list):

  • The --my flag should be set to the address:port (the internal-gRPC port) that will be accessible to the Dgraph Zero (default: localhost:7080).
  • The --zero flag should be set to the corresponding Zero address set for Dgraph Zero’s --my flag.
  • The --postings flag should be set to the directory path for data storage (default: p).
  • The --wal flag should be set to the directory path for write-ahead-log entries (default: w).
  • The --bindall flag should be set to true for machine-to-machine communication (default: true).
  • Recommended: For better issue diagnostics, set the log level verbosity to 2 with the option --v=2.
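Putting these options together, a sketch of starting the single Alpha for this setup (hostnames and /data paths are assumptions):

  # Start the only Dgraph Alpha, pointing it at the Zero above.
  dgraph alpha --my=alpha1:7080 --zero=zero1:5080 --postings=/data/p --wal=/data/w --bindall=true --v=2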

HA setup: 6-node cluster

We provide sample configs for both Docker Compose and Kubernetes for a 6-node cluster with 3 Alpha replicas per group. You can also run Dgraph directly on your host machines.

A Dgraph cluster can be configured in a high-availability setup with Dgraph Zero and Dgraph Alpha each set up with peers. These peers are part of Raft consensus groups, which elect a single leader among themselves. The non-leader peers are called followers. In the event that the peers cannot communicate with the leader (e.g., due to a network partition or a machine shutting down), the group automatically elects a new leader to continue.

Configuration can be set either as command-line flags, environment variables, or in a config file (see Config).

In this setup, we assume the following hostnames are set:

  • zero1
  • zero2
  • zero3
  • alpha1
  • alpha2
  • alpha3

We will configure the cluster with 3 Alpha replicas per group. The cluster group-membership topology will look like the following:

[Figure: Dgraph cluster group-membership topology]

Set up Dgraph Zero group

In the Dgraph Zero group you must set unique Raft IDs (--raft superflag’s idx option) per Dgraph Zero. Dgraph will not auto-assign Raft IDs to Dgraph Zero instances.

The first Dgraph Zero that starts will initiate the database cluster. Any following Dgraph Zero instances must connect to the cluster via the --peer flag to join. If the --peer flag is omitted, a Dgraph Zero will create its own independent Dgraph cluster instead of joining the existing one.

First Dgraph Zero example: dgraph zero --replicas=3 --raft idx=1 --my=zero1:5080

The --my flag must be set to the address:port of this instance that peers will connect to. The --raft superflag’s idx option sets its Raft ID to 1.

Second Dgraph Zero example: dgraph zero --replicas=3 --raft idx=2 --my=zero2:5080 --peer=zero1:5080

The --my flag must be set to the address:port of this instance that peers will connect to. The --raft superflag’s idx option sets its Raft ID to 2, and the --peer flag specifies a request to connect to the Dgraph cluster of zero1 instead of initializing a new one.

Third Dgraph Zero example: dgraph zero --replicas=3 --raft idx=3 --my=zero3:5080 --peer=zero1:5080

The --my flag must be set to the address:port of this instance that peers will connect to. The --raft superflag’s idx option sets its Raft ID to 3, and the --peer flag specifies a request to connect to the Dgraph cluster of zero1 instead of initializing a new one.

Dgraph Zero configuration options:

  • The --my flag should be set to the address:port (the internal-gRPC port) that will be accessible to Dgraph Alpha (default: localhost:5080).
  • The --raft superflag’s idx option should be set to a unique Raft ID within the Dgraph Zero group (default: 1).
  • The --wal flag should be set to the directory path to store write-ahead-log entries on disk (default: zw).
  • The --bindall flag should be set to true for machine-to-machine communication (default: true).
  • Recommended: For more informative logs, set the log level verbosity to 2 with the option --v=2.

Set up Dgraph Alpha group

The number of replica members per Alpha group depends on the setting of Dgraph Zero’s --replicas flag. Above, it is set to 3, so when Dgraph Alphas join the cluster, Dgraph Zero assigns each one to an Alpha group until the group reaches the member limit set by the --replicas flag.

First Alpha example: dgraph alpha --my=alpha1:7080 --zero=zero1:5080

Second Alpha example: dgraph alpha --my=alpha2:7080 --zero=zero1:5080

Third Alpha example: dgraph alpha --my=alpha3:7080 --zero=zero1:5080
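Once all six instances are up, you can confirm that the three Alphas were placed in a single group of three replicas by inspecting Zero's /state endpoint (a sketch; jq is used here only for readability):

  # List group membership as seen by Dgraph Zero.
  curl -s http://zero1:6080/state | jq '.groups'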

Dgraph Alpha configuration options:

  • The --my flag should be set to the address:port (the internal-gRPC port) that will be accessible to the Dgraph Zero (default: localhost:7080).
  • The --zero flag should be set to the corresponding Zero address set for Dgraph Zero’s --my flag.
  • The --postings flag should be set to the directory path for data storage (default: p).
  • The --wal flag should be set to the directory path for write-ahead-log entries (default: w).
  • The --bindall flag should be set to true for machine-to-machine communication (default: true).
  • Recommended: For more informative logs, set the log level verbosity to 2 with the option --v=2.

Supplementary Services

These services are not required for a Dgraph cluster to function but are recommended for better insight when operating a Dgraph cluster.