Opinions are mine

Scale

I’ve been using Consul since one of its first releases back in 2014 and have set it up multiple times in various environments - from standalone nodes run for convenience to multi-datacenter federated clusters partitioned over WAN and LAN segments.

Over time, dealing with larger and larger setups, I learned some tricks that came in handy as the scale grew, especially once node and service counts run into the hundreds or thousands.

I must note that I have not pushed Consul to its limits, and my experience is not outstanding in any way, as there seem to exist setups that are orders of magnitude larger than what I have dealt with so far.

On the other hand, I rarely see people talking about the tweaks I’ve been using, and I definitely would have saved myself from many problems if I had known about them before I needed them.

Scope

I only deal with Linux and assume there are no resource bottlenecks - as in, Consul is not starved of CPU, IO, RAM, and so on.

Stability

A few things make Consul servers way more stable and save time investigating sporadic API call failures. In other words, let’s start by making sure that as few requests fail as possible.

Open File Descriptors

Opening a file takes a file descriptor, and so does opening a socket - not only for listening for incoming connections but also for each connection that is kept open. This is especially crucial for Consul, as it usually talks a lot over the network - both to keep in touch with other nodes and to process API requests that update or retrieve service or KV records.

Linux enforces limits on how many file descriptors a given process can have. There is a system-wide limit that one can set via sysctl, and there is a limit for any given process.
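
For the system-wide side of it, sysctl is enough - a rough sketch, where the value is just an illustration:

$ sysctl fs.file-max                     # check the current system-wide limit
$ sudo sysctl -w fs.file-max=2097152     # raise it (pick a value that suits your setup)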

The latter can be easily adjusted. In the case of systemd, it is just an entry under the [Service] section:

[Service]
LimitNOFILE=16384

and with Docker it’s just an extra command-line parameter:

$ docker run --ulimit nofile=16384 consul
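
To double-check that the limit actually applies to a running agent, one can ask the kernel directly - a small sketch assuming a single consul process:

$ grep "open files" /proc/$(pidof consul)/limits     # the effective per-process limit
$ ls /proc/$(pidof consul)/fd | wc -l                # file descriptors currently in use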

raft_multiplier

Of all the items on this list, this one is the closest to “tweaking” Consul itself. It is very well documented in Consul’s performance guide, but the bottom line is to put this into the Consul servers' configuration:

{
  "performance": {
    "raft_multiplier": 1
  }
}

This essentially speeds up leader failure detection and re-election and therefore minimizes the window during which requests involving the leader may fail.

Stale Reads

This topic is also briefly mentioned in Consul’s performance guide, but I think it is a seriously under-appreciated optimization.

In my experience, even in a federated setup with hundreds of nodes and services, whenever a KV entry or a service state changes, that information is propagated around very quickly - within tens of milliseconds when using stale reads. Such latencies are detectable only when one uses long-polling to listen for state changes. On top of that, I am not sure I have ever seen health checks executed more often than once every second or few, which means the propagation latency can be considered negligible.

The official documentation says (highlights are mine):

This mode allows any server to service the read regardless of whether it is the leader. It means reads can be **arbitrarily** stale; however, results are generally consistent to within **50 milliseconds** of the leader. The trade-off is very fast and scalable reads with a higher likelihood of stale values. Since this mode **allows reads without a leader, a cluster that is unavailable would still be able to respond to queries**.

What does that mean? If we recall the CAP theorem, it means that we opt into AP - in other words, staying available in the face of a network partition. It might not be suitable for all cases, but in my experience, that’s a safe assumption in 99% of them - to the degree that I always default to stale reads.

Also, since reads are “AP” now, even a leader failure would not impact data retrieval.
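
In practice, opting in is just a matter of adding the stale query parameter to reads - a sketch against a local agent, with a made-up service name and key:

$ curl 'http://127.0.0.1:8500/v1/health/service/web?passing&stale'
$ curl 'http://127.0.0.1:8500/v1/kv/app/config/db?stale'

The X-Consul-LastContact and X-Consul-KnownLeader response headers can then be used to judge how stale the answer might actually be.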

Intervals

When dealing with large distributed systems, relying too much on time is not the best idea: high-precision clocks are hard, they drift away, or they are not appropriately synchronized at all.

Even though Consul’s performance is such that durations are usually measured in milliseconds, that is not a sane precision to rely on in a general-purpose distributed system built on commodity hardware.

Going an order of magnitude or two slower - at a per-second precision - would simplify many things and would generally eliminate the need to worry about stale data, rate limits, abusive API clients, and so on. Even though one can use long-polling to retrieve updates from Consul immediately, adding a second-scale interval before re-starting a watch would significantly reduce the load both on Consul and on the software that processes updates. Going in batches is always more efficient and may even eliminate the need to process updates at all if some of the services flap within the interval period (unless catching that is the goal).
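
A rough sketch of that pattern - a blocking query that is re-armed only after a second-scale pause; the key path, wait time, and sleep interval are arbitrary:

index=0
while true; do
  # blocking read: returns when the data changes or after the wait timeout
  body=$(curl -s -D /tmp/consul-headers \
    "http://127.0.0.1:8500/v1/kv/app/config?recurse&stale&index=${index}&wait=30s")
  index=$(awk 'tolower($1) == "x-consul-index:" {print $2}' /tmp/consul-headers | tr -d '\r')
  # ... process "$body" here ...
  sleep 5   # second-scale pause before re-arming the watch
done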

Reliability

Approaching Consul from a user’s perspective, “reliability” means not Consul’s own reliability but rather making sure that operations are performed the way one expects, and that there are as few unpredicted edge cases as possible.

Durable Writes

Speaking about the stale consistency mode above, I only mentioned reads and not writes. To the best of my knowledge, writes are not subject to consistency modes, and this is also briefly mentioned in Consul’s documentation:

consistency query parameters will be ignored, since writes are always managed by the leader via the Raft consensus protocol

In other words, writes are durable by default, and one does not have to worry about consistency modes when modifying data.
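
A plain KV write illustrates that: there is no consistency parameter to think about, and the call only returns once the leader has committed the change via Raft (the key and value here are made up):

$ curl -X PUT -d 'v1.2.3' 'http://127.0.0.1:8500/v1/kv/app/version'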

Federated Atomic Reads

For quite a long time now, Consul has had a Transactions API that can be used to perform atomic sets of operations. Unfortunately, I have not had a chance to measure its overhead yet, but it is the way to go if one needs to combine several operations of different types (read a key, modify another one, or maybe delete it).
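
As an illustration of the API's shape, the transaction below reads one key and sets another in a single atomic request; the keys are placeholders, and note that values in the Txn API are base64-encoded:

$ curl -X PUT 'http://127.0.0.1:8500/v1/txn' -d '[
    {"KV": {"Verb": "get", "Key": "app/feature-flag"}},
    {"KV": {"Verb": "set", "Key": "app/last-deploy", "Value": "MjAxOS0wMS0wMQ=="}}
  ]'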

Sometimes, though, one needs to fetch data from another datacenter in an “atomic” way - as in, imagine watching a few KV sub-trees in a federated cluster. This may lead to unexpected results in the face of network issues, especially intermittent ones: some of the watches may succeed while others fail and fall into a retry loop.

A “trick” around such a problem is to avoid watching several KV sub-trees in a single context when talking to a federated cluster. Instead, one can watch a superset of all the trees that should be monitored. While potentially resulting in more data transfer, this makes the whole process smoother, as it is always easy to reason about the state of such a KV observation.
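
Concretely, instead of several parallel watches on a few sub-trees in a remote datacenter, a single recursive read over their common prefix either succeeds or fails as one unit - a sketch with a made-up prefix and datacenter name:

$ curl 'http://127.0.0.1:8500/v1/kv/app/?recurse&stale&dc=dc2'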

Performance

When talking to Consul’s APIs, one usually takes its performance for granted, but how exactly one talks to APIs can make a huge difference.

ACLs for Performance

Consul’s KV operations can be performed either on individual entries or on entire sub-trees. Catalog operations are usually performed on particular service or node entries. Nevertheless, when dealing with large clusters, one may face situations where it is impossible to filter out unnecessary data.

Watching a service state on a few particular nodes results either in several watches with complicated synchronization, or in watching the service state across the whole cluster and then extracting a tiny piece of the retrieved data.

The same goes for KV - as in the example with federated atomic reads, one may need to watch the entire KV tree to be able to detect changes in a few key entries under different paths.

It might be an unobvious solution, but ACLs can be a great help with that. In most cases, it is pretty easy to create a Consul access token with ACL rules that restrict the available data to only what is needed. This can not only tremendously reduce data transfer but also reduce load by skipping updates to irrelevant data, and even simplify the data filtering itself, given that the ACL rules engine is domain-specific.
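
As an illustration, a policy along the following lines limits a token to a single KV prefix and a single service, so even broad "watch everything" style queries return only the relevant slice. The names are placeholders, and the syntax assumes the newer ACL system (Consul 1.4+):

key_prefix "app/config/" {
  policy = "read"
}

service "web" {
  policy = "read"
}

# node read is needed so health/catalog queries can return the nodes the service runs on
node_prefix "" {
  policy = "read"
}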

HTTP/2

Since version 1.0.1, Consul’s API supports HTTP/2 when TLS is enabled, which means that one can utilize persistent connections and multiplex many API calls over a single TCP connection. This reduces the overhead of establishing new connections and also reduces the need for file descriptors on Consul agents that serve many API calls.
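
A minimal sketch of what this takes on the agent side - an HTTPS listener plus TLS material; the port and file paths are illustrative:

{
  "ports": {
    "https": 8501
  },
  "ca_file": "/etc/consul/ca.pem",
  "cert_file": "/etc/consul/agent.pem",
  "key_file": "/etc/consul/agent-key.pem"
}

After that, any HTTP/2-capable client can multiplex its calls, for example:

$ curl --http2 --cacert /etc/consul/ca.pem 'https://127.0.0.1:8501/v1/agent/self'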

Conclusion

While Consul’s documentation has many excellent guides on optimizing Consul’s performance, stability, and reliability, there is a lot that can be done from a user’s perspective. Not only can such “tricks” make Consul’s life easier, but they can also simplify applications that talk to Consul’s APIs.