18 Apr 2019

Another look at Replica Lag :)

The other day, I remembered an old 9.0-era mail thread (when Streaming Replication had just launched) where someone had tried to daisy-chain Postgres Replicas and see how many (s)he could muster.

If I recall correctly, the OP could squeeze only ~120 or so, mostly because the Laptop memory gave way (and not really because of an engine limitation).

I couldn't find that post, but it was intriguing to know if we could reach (at least) a thousand mark and see what kind of "Replica Lag" would that entail; thus NReplicas.

On a (very) unscientific test, my 4-Core 16G machine can spin-up (create data folders and host processes for all) 1000 Replicas in ~8m (and tear them down in another ~2m). Now am sure this could get better, but amn't complaining since this was a breeze to setup (in that it just worked without much tinkering ... besides lowering shared_buffers).

For those interested, a single UPDATE on the master, could (nearly consistently) be seen on the last Replica in less than half a second, with top showing 65% CPU idle (and 2.5 on the 1-min CPU metric) during a ~30 minute test.

Put in simple terms, what this means is that the UPDATE change traveled from the Master to a Replica (lets call it Replica1) and then from Replica1 it cascaded the change on to Replica2 (and so on a 1000 times). The said row change can be seen on Replica1000 within half a second.

So although (I hope) this isn't a real-world use-case, I still am impressed that this is right out-of-the-box and still way under the 1 second mark.... certainly worthy of a small post :) !

No comments: