That Guy From Delhi: November 2017

20 Nov 2017

Update: RDS Prewarm script updated to fetch FSM / VM chunks

(This post is in continuation to my previous post regarding Initializing RDS Postgres Instance)

This simple SQL "Initializes" the EBS volume linked to an RDS Instance, something which isn't possible to do without sending workload (and experience high Latency in the first run).

Key scenarios, where this is really helpful are:

Create a Read-Replica (or Hot Standby in Postgres terms)
Restore a new RDS Instance from a Snapshot

Update: The Script, now also does the following:

Now also fetches disk blocks related to FSM / VM of all tables
Now fetches all Indexes

Limitations that still exist:

~~TOAST tables are still directly inaccessible in RDS~~

~~Indexes for TOAST columns also fall under this category~~
~~Trying hard to see if this last hurdle can be worked around~~

~~Anyone with any ideas?!~~

Script needs to be run once per Database Owner

Not sure if there is any magic around this

Object ownership is a Postgres property

RDS Postgres does not give Superuser access

I'll try to ease this in the future

By creating a script to list the Users that this needs to run as
The other possibility is to use DBLink to run this for separate Users in a single run

I'll update here, in case I make any significant changes.

Sample Run

-[ RECORD 1 ]-------+------------------------------

clock_timestamp | 2017-11-19 15:40:08.291891-05

table_size | 13 GB

freespace_map_size | 3240 kB

visibility_map_size | 408 kB

blocks_prefetched | 1639801

current_database | pgbench

schema_name | public

table_name | pgbench_accounts

-[ RECORD 2 ]-------+------------------------------

clock_timestamp | 2017-11-19 15:43:37.703711-05

table_size | 2142 MB

freespace_map_size | 0 bytes

visibility_map_size | 0 bytes

blocks_prefetched | 274194

current_database | pgbench

schema_name | public

table_name | pgbench_accounts_pkey

-[ RECORD 3 ]-------+------------------------------

clock_timestamp | 2017-11-19 15:44:12.899115-05

table_size | 440 kB

freespace_map_size | 24 kB

visibility_map_size | 8192 bytes

blocks_prefetched | 59

current_database | pgbench

schema_name | public

table_name | pgbench_tellers

-[ RECORD 4 ]-------+------------------------------

clock_timestamp | 2017-11-19 15:44:12.901088-05

table_size | 240 kB

freespace_map_size | 0 bytes

visibility_map_size | 0 bytes

blocks_prefetched | 30

current_database | pgbench

schema_name | public

table_name | pgbench_tellers_pkey

-[ RECORD 5 ]-------+------------------------------

clock_timestamp | 2017-11-19 15:44:12.905107-05

table_size | 40 kB

freespace_map_size | 0 bytes

visibility_map_size | 0 bytes

blocks_prefetched | 5

current_database | pgbench

schema_name | public

table_name | pgbench_branches_pkey

-[ RECORD 6 ]-------+------------------------------

clock_timestamp | 2017-11-19 15:44:12.907089-05

table_size | 40 kB

freespace_map_size | 24 kB

visibility_map_size | 8192 bytes

blocks_prefetched | 9

current_database | pgbench

schema_name | public

table_name | pgbench_branches

-[ RECORD 7 ]-------+------------------------------

clock_timestamp | 2017-11-19 15:44:12.907142-05

table_size | 0 bytes

freespace_map_size | 0 bytes

visibility_map_size | 0 bytes

blocks_prefetched | 0

current_database | pgbench

schema_name | public

table_name | pgbench_history

7 Nov 2017

Prewarming / Initializing an RDS Postgres instance (from S3)

UPDATE: Read this for recent updates. Now the SQL successfully fetches *all* disk blocks on most RDS PostgreSQL (read post for the rare exceptions).

As many of you know, that AWS RDS Postgres uses EBS which has an interesting feature called Lazy Loading that allows it to instantiate a disk (the size of which can be mostly anything from 10GB to 6TB) and it comes online within a matter of minutes. Although a fantastic feature, this however, can lead to unexpected outcomes when high-end production load is thrown at a newly launched RDS Postgres instance immediately after Restoring from a Snapshot.

One possible solution is to use the pg_prewarm Postgres Extension that is well supported in RDS Postgres, immediately after Restoring from a Snapshot, thereby reducing the side-effects of Lazy Loading.

Although pg_prewarm was originally meant for populating buffer-cache, this extension (in this specific use-case) is heaven-sent to initialize (fetch), (almost) the entire snapshot from S3 on to the RDS EBS volume in question. Therefore, even if you use pg_prewarm to run through all tables etc., thereby effectively evicting the recent run for the previous table from buffer-cache, it still does the job of initializing all disk-blocks with respect to the EBS volume.

I've just checked in the SQL to this repository that seems to do this magic pretty well. It also enlists why this would only take you ~70% of the way owing to restrictions / limitations (as per my current understanding).

In the Sample below, I restored a new RDS Postgres instance from a Snapshot and immediately thereafter ran this SQL on it.

Notice that the first table (pgbench_accounts) takes about 22 seconds to load the first time, and less than a second to load the second time.
Similarly the second table (pgbench_history) takes 15 seconds to load the first time and less than a second, the second time :) !

pgbench=> SELECT clock_timestamp(), pg_prewarm(c.oid::regclass),
pgbench-> relkind, c.relname
pgbench-> FROM pg_class c
pgbench-> JOIN pg_namespace n
pgbench-> ON n.oid = c.relnamespace
pgbench-> JOIN pg_user u
pgbench-> ON u.usesysid = c.relowner
pgbench-> WHERE u.usename NOT IN ('rdsadmin', 'rdsrepladmin', ' pg_signal_backend', 'rds_superuser', 'rds_replication')
pgbench-> ORDER BY c.relpages DESC;
clock_timestamp | pg_prewarm | relkind | relname
-------------------------------+------------+---------+-----------------------
2017-11-07 11:41:44.341724+00 | 17903 | r | pgbench_accounts
2017-11-07 11:42:06.059177+00 | 6518 | r | pgbench_history
2017-11-07 11:42:17.126768+00 | 2745 | i | pgbench_accounts_pkey
2017-11-07 11:42:21.406054+00 | 45 | r | pgbench_tellers
2017-11-07 11:42:21.645859+00 | 24 | r | pgbench_branches
2017-11-07 11:42:21.757086+00 | 2 | i | pgbench_branches_pkey
2017-11-07 11:42:21.757653+00 | 2 | i | pgbench_tellers_pkey
(7 rows)

pgbench=>
pgbench=> SELECT clock_timestamp(), pg_prewarm(c.oid::regclass),
pgbench-> relkind, c.relname
pgbench-> FROM pg_class c
pgbench-> JOIN pg_namespace n
pgbench-> ON n.oid = c.relnamespace
pgbench-> JOIN pg_user u
pgbench-> ON u.usesysid = c.relowner
pgbench-> WHERE u.usename NOT IN ('rdsadmin', 'rdsrepladmin', ' pg_signal_backend', 'rds_superuser', 'rds_replication')
pgbench-> ORDER BY c.relpages DESC;
clock_timestamp | pg_prewarm | relkind | relname
-------------------------------+------------+---------+-----------------------
2017-11-07 11:42:33.914195+00 | 17903 | r | pgbench_accounts
2017-11-07 11:42:33.917725+00 | 6518 | r | pgbench_history
2017-11-07 11:42:33.918919+00 | 2745 | i | pgbench_accounts_pkey
2017-11-07 11:42:33.919412+00 | 45 | r | pgbench_tellers
2017-11-07 11:42:33.919427+00 | 24 | r | pgbench_branches
2017-11-07 11:42:33.919438+00 | 2 | i | pgbench_branches_pkey
2017-11-07 11:42:33.919443+00 | 2 | i | pgbench_tellers_pkey
(7 rows)

20 Nov 2017

Update: RDS Prewarm script updated to fetch FSM / VM chunks

Sample Run

7 Nov 2017

Prewarming / Initializing an RDS Postgres instance (from S3)

Taming ReorderBufferWrite - Boost Logical Decoding in Postgres