20 Nov 2017

Update: RDS Prewarm script updated to fetch FSM / VM chunks

(This post is in continuation to my previous post regarding Initializing RDS Postgres Instance)

This simple SQL "Initializes" the EBS volume linked to an RDS Instance, something which isn't possible to do without sending workload (and experience high Latency in the first run).

Key scenarios, where this is really helpful are:

  • Create a Read-Replica (or Hot Standby in Postgres terms)
  • Restore a new RDS Instance from a Snapshot


Update: The Script, now also does the following:

  • Fetch disk blocks related to FSM / VM of all tables
  • Fetch (most) Indexes
    • Fetch all non-TEXT indexes
    • Among TEXT indexes, fetch all non-TOAST indexes

Limitations that still exist:
  • TOAST tables are still directly inaccessible in RDS
    • Indexes for TOAST columns also fall under this category
    • Trying hard to see if this last hurdle can be worked around
      • Anyone with any ideas?!
  • Script needs to be run once per Database Owner
    • Not sure if there is any magic around this
      • Object ownership is a Postgres property
        • RDS Postgres does not give Superuser access
    • I'll try to ease this in the future
      • By creating a script to list the Users that this needs to run as
      • The other possibility is to use DBLink to run this for separate Users in a single run

I'll update here, in case I make any significant changes.

Sample Run



-[ RECORD 1 ]-------+------------------------------
clock_timestamp     | 2017-11-19 15:40:08.291891-05
table_size          | 13 GB
freespace_map_size  | 3240 kB
visibility_map_size | 408 kB
blocks_prefetched   | 1639801
current_database    | pgbench
schema_name         | public
table_name          | pgbench_accounts
-[ RECORD 2 ]-------+------------------------------
clock_timestamp     | 2017-11-19 15:43:37.703711-05
table_size          | 2142 MB
freespace_map_size  | 0 bytes
visibility_map_size | 0 bytes
blocks_prefetched   | 274194
current_database    | pgbench
schema_name         | public
table_name          | pgbench_accounts_pkey
-[ RECORD 3 ]-------+------------------------------
clock_timestamp     | 2017-11-19 15:44:12.899115-05
table_size          | 440 kB
freespace_map_size  | 24 kB
visibility_map_size | 8192 bytes
blocks_prefetched   | 59
current_database    | pgbench
schema_name         | public
table_name          | pgbench_tellers
-[ RECORD 4 ]-------+------------------------------
clock_timestamp     | 2017-11-19 15:44:12.901088-05
table_size          | 240 kB
freespace_map_size  | 0 bytes
visibility_map_size | 0 bytes
blocks_prefetched   | 30
current_database    | pgbench
schema_name         | public
table_name          | pgbench_tellers_pkey
-[ RECORD 5 ]-------+------------------------------
clock_timestamp     | 2017-11-19 15:44:12.905107-05
table_size          | 40 kB
freespace_map_size  | 0 bytes
visibility_map_size | 0 bytes
blocks_prefetched   | 5
current_database    | pgbench
schema_name         | public
table_name          | pgbench_branches_pkey
-[ RECORD 6 ]-------+------------------------------
clock_timestamp     | 2017-11-19 15:44:12.907089-05
table_size          | 40 kB
freespace_map_size  | 24 kB
visibility_map_size | 8192 bytes
blocks_prefetched   | 9
current_database    | pgbench
schema_name         | public
table_name          | pgbench_branches
-[ RECORD 7 ]-------+------------------------------
clock_timestamp     | 2017-11-19 15:44:12.907142-05
table_size          | 0 bytes
freespace_map_size  | 0 bytes
visibility_map_size | 0 bytes
blocks_prefetched   | 0
current_database    | pgbench
schema_name         | public
table_name          | pgbench_history

7 Nov 2017

Prewarming / Initializing an RDS Postgres instance (from S3)

As many of you know, that AWS RDS Postgres uses EBS which has an interesting feature called Lazy Loading that allows it to instantiate a disk (the size of which can be mostly anything from 10GB to 6TB) and it comes online within a matter of minutes. Although a fantastic feature, this however, can lead to unexpected outcomes when high-end production load is thrown at a newly launched RDS Postgres instance immediately after Restoring from a Snapshot.

One possible solution is to use the pg_prewarm Postgres Extension that is well supported in RDS Postgres, immediately after Restoring from a Snapshot, thereby reducing the side-effects of Lazy Loading.

Although pg_prewarm was originally meant for populating buffer-cache, this extension (in this specific use-case) is heaven-sent to initialize (fetch), (almost) the entire snapshot from S3 on to the RDS EBS volume in question. Therefore, even if you use pg_prewarm to run through all tables etc., thereby effectively evicting the recent run for the previous table from buffer-cache, it still does the job of initializing all disk-blocks with respect to the EBS volume.

I've just checked in the SQL to this repository that seems to do this magic pretty well. It also enlists why this would only take you ~70% of the way owing to restrictions / limitations (as per my current understanding).

In the Sample below, I restored a new RDS Postgres instance from a Snapshot and immediately thereafter ran this SQL on it.


  • Notice that the first table (pgbench_accounts) takes about 22 seconds to load the first time, and less than a second to load the second time.
  • Similarly the second table (pgbench_history) takes 15 seconds to load the first time and less than a second, the second time :) !



pgbench=>       SELECT clock_timestamp(), pg_prewarm(c.oid::regclass),
pgbench->       relkind, c.relname
pgbench->       FROM pg_class c
pgbench->         JOIN pg_namespace n
pgbench->           ON n.oid = c.relnamespace
pgbench->         JOIN pg_user u
pgbench->           ON u.usesysid = c.relowner
pgbench->       WHERE u.usename NOT IN ('rdsadmin', 'rdsrepladmin', ' pg_signal_backend', 'rds_superuser', 'rds_replication')
pgbench->       ORDER BY c.relpages DESC;
        clock_timestamp        | pg_prewarm | relkind |        relname
-------------------------------+------------+---------+-----------------------
 2017-11-07 11:41:44.341724+00 |      17903 | r       | pgbench_accounts
 2017-11-07 11:42:06.059177+00 |       6518 | r       | pgbench_history
 2017-11-07 11:42:17.126768+00 |       2745 | i       | pgbench_accounts_pkey
 2017-11-07 11:42:21.406054+00 |         45 | r       | pgbench_tellers
 2017-11-07 11:42:21.645859+00 |         24 | r       | pgbench_branches
 2017-11-07 11:42:21.757086+00 |          2 | i       | pgbench_branches_pkey
 2017-11-07 11:42:21.757653+00 |          2 | i       | pgbench_tellers_pkey
(7 rows)

pgbench=>
pgbench=>       SELECT clock_timestamp(), pg_prewarm(c.oid::regclass),
pgbench->       relkind, c.relname
pgbench->       FROM pg_class c
pgbench->         JOIN pg_namespace n
pgbench->           ON n.oid = c.relnamespace
pgbench->         JOIN pg_user u
pgbench->           ON u.usesysid = c.relowner
pgbench->       WHERE u.usename NOT IN ('rdsadmin', 'rdsrepladmin', ' pg_signal_backend', 'rds_superuser', 'rds_replication')
pgbench->       ORDER BY c.relpages DESC;
        clock_timestamp        | pg_prewarm | relkind |        relname
-------------------------------+------------+---------+-----------------------
 2017-11-07 11:42:33.914195+00 |      17903 | r       | pgbench_accounts
 2017-11-07 11:42:33.917725+00 |       6518 | r       | pgbench_history
 2017-11-07 11:42:33.918919+00 |       2745 | i       | pgbench_accounts_pkey
 2017-11-07 11:42:33.919412+00 |         45 | r       | pgbench_tellers
 2017-11-07 11:42:33.919427+00 |         24 | r       | pgbench_branches
 2017-11-07 11:42:33.919438+00 |          2 | i       | pgbench_branches_pkey
 2017-11-07 11:42:33.919443+00 |          2 | i       | pgbench_tellers_pkey
(7 rows)


14 Oct 2017

First alpha release of PsqlForks - Menon

Primer: PsqlForks aims to support all DB Engines that (even partially) speak Postgres (psqlforks = psql for Postgres forks).

Given that PsqlForks has been in development for a few weeks, it's time to stabilize a bit and towards that, we finally have Menon, PsqlForks first Alpha Release. Being an alpha, by definition it isn't ready for production, but it feels stable enough ... feel free to test it out!

Importantly, this fork is synced with postgres/master regularly, and should ideally sport all recent psql developments. Further, I am not a C expert and am just barely comprehending Postgres, so let me know of any 18-wheelers that I didn't see.

The release title - 'Menon', is a common sub-Caste in South-Indian state of Kerala. Selecting this nomenclature emanates from the idea of popularizing (heh!) common names and places from Kerala... and that it doesn't hurt to have an identifiable name (and while at it, add character) to a Release :) 

This release includes: 

  • Decent support for Redshift:
    • SQL tab completion for Redshift related variations
    • \d etc. now support Redshift specifics - ENCODINGs / SORTKEYs / DISTKEY / COMPRESSION etc.
    • Support Temporary Credentials using IAM Authentication (via AWS CLI)
    • View detailed progress here.
  • Basic support / Recognition semantics for:
    • CockroachDB - view progress here
    • PipelineDB
    • PgBouncer
    • RDS PostgreSQL
You could read more here:

For the interested:

13 Oct 2017

PsqlForks supports AWS IAM authentication for Redshift

With this commit, PsqlForks ( http://psqlforks.com ) can now fetch credentials from AWS IAM. Read more about Redshift's support for generating database credentials using IAM authentication feature, here.

Since the entire AWS CLI isn't baked into PsqlForks (yet!), you'd need a working copy of AWS CLI installed / working on the host (from where psql is called).

This took a while, since I missed the basic assumption that Redshift enforces SSL and psql doesn't attempt SSLMODE by default in the first try. The fact that CYGWIN wasn't super-smooth with AWS CLI in my test installation, didn't help either.

But as they say, all's well that ends well. There are few obvious additions that are possible (such as expiration validation / re-use unexpired credentials on re-connect etc.) but this should get merged in the forks mainline soon.

I guess it's time to begin thinking of releases, instead of making the mainline jittery with feature additions such as this one.

Yenjoy!


$ psql "sslmode=require host=redshift_cluster port=5439 dbname=redshift2" -U testing1
Password for user testing1:
psql: fe_sendauth: no password supplied

$ psql -I "sslmode=require host=redshift_cluster port=5439 dbname=redshift2" -U testing1

CLI: aws redshift get-cluster-credentials --auto-create --db-user testing1 --cluster-identifier redshift2 # Informational / testing output

psql (client-version:11devel, server-version:8.0.2, engine:redshift)
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: on)
Type "help" for help.

redshift2=> select current_user;
 current_user
--------------
 testing1
(1 row)

redshift2=> \du
                     List of roles
 Role name |          Attributes           | Member of
-----------+-------------------------------+-----------
 redshift2 | Superuser, Create DB         +|
           | Password valid until infinity |
 testing1  |                               |

redshift2=> \q

$ ./psql --help | grep -i iam
  -I, --aws-iam-redshift   use temporary database credentials from AWS IAM Service

12 Oct 2017

PsqlForks now recognizes RDS PostgreSQL separately

With this commit, PsqlForks ( http://psqlforks.com ) now recognizes RDS PostgreSQL separately.

This isn't utilized much yet, but the infrastructure is going to be helpful in skipping / avoiding some commands that are defunct / not possible in the RDS PostgreSQL offering.