14 Mar 2017

Using pg_dumpall with Google Cloud Postgres

Luckily the recent update to pg_dumpall works (without any extra effort) to dump Globals off the recently launched Google Cloud PostgreSQL service well.

This is because this Postgres service seems to be using the same limitation (as AWS RDS Postgres does) in that none of the user accounts are SuperUser in the Postgres sense. Therefore, this platform too disallows accessing pg_authid.

The fallback here too, is to use pg_roles to read (as much as is possible) to dump Role related Globals. Read here for more info on the recent changes to pg_dumpall that makes this work.

12 Mar 2017

Redshift Limitations

Recently have been toying around with Redshift (a popular Data-Warehouse tool by AWS) a little bit and see some minor but obvious limitations in SQL limitations:

FULL OUTER JOIN + ON Func(x) = Func(y) : Not Supported

Although Redshift supports most variants of FULL OUTER JOIN, however, when the JOIN condition contains a function on both sides, it gives up complaining "ERROR:  XX000: could not devise a query plan for the given query".

CREATE TABLE x (a integer);
INSERT INTO x VALUES (1);  

SELECT x1.a, x2.a
FROM x x1
FULL OUTER JOIN x x2
     ON LOWER(x1.a) = LOWER(x2.a); -- Fails
  -- ON LOWER(x1.a) = x2.a; -- Works
  -- ON x1.a = LOWER(x2.a); -- Works
  -- ON x1.a = x2.a; -- Obviously Works

Workaround: The obvious way around it is to move one of the functions to a separate SQL (for e.g create a Temp Table and use that in the final computation), something like this:

CREATE TEMP TABLE x1(a) AS SELECT LOWER(a) FROM x;

SELECT x1.a, x2.a
FROM x1
FULL OUTER JOIN x x2
ON x1.a = LOWER(x2.a); -- Works

7 Mar 2017

Using pg_dumpall with AWS RDS Postgres

My patch (allowing a non-super-user to dump Globals) just got committed to Postgres 10.0.

Besides the use mentioned above, this patch allows one to dump Globals from their AWS RDS Postgres Database and have a complete backup as a part of their Disaster Recovery (DR) solution. This earlier required an inelegant hack (detailed here in another post of mine) that wasn't very convenient for a regular user.

For those interested, RDS Postgres (by design) doesn't allow you to read pg_authid, which was earlier necessary for pg_dumpall to work. With this patch checked-in, pg_dumpall now uses a workaround (pg_roles) to read the same data (except passwords) and generate the same SQL Script.

With that mentioned, let's get our hands dirty and see a working solution:

# Take RDS Postgres Dump
# We store the pgdb database in pgdb.sql and globals in pgdb_globals.sql
pg_dumpall --globals-only --no-role-password \
  -U rdssuperuser -h rdsinstance \
  2>stderr.txt > pgdb_globals.sql
pg_dump -U rdssuperuser -h rdsinstance pgdb 2>stderr.txt > pgdb.sql

# Restore on Local Machine
psql -U postgres -c "CREATE DATABASE pgdb;"

# rdsadmin database (even if empty) is required for the next script
psql -U postgres -c "CREATE DATABASE rdsadmin;" 

psql -U postgres pgdb < pgdb_globals.sql
psql -U postgres pgdb < pgdb.sql

Note: The --no-role-password flag is still necessary since AWS RDS doesn't allow you to read Passwords. Therefore, the above generates a script that resets all users to be password-less, but that's still better than having to restore Globals (like 'CREATE USER ... ') manually by hand!

Big thanks to Simon for ironing out a few issues, Stephen, David & Robert for their (critical) reviews.