14 Mar 2017

Using pg_dumpall with Google Cloud Postgres

Luckily the recent update to pg_dumpall works (without any extra effort) to dump Globals off the recently launched Google Cloud PostgreSQL service well.

This is because this Postgres service seems to be using the same limitation (as AWS RDS Postgres does) in that none of the user accounts are SuperUser in the Postgres sense. Therefore, this platform too disallows accessing pg_authid.

The fallback here too, is to use pg_roles to read (as much as is possible) to dump Role related Globals. Read here for more info on the recent changes to pg_dumpall that makes this work.

12 Mar 2017

Redshift Limitations

Recently have been toying around with Redshift (a popular Data-Warehouse tool by AWS) a little bit and see some minor but obvious limitations in SQL limitations:

FULL OUTER JOIN + ON Func(x) = Func(y) : Not Supported

Although Redshift supports most variants of FULL OUTER JOIN, however, when the JOIN condition contains a function on both sides, it gives up complaining "ERROR:  XX000: could not devise a query plan for the given query".

CREATE TABLE x (a integer);
INSERT INTO x VALUES (1);  

SELECT x1.a, x2.a
FROM x x1
FULL OUTER JOIN x x2
     ON LOWER(x1.a) = LOWER(x2.a); -- Fails
  -- ON LOWER(x1.a) = x2.a; -- Works
  -- ON x1.a = LOWER(x2.a); -- Works
  -- ON x1.a = x2.a; -- Obviously Works

Workaround: The obvious way around it is to move one of the functions to a separate SQL (for e.g create a Temp Table and use that in the final computation), something like this:

CREATE TEMP TABLE x1(a) AS SELECT LOWER(a) FROM x;

SELECT x1.a, x2.a
FROM x1
FULL OUTER JOIN x x2
ON x1.a = LOWER(x2.a); -- Works

8 Mar 2017

Using pg_dumpall with AWS RDS Postgres

My patch (allowing a non-super-user to dump Globals) just got committed to Postgres 10.0.

Besides the use mentioned above, this patch allows one to dump Globals from their AWS RDS Postgres Database and have a complete backup as a part of their Disaster Recovery (DR) solution. This earlier required an inelegant hack (detailed here in another post of mine) that wasn't very convenient for a regular user.

For those interested, RDS Postgres (by design) doesn't allow you to read pg_authid, which was earlier necessary for pg_dumpall to work. With this patch checked-in, pg_dumpall now uses a workaround (pg_roles) to read the same data (except passwords) and generate the same SQL Script.

With that mentioned, let's get our hands dirty and see a working solution:

# Take RDS Postgres Dump
# We store the pgdb database in pgdb.sql and globals in pgdb_globals.sql
pg_dumpall --globals-only --no-role-passwords \
  -U rdssuperuser -h rdsinstance \
  2>stderr.txt > pgdb_globals.sql
pg_dump -U rdssuperuser -h rdsinstance pgdb 2>stderr.txt > pgdb.sql

# Restore on Local Machine
psql -U postgres -c "CREATE DATABASE pgdb;"

# rdsadmin database (even if empty) is required for the next script
psql -U postgres -c "CREATE DATABASE rdsadmin;" 

psql -U postgres pgdb < pgdb_globals.sql
psql -U postgres pgdb < pgdb.sql

Note: The --no-role-passwords flag is still necessary since AWS RDS doesn't allow you to read Passwords. Therefore, the above generates a script that resets all users to be password-less, but that's still better than having to restore Globals (like 'CREATE USER ... ') manually by hand!

Big thanks to Simon for ironing out a few issues, Stephen, David & Robert for their (critical) reviews.

24 Dec 2016

Watch an Online Movie: (Wget -c || Deluge) && Chrome > Chromecast

For all those who are in an odd situation where:


  • They have a paid account to a Movie site (like Netflix etc.)
  • Are unable to watch movies online, just because the video-streaming is just too slow
    • Either because your convenient times are 'peak' times for the server
    • Or, you are behind a painfully bad ISP
  • And are able to download the movie, as an option.

To such customers, downloading the movie overnight (using for e.g. wget) would be a big help!

I regularly use this, to download the movie, and watch with non-tech people (my kids) who can't be explained why the movie keeps 'Buffering'!

c:\bin\wget \
  -O KD1242.mp4 \ # Output filename
  -t 0 \ # Retrying indefinitely
  -c -T 10 \ # Reconnect + Timeouts are 10 seconds
  -w 10 \ # wait 10 seconds before retrying a disconnection
"https://www.yourfavouritechannel.com/abcd/KD1242.mp4?g=1f3410635&sha1=JGgJm02BOvqgsdvC32BcUg"

Windows binaries for GPL'ed GNU software (such as Wget) are heaven sent here:



And if you need a Big-TV experience (for e.g. if you have a Chromecast), you could stitch things together by using the Google Cast extension for your Chrome Browser, and open up "c:\" on the browser to play the movie directly in the browser (since VLC stream is still in Beta):


If by chance you're downloading videos via torrents (For e.g. NASA videos), here are my GPL recommendations for Windows:

  • Deluge: If you haven't seen this, you should really replace your uTorrent etc. clients with this one
  • Use the Streaming Extension (Github Link)
  • Copy the URL that the Extension provides + Paste to Chrome
  • Cast your tab to your Chromecast + Enjoy!


Have Fun!

9 Dec 2016

Custom pg_dumpall now works with AWS Redshift

While trying to work with AWS Redshift, it was interesting to see pg_dumpall failing to dump databases in my cluster! 

Delving further, for obvious reasons a managed service like this hides some global-information, which pg_dump(all) needs ... and no, this post is NOT about circumventing that.

This branch, gives a 'near' workaround for those who are okay with being able to extract all databases + (almost*) all Global information (Users etc.) in a single command. 


There are caveats though:

  • Redshift doesn't support COPY TO, so the best workaround is using INSERT. Painful, but works.
  • Barring passwords, all Globals can be dumped. The script just resets all users to be password-less, but that's better than having to do 'CREATE USER ... ' commands for your users by hand!

For some people, these caveats are going to be okay, considering that they get a scriptable way of taking a dump of all databases at one go, along with User information. 

If you're interested in a similar hack for AWS Postgres, you're in luck!

Down the line, I'll try to push this to the core, but for now, this works!

Go Play :) !!

Patched pg_dumpall works with AWS RDS Postgres

UPDATE: This patch is now a part of Native Postgres!! Read more here.

While trying to work with AWS RDS Postgres, it was interesting to see pg_dumpall failing to dump databases at all!

Delving further, for obvious reasons a managed service like this hides some global-information, which pg_dump(all) needs ... and no, this post is NOT about circumventing that.

This branch, gives a 'near' workaround for those who are okay with being able to extract all databases + (almost*) all Global information (Users etc.) in a single command. The *only* Global that can't be dumped are the User-Passwords, which for some people is okay, considering that they still get a scriptable way of taking a dump of all databases at one go, along with all User information. 

Note: For those keen to know, the script just resets all users to be password-less, but that's better than having to do 'CREATE USER ... ' commands for your users by hand!

Down the line, I'll try to push this to the core, but for now, this works!

UPDATE: This patch is now a part of Native Postgres!! Read more here.

Go Play :) !!

14 Apr 2016

ALTER TABLE Gotcha related to USING

This is an old gotcha, that has been documented (although probably not much) and is something that we (in my company) almost got caught with recently.

This post, is about documenting the issue that looks like something that isn't going away anytime soon.

Consider this:

CREATE TABLE k (id INTEGER, b text);
CREATE TABLE l (id INTEGER);

CREATE VIEW vw_k AS SELECT k.b FROM k JOIN l USING (id);
TABLE vw_k;
ALTER TABLE l RENAME COLUMN id TO id2;
INSERT INTO k(id, b) VALUES (1, 'abc');
INSERT INTO l(id2)   VALUES (1);
TABLE vw_k;
DROP VIEW vw_k;
CREATE VIEW vw_k AS SELECT k.b FROM k JOIN l USING (id);


This is the output:

rt_testing=# CREATE TABLE k (id INTEGER, b text);
CREATE TABLE
rt_testing=# CREATE TABLE l (id INTEGER);
CREATE TABLE
rt_testing=# CREATE VIEW vw_k AS SELECT k.b FROM k JOIN l USING (id);
CREATE VIEW
rt_testing=# TABLE vw_k;
 b
---
(0 rows)

rt_testing=# ALTER TABLE l RENAME COLUMN id TO id2;
ALTER TABLE
rt_testing=# INSERT INTO k(id, b) VALUES (1, 'abc');
INSERT 0 1
rt_testing=# INSERT INTO l(id2)    VALUES (1);
INSERT 0 1
rt_testing=# TABLE vw_k;
  b
-----
 abc
(1 row)

rt_testing=# DROP VIEW vw_k;
DROP VIEW
rt_testing=# CREATE VIEW vw_k AS SELECT k.b FROM k JOIN l USING (id);
ERROR:  column "id" specified in USING clause does not exist in right table

If you noticed, despite changing the base table's relevant column (i.e. l.id => l.id2), the VIEW kept working without fail, until you actually try to DROP / CREATE the VIEW sometime down the line.

In fact, this gets difficult to trace because the VIEW continues to work normally even after the column name change. For e.g. New rows inserted to the updated table, continue to show up in the VIEW as if its definition was up to date!

To summarize, whenever changing a column name (used in a USING clause on a dependent VIEW), be careful that PostgreSQL may not necessarily complain about the dependency. You'd need to be careful to make that check yourself. Or else its possible that sometime down the line, when you're trying to restore a production backup, you'd start seeing errors related to changes you made a year back!

What's in an empty table?

How much storage does an empty table in Postgres take? This is a post about Postgres tables that store ... well basically ...  Nothing . The...