That Guy From Delhi: 2017

20 Nov 2017

Update: RDS Prewarm script updated to fetch FSM / VM chunks

(This post continues my previous post on initializing an RDS Postgres instance.)

This simple SQL "initializes" the EBS volume attached to an RDS instance, something that otherwise isn't possible without sending real workload at it (and experiencing high latency on the first run).

Key scenarios where this is really helpful:

Create a Read-Replica (or Hot Standby in Postgres terms)
Restore a new RDS Instance from a Snapshot

Update: The script now also does the following:

Fetches disk blocks related to the FSM / VM of all tables
Fetches all indexes

Limitations that still exist:

~~TOAST tables are still directly inaccessible in RDS~~

~~Indexes for TOAST columns also fall under this category~~
~~Trying hard to see if this last hurdle can be worked around~~

~~Anyone with any ideas?!~~

Script needs to be run once per Database Owner

Not sure if there is any magic around this

Object ownership is a Postgres property

RDS Postgres does not give Superuser access

I'll try to ease this in the future

By creating a script to list the Users that this needs to run as
The other possibility is to use DBLink to run this for separate Users in a single run

I'll update here, in case I make any significant changes.

Sample Run

-[ RECORD 1 ]-------+------------------------------
clock_timestamp     | 2017-11-19 15:40:08.291891-05
table_size          | 13 GB
freespace_map_size  | 3240 kB
visibility_map_size | 408 kB
blocks_prefetched   | 1639801
current_database    | pgbench
schema_name         | public
table_name          | pgbench_accounts
-[ RECORD 2 ]-------+------------------------------
clock_timestamp     | 2017-11-19 15:43:37.703711-05
table_size          | 2142 MB
freespace_map_size  | 0 bytes
visibility_map_size | 0 bytes
blocks_prefetched   | 274194
current_database    | pgbench
schema_name         | public
table_name          | pgbench_accounts_pkey
-[ RECORD 3 ]-------+------------------------------
clock_timestamp     | 2017-11-19 15:44:12.899115-05
table_size          | 440 kB
freespace_map_size  | 24 kB
visibility_map_size | 8192 bytes
blocks_prefetched   | 59
current_database    | pgbench
schema_name         | public
table_name          | pgbench_tellers
-[ RECORD 4 ]-------+------------------------------
clock_timestamp     | 2017-11-19 15:44:12.901088-05
table_size          | 240 kB
freespace_map_size  | 0 bytes
visibility_map_size | 0 bytes
blocks_prefetched   | 30
current_database    | pgbench
schema_name         | public
table_name          | pgbench_tellers_pkey
-[ RECORD 5 ]-------+------------------------------
clock_timestamp     | 2017-11-19 15:44:12.905107-05
table_size          | 40 kB
freespace_map_size  | 0 bytes
visibility_map_size | 0 bytes
blocks_prefetched   | 5
current_database    | pgbench
schema_name         | public
table_name          | pgbench_branches_pkey
-[ RECORD 6 ]-------+------------------------------
clock_timestamp     | 2017-11-19 15:44:12.907089-05
table_size          | 40 kB
freespace_map_size  | 24 kB
visibility_map_size | 8192 bytes
blocks_prefetched   | 9
current_database    | pgbench
schema_name         | public
table_name          | pgbench_branches
-[ RECORD 7 ]-------+------------------------------
clock_timestamp     | 2017-11-19 15:44:12.907142-05
table_size          | 0 bytes
freespace_map_size  | 0 bytes
visibility_map_size | 0 bytes
blocks_prefetched   | 0
current_database    | pgbench
schema_name         | public
table_name          | pgbench_history
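
As a sanity check on the sample run, blocks_prefetched appears to equal the combined size of the relation, its free space map and its visibility map, divided by the block size. The Python sketch below redoes that arithmetic for the smaller relations. It assumes the default 8 kB Postgres block size; the helper names and the unit table are mine and are not part of the prewarm script.

```python
# Assumption: default Postgres block size of 8 kB (8192 bytes).
BLOCK_SIZE = 8192

# Multipliers for the units that appear in the sample run.
UNITS = {"bytes": 1, "kB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}


def to_bytes(pretty: str) -> int:
    """Convert a string such as '440 kB' or '8192 bytes' to a byte count."""
    value, unit = pretty.split()
    return int(value) * UNITS[unit]


def expected_blocks(table: str, fsm: str, vm: str) -> int:
    """Blocks across the main relation, free space map and visibility map."""
    return (to_bytes(table) + to_bytes(fsm) + to_bytes(vm)) // BLOCK_SIZE


# (table_size, freespace_map_size, visibility_map_size) copied from the sample run.
sample = {
    "pgbench_tellers": ("440 kB", "24 kB", "8192 bytes"),
    "pgbench_tellers_pkey": ("240 kB", "0 bytes", "0 bytes"),
    "pgbench_branches": ("40 kB", "24 kB", "8192 bytes"),
    "pgbench_branches_pkey": ("40 kB", "0 bytes", "0 bytes"),
}

for name, sizes in sample.items():
    print(name, expected_blocks(*sizes))
```

These come out to 59, 30, 9 and 5, matching the blocks_prefetched values reported for those relations. The two large relations are only shown rounded (13 GB, 2142 MB), so they can't be checked exactly this way.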

7 Nov 2017

Prewarming / Initializing an RDS Postgres instance (from S3)

UPDATE: Read this for recent updates. Now the SQL successfully fetches *all* disk blocks on most RDS PostgreSQL (read post for the rare exceptions).

As many of you know, AWS RDS Postgres uses EBS, which has an interesting feature called Lazy Loading: it lets EBS instantiate a disk (of almost any size from 10GB to 6TB) and bring it online within a matter of minutes. Although a fantastic feature, it can lead to unexpected outcomes when high-end production load is thrown at a newly launched RDS Postgres instance immediately after restoring from a snapshot.

One possible solution is to use the pg_prewarm Postgres Extension that is well supported in RDS Postgres, immediately after Restoring from a Snapshot, thereby reducing the side-effects of Lazy Loading.

Although pg_prewarm was originally meant for populating the buffer cache, in this specific use case it is heaven-sent for initializing (fetching) almost the entire snapshot from S3 onto the RDS EBS volume in question. Even if running pg_prewarm across all tables evicts the previously loaded table from the buffer cache, it still does the job of initializing all the disk blocks on the EBS volume.

I've just checked in the SQL to this repository, and it seems to do this magic pretty well. It also lists why this would only take you ~70% of the way, owing to restrictions / limitations (as per my current understanding).

In the Sample below, I restored a new RDS Postgres instance from a Snapshot and immediately thereafter ran this SQL on it.

Notice that the first table (pgbench_accounts) takes about 22 seconds to load the first time, and less than a second the second time.
Similarly, going by the timestamps, the second table (pgbench_history) takes about 11 seconds to load the first time and less than a second the second time :) !

pgbench=> SELECT clock_timestamp(), pg_prewarm(c.oid::regclass),
pgbench-> relkind, c.relname
pgbench-> FROM pg_class c
pgbench-> JOIN pg_namespace n
pgbench-> ON n.oid = c.relnamespace
pgbench-> JOIN pg_user u
pgbench-> ON u.usesysid = c.relowner
pgbench-> WHERE u.usename NOT IN ('rdsadmin', 'rdsrepladmin', ' pg_signal_backend', 'rds_superuser', 'rds_replication')
pgbench-> ORDER BY c.relpages DESC;
clock_timestamp | pg_prewarm | relkind | relname
-------------------------------+------------+---------+-----------------------
2017-11-07 11:41:44.341724+00 | 17903 | r | pgbench_accounts
2017-11-07 11:42:06.059177+00 | 6518 | r | pgbench_history
2017-11-07 11:42:17.126768+00 | 2745 | i | pgbench_accounts_pkey
2017-11-07 11:42:21.406054+00 | 45 | r | pgbench_tellers
2017-11-07 11:42:21.645859+00 | 24 | r | pgbench_branches
2017-11-07 11:42:21.757086+00 | 2 | i | pgbench_branches_pkey
2017-11-07 11:42:21.757653+00 | 2 | i | pgbench_tellers_pkey
(7 rows)

pgbench=>
pgbench=> SELECT clock_timestamp(), pg_prewarm(c.oid::regclass),
pgbench-> relkind, c.relname
pgbench-> FROM pg_class c
pgbench-> JOIN pg_namespace n
pgbench-> ON n.oid = c.relnamespace
pgbench-> JOIN pg_user u
pgbench-> ON u.usesysid = c.relowner
pgbench-> WHERE u.usename NOT IN ('rdsadmin', 'rdsrepladmin', ' pg_signal_backend', 'rds_superuser', 'rds_replication')
pgbench-> ORDER BY c.relpages DESC;
clock_timestamp | pg_prewarm | relkind | relname
-------------------------------+------------+---------+-----------------------
2017-11-07 11:42:33.914195+00 | 17903 | r | pgbench_accounts
2017-11-07 11:42:33.917725+00 | 6518 | r | pgbench_history
2017-11-07 11:42:33.918919+00 | 2745 | i | pgbench_accounts_pkey
2017-11-07 11:42:33.919412+00 | 45 | r | pgbench_tellers
2017-11-07 11:42:33.919427+00 | 24 | r | pgbench_branches
2017-11-07 11:42:33.919438+00 | 2 | i | pgbench_branches_pkey
2017-11-07 11:42:33.919443+00 | 2 | i | pgbench_tellers_pkey
(7 rows)
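
To read load times off the output above, one can take the gap between each row's clock_timestamp and the next row's. The Python sketch below does that for the first run. It assumes each row's clock_timestamp() is evaluated just before pg_prewarm runs for that row, so the gap to the next row approximates that relation's load time; the helper name is mine.

```python
from datetime import datetime

# clock_timestamp values copied from the first run (just after the restore).
first_run = [
    ("pgbench_accounts", "2017-11-07 11:41:44.341724"),
    ("pgbench_history", "2017-11-07 11:42:06.059177"),
    ("pgbench_accounts_pkey", "2017-11-07 11:42:17.126768"),
    ("pgbench_tellers", "2017-11-07 11:42:21.406054"),
]


def elapsed_per_relation(rows):
    """Seconds between each row's timestamp and the next row's timestamp."""
    fmt = "%Y-%m-%d %H:%M:%S.%f"
    times = [(name, datetime.strptime(ts, fmt)) for name, ts in rows]
    return {
        name: (nxt - cur).total_seconds()
        for (name, cur), (_, nxt) in zip(times, times[1:])
    }


for name, seconds in elapsed_per_relation(first_run).items():
    print(f"{name}: {seconds:.1f} s")
```

By this measure the first run spends roughly 21.7 s on pgbench_accounts, 11.1 s on pgbench_history and 4.3 s on pgbench_accounts_pkey, whereas in the second run the corresponding gaps are only a few milliseconds.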

14 Oct 2017

First alpha release of PsqlForks - Menon

Primer: PsqlForks aims to support all DB Engines that (even partially) speak Postgres (psqlforks = psql for Postgres forks).

Given that PsqlForks has been in development for a few weeks, it's time to stabilize a bit, and towards that we finally have Menon, PsqlForks' first alpha release. Being an alpha, by definition it isn't ready for production, but it feels stable enough ... feel free to test it out!

Importantly, this fork is synced with postgres/master regularly, and should ideally sport all recent psql developments. Further, I am not a C expert and am just barely comprehending Postgres, so let me know of any 18-wheelers that I didn't see.

The release title, 'Menon', is a common sub-caste in the South Indian state of Kerala. The nomenclature comes from the idea of popularizing (heh!) common names and places from Kerala... and it doesn't hurt to have an identifiable name (and, while at it, add character) for a release :)

This release includes:

Decent support for Redshift:

SQL tab completion for Redshift related variations
\d etc. now support Redshift specifics - ENCODINGs / SORTKEYs / DISTKEY / COMPRESSION etc.
Support Temporary Credentials using IAM Authentication (via AWS CLI)
View detailed progress here.

Basic support / Recognition semantics for:

CockroachDB - view progress here
PipelineDB
PgBouncer
RDS PostgreSQL

You could read more here:

For the interested:

Upcoming Milestones: https://github.com/robins/postgres/milestones
Existing open issues: https://github.com/robins/postgres/issues
Feedback / Bugs? Post them here

13 Oct 2017

PsqlForks supports AWS IAM authentication for Redshift

With this commit, PsqlForks ( http://psqlforks.com ) can now fetch credentials from AWS IAM. Read more about Redshift's support for generating database credentials using IAM authentication here.

Since the entire AWS CLI isn't baked into PsqlForks (yet!), you'd need a working copy of AWS CLI installed / working on the host (from where psql is called).

This took a while, since I missed the basic assumption that Redshift enforces SSL and psql doesn't attempt SSLMODE by default in the first try. The fact that CYGWIN wasn't super-smooth with AWS CLI in my test installation, didn't help either.

But as they say, all's well that ends well. There are a few obvious additions that are possible (such as expiration validation / re-using unexpired credentials on re-connect etc.), but this should get merged into the fork's mainline soon.

I guess it's time to begin thinking of releases, instead of making the mainline jittery with feature additions such as this one.

Yenjoy!

$ psql "sslmode=require host=redshift_cluster port=5439 dbname=redshift2" -U testing1
Password for user testing1:
psql: fe_sendauth: no password supplied

$ psql -I "sslmode=require host=redshift_cluster port=5439 dbname=redshift2" -U testing1

CLI: aws redshift get-cluster-credentials --auto-create --db-user testing1 --cluster-identifier redshift2 # Informational / testing output

psql (client-version:11devel, server-version:8.0.2, engine:redshift)
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: on)
Type "help" for help.

redshift2=> select current_user;
current_user
--------------
testing1
(1 row)

redshift2=> \du
List of roles
Role name | Attributes | Member of
-----------+-------------------------------+-----------
redshift2 | Superuser, Create DB +|
| Password valid until infinity |
testing1 | |

redshift2=> \q

$ ./psql --help | grep -i iam
-I, --aws-iam-redshift use temporary database credentials from AWS IAM Service
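
For anyone wanting to reproduce the credential fetch outside psql, here is a rough Python sketch of the same flow. It is not how PsqlForks does it (that is C, inside psql); it just builds the AWS CLI call echoed in the session above and parses the reply. The function names are hypothetical, and the DbUser / DbPassword / Expiration field names are what I understand the CLI's JSON response to contain.

```python
import json
import subprocess


def build_command(cluster_id: str, db_user: str) -> list:
    """Assemble the same AWS CLI call that the session above echoes."""
    return [
        "aws", "redshift", "get-cluster-credentials",
        "--auto-create",
        "--db-user", db_user,
        "--cluster-identifier", cluster_id,
    ]


def parse_credentials(cli_output: str) -> dict:
    """Pick the user, password and expiry out of the CLI's JSON response."""
    data = json.loads(cli_output)
    return {
        "user": data["DbUser"],
        "password": data["DbPassword"],
        "expires": data["Expiration"],
    }


def fetch_credentials(cluster_id: str, db_user: str) -> dict:
    """Run the CLI (needs a configured AWS CLI on the host) and parse its output."""
    result = subprocess.run(
        build_command(cluster_id, db_user) + ["--output", "json"],
        capture_output=True, text=True, check=True,
    )
    return parse_credentials(result.stdout)
```

Calling fetch_credentials("redshift2", "testing1") would mirror the session above. The returned user and temporary password are then used for an ordinary sslmode=require connection, and the expiry is what a re-connect check would need to look at.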

12 Oct 2017

PsqlForks now recognizes RDS PostgreSQL separately

With this commit, PsqlForks ( http://psqlforks.com ) now recognizes RDS PostgreSQL separately.

This isn't utilized much yet, but the infrastructure is going to be helpful in skipping / avoiding some commands that are defunct / not possible in the RDS PostgreSQL offering.

29 Sept 2017

PsqlForks now recognizes PgBouncer

With this commit, PsqlForks knows when it's talking to PgBouncer (and not Postgres).

Down the line, this should pave way for PsqlForks to more cleanly convey why (most of) the given psql shortcut(s) don't work (and what else does).

As always, the psql/README always has the most updated status of any engine support.

$ psql -h localhost -E -p6543 -U postgres pgbouncer
psql (client-version:11devel, server-version:1.7.1/bouncer, engine:pgbouncer)
Type "help" for help.

pgbouncer=# show version;
NOTICE: pgbouncer version 1.7.1
SHOW
pgbouncer=#
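
If a wrapper script needs to know which engine it ended up talking to, the banner line is easy to pick apart. The Python sketch below parses the banner format seen in these sessions; the regular expression and the function name are mine, and the format itself is only inferred from the output shown here, so treat it as an assumption rather than a stable interface.

```python
import re

# Pattern for the banner line PsqlForks prints, as seen in the sessions in these posts.
BANNER = re.compile(
    r"psql \(client-version:(?P<client>[^,]+), "
    r"server-version:(?P<server>[^,]+), "
    r"engine:(?P<engine>[^)]+)\)"
)


def parse_banner(line: str) -> dict:
    """Return client version, server version and engine from a banner line."""
    match = BANNER.search(line)
    if match is None:
        raise ValueError("not a PsqlForks-style banner: " + line)
    return match.groupdict()


banner = "psql (client-version:11devel, server-version:1.7.1/bouncer, engine:pgbouncer)"
print(parse_banner(banner))
```

For the PgBouncer session above this yields client 11devel, server 1.7.1/bouncer and engine pgbouncer.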

25 Sept 2017

PsqlForks now supports CockroachDB

PsqlForks now supports CockroachDB as much as is currently possible. You can check its current SQL status here.

$ /opt/postgres/master/bin/psql -h localhost -E -p 26257 -U root
psql (client-version:11devel, server-version:9.5.0, engine:cockroachdb)
Type "help" for help.

root=> select version();
version()
--------------------------------------------------------------------------
CockroachDB CCL v1.0.6 (linux amd64, built 2017/09/14 15:15:48, go1.8.3)
(1 row)
bank=> \l
List of databases
--------------------+-------+-------------------+------------+------------+-------------------
(5 rows)

bank=> \dv
List of relations
Schema | Name | Type | Owner
--------+------+------+-------
bank | a | view |
(1 row)

bank=> \di
List of relations
Schema | Name | Type | Owner | Table
--------+-------------------------+-------+-------+------------
bank | primary | index | | accounts
system | jobs_status_created_idx | index | | jobs
system | primary | index | | descriptor
system | primary | index | | eventlog
system | primary | index | | jobs
system | primary | index | | lease
system | primary | index | | namespace
system | primary | index | | rangelog
system | primary | index | | settings
system | primary | index | | ui
system | primary | index | | users
system | primary | index | | zones
(12 rows)

15 Sept 2017

PsqlForks now supports PipelineDB

After working on this PSQL variant that intends to support all Postgres forks, I finally narrowed down to naming it.

Since this was essentially Psql (for) Forks, quite intuitively, I chose to name it PsqlForks.

Considering that until recently this fork just supported Amazon Redshift, this naming didn't make much sense if it wasn't supporting at least 2 forks :) !

Thus, PsqlForks now supports PipelineDB!

$ /opt/postgres/master/bin/psql -U pipeline -p 5434 -h localhost pipeline
psql (client-version:11devel, server-version:9.5.3, engine:pipelinedb)
Type "help" for help.

pipeline=# \q

2 Sept 2017

psql \d now supports Interleaved / Compound SORTKEYs (in Redshift)

As a side-note, there is a consideration as to whether this should be in a separate section of its own (and not under Indexes, which it clearly isn't). Maybe another day. Happy Redshifting :) !

Update (15th Sep 2017):
This project has now been named PsqlForks!

31 Aug 2017

psql \d now supports DISTKEY / SORTKEY / ENCODING (in Redshift)

This is in continuation of my work for (my forked version of) psql to better support Redshift (read more here).

Now \d table provides some additional Redshift specific table properties such as:

DISTKEY
SORTKEY
COMPRESSION (ENCODING)
ENCRYPTION

Sample:

t3=# CREATE TABLE customer(
    custkey SMALLINT ENCODE delta NOT NULL,
    custname INTEGER DEFAULT 10 ENCODE raw NULL,
    gender BOOLEAN ENCODE RAW,
    address CHAR(5) ENCODE LZO,
    city BIGINT identity(0, 1) ENCODE DELTA,
    state DOUBLE PRECISION ENCODE Runlength,
    zipcode REAL,
    tempdel1 DECIMAL ENCODE Mostly16,
    tempdel2 BIGINT ENCODE Mostly32,
    tempdel3 DATE ENCODE DELTA32k,
    tempdel4 TIMESTAMP ENCODE Runlength,
    tempdel5 TIMESTAMPTZ ENCODE DELTA,
    tempdel6 VARCHAR(MAX) ENCODE text32k,
    start_date VARCHAR(10) ENCODE TEXT255
)
DISTSTYLE KEY
DISTKEY (custname)
INTERLEAVED SORTKEY (custkey, custname);
CREATE TABLE

t3=# \d customer
TABLE "public.customer"
------------+-----------------------------+-----------+---------+---------+---------+------------+-----------+----------+------------------------------------
custname | integer | none | t | 2 | f | none | | | 10
gender | boolean | none | f | 0 | f | none | | |
address | character(5) | lzo | f | 0 | f | none | | |
state | double precision | runlength | f | 0 | f | none | | |
zipcode | real | none | f | 0 | f | none | | |
tempdel1 | numeric(18,0) | mostly16 | f | 0 | f | none | | |
tempdel2 | bigint | mostly32 | f | 0 | f | none | | |
tempdel3 | date | delta32k | f | 0 | f | none | | |
tempdel4 | timestamp without time zone | runlength | f | 0 | f | none | | |
tempdel5 | timestamp with time zone | delta | f | 0 | f | none | | |
tempdel6 | character varying(65535) | text32k | f | 0 | f | none | | |
start_date | character varying(10) | text255 | f | 0 | f | none | | |

Now that a few 'ToDos' are listed on Github Issues, the next step would probably involve working on this ticket, which aims to show more elaborate SORTKEY details (such as INTERLEAVED / COMPOUND) when using Describe Table.

Update (15th Sep 2017):
This project has now been named PsqlForks!

12 Aug 2017

Redshift support for psql

I'm sure you know that psql doesn't go out of its way to support Postgres' forks natively. I obviously understand the reasoning, which allowed me to find a gap that I could fill here.

The existing features (in psql) that work with any Postgres fork (like Redshift) are entirely because it is a fork of Postgres. Since I use psql heavily at work, last week I decided to begin maintaining a Postgres fork that better supports (Postgres forks, but initially) Redshift. As always, unless explicitly mentioned, this is entirely an unofficial effort.

The 'redshift' branch of this Postgres code-base, is aimed at supporting Redshift in many ways:

Support Redshift related artifacts

Redshift specific SQL Commands / variations
Redshift Libraries

Support AWS specific artifacts

For e.g. AWS Regions

Support Redshift specific changes

For e.g. "\d table" etc.

The idea is:

Maintain this branch for the long-term

At least as long as I have an accessible Redshift cluster

Down the line look at whether other Postgres forks (for e.g. RDS Postgres) need such special attention

Although nothing much stands out yet

Except some rare exceptions like this or this, which do need to go through an arduous long wait / process of refinement.

Change the default port to 5439 (or whatever the flavour supports)

...with an evil grin ;)

Additionally, as far as possible:

Keep submitting Postgres related patches back to Postgres master
Keep this branch up to date with Postgres master

Update (31st August 2017)

Currently this branch supports most Redshift specific SQL commands such as

CREATE LIBRARY
CREATE TABLE (DISTKEY / DISTSTYLE / ...)
Returns non-SQL items like

ENCODINGs (a.k.a. Compressions like ZSTD / LZO etc )
REGIONs (for e.g. US-EAST-1 etc.)

Of course, some complex variants (for e.g. GRANT SELECT, UPDATE ON ALL TABLES IN SCHEMA TO GROUP xxx) don't automatically come up with tab-completion. This is primarily because psql's tab-complete feature isn't powerful enough to cater to all such scenarios, which in turn is because it isn't a full-fledged parser to begin with.
In a nutshell, this branch is now in a pretty good shape to auto-complete the most common Redshift specific SQL Syntax.
The best part is that this still merges perfectly with Postgres mainline!

Let me know if you find anything that needs inclusion, or if I missed something.

====================================

$ psql -U redshift_user -h localhost -E -p 5439 db
psql (client-version:11devel, server-version:8.0.2, engine:redshift)
Type "help" for help.

db=#

3 Aug 2017

Reducing Wires

Recently got an additional monitor for my workstation@home and found that the following wires were indispensable:

USB Mouse
Monitor VGA / HDMI / DVI cable
USB Hub cable (Pen Drive etc.)

I was lucky that this ($20 + used) Dell monitor was an awesome buy since it came with a Monitor USB Hub (besides other goodies such as vertical rotate etc).

After a bit of rejigging, this is how things finally panned-out:

1 USB Wire (from the laptop) for the MUH (Monitor USB Hub)

This is usually something like this.

Use a USB->DVI converter and use that to connect MUH -> Monitor DVI port

This is usually something like this.

Plug USB Mouse to MUH
With things working so well, I also plugged a Wireless Touchpad dongle to the MUH

So now when I need to do some office work, connecting 1 USB wire gets me up and running!

#LoveOneWires :)

Now only if I could find a stable / foolproof Wireless solution here ;)

29 Jul 2017

Symbols in Redshift User Passwords work just fine

Recently read a few posts / discussions where people doubted whether Redshift accepts (works well with) ASCII symbols in user passwords.

It felt like a good time to write this short post showing that Redshift (Engine) seems to work fine with (non-alphanumeric) (printable) ASCII symbols.

You can see a few things (in the sample output given below):

All non-alphanumeric printable ASCII characters worked fine (at least all that my US-International / QWERTY keyboard could throw at it)
For those who also need ' (single-quote) and " (double-quote) you could always use $$ as quote-delimiters
You still need at least One of each of the following:

Upper-Case English-Letter
Lower-Case English-Letter
One Digit / Numeral
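
The three requirements above are easy to pre-check on the client side before sending an ALTER USER. A minimal Python sketch follows; it covers only the rules listed here (Redshift may enforce others that this doesn't check), and the function name is mine.

```python
def missing_requirements(password: str) -> list:
    """Return the character classes (from the list above) that the password lacks."""
    checks = {
        "upper-case English letter": any("A" <= c <= "Z" for c in password),
        "lower-case English letter": any("a" <= c <= "z" for c in password),
        "digit": any("0" <= c <= "9" for c in password),
    }
    return [name for name, ok in checks.items() if not ok]


# The password used in the sample session: satisfies all three rules.
print(missing_requirements('Aa1~!@#$%^&*()_+-`{}[]|";:,<.>/?'))

# Symbols alone are accepted as characters, but don't satisfy any of the rules.
print(missing_requirements("~!@#$%^&*"))
```

The first call returns an empty list, while the second reports all three classes as missing.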

------------------------------------------------------------
# psql -U adminuser -h rs_cluster -p 5439 db

psql (9.6.3, server 8.0.2)
Type "help" for help.

rs_cluster adminuser@db-# alter user userb with password 'Aa1~!@#$%^&*()_+-`{}[]|";:,<.>/?';
ALTER USER
Time: 237.012 ms
rs_cluster adminuser@db-# \q

# psql -U userb -h rs_cluster -p 5439 db
Password for user userb:

psql (9.6.3, server 8.0.2)
Type "help" for help.

rs_cluster userb@db-# alter user userb with password $$Aa1~!@#$%^&*()_+-`{}[]|";:,<.>/?'"$$;
ALTER USER
Time: 191.505 ms
rs_cluster userb@db-# \q

# psql -U userb -h rs_cluster -p 5439 db
Password for user userb:

psql (9.6.3, server 8.0.2)
Type "help" for help.

rs_cluster userb@db-#
------------------------------------------------------------

21 Jul 2017

Using generate_series() in Redshift

Considering that Redshift clearly states that it doesn't support (the commonly used postgres function) generate_series(), it gets very frustrating if you just want to fill a table with a lot of rows and can't without a valid data-source.

Solution (Generates a billion integers on my test-cluster):

--INSERT INTO tbl
WITH x AS (
SELECT 1
FROM stl_connection_log a, stl_connection_log b, stl_connection_log c
-- LIMIT 100
)
SELECT row_number() over (order by 1) FROM x;

For a Redshift server with even a basic level of login activity, this should generate enough rows. For example, on my test cluster, where I am the only user, this currently generates 4034866688 (4 billion) rows :) !

Interestingly, irrespective of what the documentation says, generate_series() actually does work on Redshift:

# select b from generate_series(1,3) as a(b);
┌───┐
│ b │
├───┤
│ 1 │
│ 2 │
│ 3 │
└───┘
(3 rows)

The reason why this wouldn't let you insert any rows into your table, though, is that this is a Leader-Node-only function, whereas INSERTs (on any Redshift cluster that isn't single-node) run on the Compute Nodes (which don't know about this function).

The reason why the above works, is ROW_NUMBER() and CROSS JOIN allow us to generate a large number of rows, but for that, the initial data-set (here the STL_CONNECTION_LOG System Table) should have at least some rows to multiply on! You could use any other system table (that is available on Compute Nodes) if required, for some other purpose.
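
To see the multiplication at work without a Redshift cluster, here is a small Python sketch that runs the same shape of query against an in-memory SQLite database. SQLite is only a stand-in: the seed table plays the role of STL_CONNECTION_LOG, OVER () is used in place of OVER (ORDER BY 1), and window functions need SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in for STL_CONNECTION_LOG: any table with a handful of rows will do.
conn.execute("CREATE TABLE seed (id INTEGER)")
conn.executemany("INSERT INTO seed VALUES (?)", [(i,) for i in range(3)])

# Same shape as the Redshift query: cross join the seed with itself three
# times, then number the resulting rows.
rows = conn.execute(
    """
    WITH x AS (
        SELECT 1 AS one
        FROM seed a, seed b, seed c
    )
    SELECT row_number() OVER () FROM x
    """
).fetchall()

numbers = sorted(r[0] for r in rows)
print(len(numbers), numbers[0], numbers[-1])

# The row count is the cube of the seed size.
print(1592 ** 3)
```

Three seed rows give 27 numbered rows (1 to 27). Since 1592 ** 3 is exactly 4034866688, the row count reported above suggests the connection log held 1592 rows at the time.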

Play On!

31 May 2017

Patch: Using --no-comments with pg_dump

Recently I submitted a patch for review that allows a non-superuser to practically use a backup taken from pg_dump.

Currently it is a kludge (and a well-known one at that - Ref 1 / Ref 2 / Ref 3 / Ref 4), but since the ideal solution is too big a bite to chew and not in high demand, it has seen little to no traction in the past decade.

This patch should allow the above. But more importantly, it should also allow regular users of AWS RDS Postgres as well as Google Cloud Postgres databases (who do not get SuperUser access by design) to reliably use the backups, instead of tinkering with the backup SQL and removing things like COMMENT ON EXTENSION for it to even run during restoration.

The bad news is that since Postgres 10 has already branched off, I doubt this would see the light of the day (if at all) any time before Postgres 11 (unless there is consensus that it's helpful enough and gets 'back-patched' to Postgres 10 to be released around September 2017).
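
Until such an option is available, the tinkering mentioned above usually amounts to filtering the offending statements out of a plain-text dump. A minimal Python sketch of that workaround follows. It assumes each COMMENT ON EXTENSION statement sits on a single line, the function name is mine, and this is the manual workaround, not what the patch itself does.

```python
def strip_extension_comments(dump_sql: str) -> str:
    """Drop COMMENT ON EXTENSION statements from a plain-text dump."""
    kept = [
        line for line in dump_sql.splitlines()
        if not line.lstrip().upper().startswith("COMMENT ON EXTENSION")
    ]
    return "\n".join(kept) + "\n"


# Illustrative dump fragment (not taken from a real backup).
sample_dump = (
    "CREATE EXTENSION IF NOT EXISTS plpgsql WITH SCHEMA pg_catalog;\n"
    "COMMENT ON EXTENSION plpgsql IS 'PL/pgSQL procedural language';\n"
    "CREATE TABLE t (a integer);\n"
)

print(strip_extension_comments(sample_dump))
```

The filtered text keeps the CREATE EXTENSION and CREATE TABLE statements and drops only the comment, which is the statement a non-superuser restore tends to trip over.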

Update (3rd Oct 2017):
This is now a part of my PsqlForks branch. You can check the related commit here.

Update (26th Jan 2018):
This is now part of the official Postgres v11 branch. You can check the related commit here.

horntail it is :)

I had been planning to contribute to Postgres Buildfarm by running it on one of my VMs and finally got that done... and horntail it is.

A few things to note, before it fades from (my) memory:

It (obviously) requires quite a bunch of libraries, especially if you're using the defaults (for e.g. with ssl / krb5 etc..)

You may either disable some of those exotic options (not ideally recommended)
Or, you may use a silo'ed VM just for this purpose so that it doesn't slow your regular work
I did neither (of the above) but didn't care about performance of this VM

Probably would enable more Options / Branches down the line and re-submit

My distro was Ubuntu and so YMMV, but the packages weren't very obvious. If you see something fail and you check logs, you'd see that 'abc' was missing, but wouldn't have the faintest idea which package satisfied that

Some missing libraries that caused hiccups were (these may be obvious to most, but were new to me):

gss_init_sec_context
msgfmt
tclConfig

It'd be good if someone could catalog the packages required

For all Unix variants - obviously easier said than done, but

It's possible that there may be more people willing (like me) but unable to scale all the walls

At least the super-set of what's required when all flags are enabled, from which people could pick what's needed for their own set of enabled flags

As it always is, the documentation is relatively clear, just that persistence (especially for exotic Linux variants / issues) is needed in large quantities.

All the best :)

14 Mar 2017

Using pg_dumpall with Google Cloud Postgres

Luckily, the recent update to pg_dumpall works well (without any extra effort) for dumping Globals off the recently launched Google Cloud PostgreSQL service.

This is because this Postgres service seems to have the same limitation (as AWS RDS Postgres does), in that none of the user accounts are SuperUser in the Postgres sense. Therefore, this platform too disallows accessing pg_authid.

The fallback here too, is to use pg_roles to read (as much as is possible) to dump Role related Globals. Read here for more info on the recent changes to pg_dumpall that makes this work.

12 Mar 2017

Redshift Limitations

Recently I have been toying around a little with Redshift (a popular Data-Warehouse tool by AWS) and see some minor but obvious SQL limitations:

FULL OUTER JOIN + ON Func(x) = Func(y) : Not Supported

Redshift supports most variants of FULL OUTER JOIN. However, when the JOIN condition contains a function on both sides, it gives up, complaining "ERROR: XX000: could not devise a query plan for the given query".

CREATE TABLE x (a integer);
INSERT INTO x VALUES (1);

SELECT x1.a, x2.a
FROM x x1
FULL OUTER JOIN x x2
ON LOWER(x1.a) = LOWER(x2.a); -- Fails
-- ON LOWER(x1.a) = x2.a; -- Works
-- ON x1.a = LOWER(x2.a); -- Works
-- ON x1.a = x2.a; -- Obviously Works

Workaround: The obvious way around it is to move one of the functions to a separate SQL (for e.g create a Temp Table and use that in the final computation), something like this:

CREATE TEMP TABLE x1(a) AS SELECT LOWER(a) FROM x;

SELECT x1.a, x2.a
FROM x1
FULL OUTER JOIN x x2
ON x1.a = LOWER(x2.a); -- Works