That Guy From Delhi

29 Feb 2020

Optimizations in GROUP BY vs SELECT DISTINCT

(This came out of something I was trying out + discussing with Postgres enthusiasts - thanks to all for clarifying doubts)

This article highlights one aspect of how the query planner handles SELECT ... GROUP BY differently from SELECT DISTINCT.

For example:

SELECT b,c,d FROM a GROUP BY b,c,d;
vs
SELECT DISTINCT b,c,d FROM a;

We see a few scenarios where Postgres optimizes by removing unnecessary columns from the GROUP BY list (if a subset is already known to be unique), and one where Postgres could do even better. To highlight the difference, here is an empty table with 3 columns:

postgres=# create table a (b integer, c text, d bigint);
CREATE TABLE

postgres=# \d a
                 Table "public.a"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 b      | integer |           |          |
 c      | text    |           |          |
 d      | bigint  |           |          |

On this table, SELECT ... GROUP BY generates exactly the same plan as SELECT DISTINCT. In particular, we're interested in the "Group Key", which is the same for both queries:

postgres=# explain select distinct b,c,d from a;
                         QUERY PLAN
------------------------------------------------------------
HashAggregate (cost=29.78..31.78 rows=200 width=44)
   Group Key: b, c, d
   -> Seq Scan on a (cost=0.00..21.30 rows=1130 width=44)
(3 rows)

postgres=# explain select b,c,d from a group by b,c,d;
                         QUERY PLAN
------------------------------------------------------------
HashAggregate (cost=29.78..31.78 rows=200 width=44)
   Group Key: b, c, d
   -> Seq Scan on a (cost=0.00..21.30 rows=1130 width=44)
(3 rows)

Having said that, if the same table is created with a PRIMARY KEY, GROUP BY becomes smarter: its "Group Key" uses only the Primary Key (here 'b') and correctly discards columns 'c' and 'd', while SELECT DISTINCT still groups on all three. Nice 😄!

postgres=# create table a (b integer PRIMARY KEY, c text, d bigint);
CREATE TABLE
postgres=# explain select distinct b,c,d from a;
                         QUERY PLAN
------------------------------------------------------------
HashAggregate (cost=29.78..41.08 rows=1130 width=44)
   Group Key: b, c, d
   -> Seq Scan on a (cost=0.00..21.30 rows=1130 width=44)
(3 rows)

postgres=# explain select b,c,d from a group by b,c,d;
                         QUERY PLAN
------------------------------------------------------------
HashAggregate (cost=24.12..35.42 rows=1130 width=44)
   Group Key: b
   -> Seq Scan on a (cost=0.00..21.30 rows=1130 width=44)
(3 rows)
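
Why is the planner allowed to drop 'c' and 'd'? A primary key is unique and never NULL, so every other column of a row is determined by it. Here is a minimal Python sketch of that reasoning (the sample rows and the group_keys helper are made up for illustration; this is not how Postgres implements it):

```python
# Illustrative only: made-up rows for table a(b PRIMARY KEY, c, d).
rows = [
    (1, "x", 10),
    (2, "x", 10),
    (3, "y", 20),
]

def group_keys(rows, key_columns):
    """Return the set of distinct grouping keys for the given column positions."""
    return {tuple(row[i] for i in key_columns) for row in rows}

# Grouping on all three columns vs. grouping on the primary key alone.
full = group_keys(rows, (0, 1, 2))
pk_only = group_keys(rows, (0,))

# Because b is unique and never NULL, both groupings have one group per row.
print(len(full), len(pk_only))
```

Both groupings produce the same number of groups, so hashing on 'b' alone gives the same result with a narrower key.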

Let's check whether we get the same optimization with a UNIQUE constraint on the column instead. The answer? Sadly, no! I went ahead and added a NOT NULL constraint as well, but that didn't change anything either. (Do note that a UNIQUE column can hold multiple rows with NULLs.)

postgres=# create table a (b integer unique not null, c text, d bigint);
CREATE TABLE

postgres=# explain select b,c,d from a group by b,c,d;
                         QUERY PLAN
------------------------------------------------------------
HashAggregate (cost=29.78..41.08 rows=1130 width=44)
   Group Key: b, c, d
   -> Seq Scan on a (cost=0.00..21.30 rows=1130 width=44)
(3 rows)
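
The note about NULLs is the reason a plain UNIQUE column cannot be treated like a primary key here. A small Python sketch (hypothetical rows, with None standing in for NULL) shows the two groupings diverging once NULLs are allowed:

```python
# Illustrative only: b is UNIQUE but nullable; None stands in for SQL NULL.
# A UNIQUE constraint accepts both NULL rows, since NULLs are not treated as equal there.
rows = [
    (None, "x", 10),
    (None, "y", 20),
    (1, "z", 30),
]

def group_keys(rows, key_columns):
    """Distinct grouping keys; like GROUP BY, all NULLs land in one group."""
    return {tuple(row[i] for i in key_columns) for row in rows}

by_all = group_keys(rows, (0, 1, 2))  # one group per row
by_b = group_keys(rows, (0,))         # the two NULL rows collapse into one group

print(len(by_all), len(by_b))
```

With NOT NULL added, as in the table above, this objection no longer applies, which is why the missed optimization stands out.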

Regarding the above, IIUC this is an obvious performance optimization that Postgres is still leaving on the table (as of the v13 development branch):

postgres=# select version();
                                                     version
------------------------------------------------------------------------------------------------------------------
PostgreSQL 13devel on i686-pc-linux-gnu, compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609, 32-bit
(1 row)

Next, does it still optimize this if the PRIMARY KEY is not the first column in the GROUP BY? The answer? Yes! (The engine can optimize if any of the GROUP BY columns is a Primary Key.) Noice!

postgres=# create table a (b integer, c text primary key, d bigint);
CREATE TABLE

postgres=# explain select b,c,d from a group by b,c,d;
                         QUERY PLAN
------------------------------------------------------------
HashAggregate (cost=24.12..35.42 rows=1130 width=44)
   Group Key: c
   -> Seq Scan on a (cost=0.00..21.30 rows=1130 width=44)
(3 rows)

... and what if the PRIMARY KEY is a composite key made up of columns in the GROUP BY list? YES again 😄!

postgres=# create table a (b int, c text, d bigint, primary key (c,d)) ;
CREATE TABLE
postgres=# explain select b,c,d from a group by b,c,d;
                         QUERY PLAN
------------------------------------------------------------
HashAggregate (cost=26.95..28.95 rows=200 width=44)
   Group Key: c, d
   -> Seq Scan on a (cost=0.00..21.30 rows=1130 width=44)
(3 rows)

Lastly, although some of these "optimizations" cover things to avoid when writing good SQL, the reality is that ORM-generated SQL isn't that smart yet, so it's great that Postgres already implements these obvious optimizations.

8 Nov 2019

Compiling any C source on WSL (Linux on Windows)

This is a short post, in the hope that nobody else spends hours wondering why a fresh Postgres source clone (or any C code, for that matter) complains about something very trivial on the first non-comment line (see sample below) as soon as you trigger ./configure:

$ ./configure
: not found: 18: ./configure:
./configure: 34: ./configure: Syntax error: newline unexpected (expecting ")")

This same Postgres code compiles beautifully on Ubuntu (on EC2), so this is not about "apt vs yum" or "Ubuntu vs CentOS" etc.

What took me a while to figure out was that this is one of the oldest issues between Mac/Linux and Windows.

Line-endings!

Running something as simple as the following, at the root of the source directory got the ball rolling:

find . -type f -print0 | xargs -0 dos2unix
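
To see which files are affected before converting anything, a read-only check can help. Here is a minimal Python sketch (the helper names are mine, and treating a NUL byte as "binary, skip it" is an assumption rather than anything dos2unix promises):

```python
from pathlib import Path

def has_crlf(data: bytes) -> bool:
    """True if the content contains Windows (CRLF) line endings."""
    return b"\r\n" in data

def to_lf(data: bytes) -> bytes:
    """Convert CRLF line endings to LF, leaving other bytes untouched."""
    return data.replace(b"\r\n", b"\n")

def crlf_files(root: str):
    """Yield files under root that look like text and contain CRLF."""
    for path in Path(root).rglob("*"):
        if not path.is_file() or ".git" in path.parts:
            continue
        data = path.read_bytes()
        # Assumption: a NUL byte means binary content, so skip it.
        if b"\0" not in data and has_crlf(data):
            yield path

# A shell script saved with Windows line endings, as ./configure was here.
sample = b"#!/bin/sh\r\necho hello\r\n"
print(has_crlf(sample), to_lf(sample))
```

Calling list(crlf_files(".")) at the root of the source tree lists the candidates without modifying them.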

Enjoy Linux on Windows :) !

18 Apr 2019

How about 1000 cascading Replicas :)

The other day, I remembered an old 9.0-era mail thread (when Streaming Replication had just launched) where someone had tried to daisy-chain Postgres Replicas and see how many (s)he could muster.

If I recall correctly, the OP could squeeze only ~120 or so, mostly because the Laptop memory gave way (and not really because of an engine limitation).

I couldn't find that post, but I was intrigued to know whether we could reach (at least) the thousand mark and what kind of "Replica Lag" that would entail; thus NReplicas.

On a (very) unscientific test, my 4-core 16 GB machine can spin up (create data folders and host processes for all) 1000 Replicas in ~8m (and tear them down in another ~2m). I'm sure this could get better, but I'm not complaining since this was a breeze to set up (in that it just worked without much tinkering ... besides lowering shared_buffers).

For those interested, a single UPDATE on the master, could (nearly consistently) be seen on the last Replica in less than half a second, with top showing 65% CPU idle (and 2.5 on the 1-min CPU metric) during a ~30 minute test.

Put in simple terms, this means that the UPDATE traveled from the Master to a Replica (let's call it Replica1), then Replica1 cascaded the change on to Replica2 (and so on, 1000 times). The changed row can be seen on Replica1000 within half a second.
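
As a back-of-the-envelope check, the figures above imply an average per-hop budget. A tiny Python sketch, assuming (and this is only an assumption) that the delay is spread evenly along the chain:

```python
# Figures from the post: ~1000 cascaded replicas, ~0.5 s end to end.
hops = 1000
end_to_end_ms = 500

# Average budget per hop, assuming the delay is spread evenly along the chain.
per_hop_ms = end_to_end_ms / hops
print(f"{per_hop_ms} ms per hop")
```

That works out to roughly half a millisecond per Replica on average.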

So although (I hope) this isn't a real-world use-case, I still am impressed that this is right out-of-the-box and still way under the 1 second mark.... certainly worthy of a small post :) !

Host: 16GB / 4 core

Time to spin up (1000 Cascading Replicas): 8 minutes

Time to tear down: 2 minutes

Test type: Constant UPDATEs (AV settings default)

Test Duration: 30min

Time for UPDATE to propagate: 500 ms!! (on average)

CPU Utilization: ~65%

CPU 1-min ratio: 2.5

17 Jan 2019

How to add / remove DEFAULT PRIVILEGES

At times we need to DROP USER and are unable to do so because of existing "DEFAULT PRIVILEGES" associated with the user, which prevent the DROP USER from going ahead.

Common SQL such as this does not show DEFAULT PRIVILEGES.

You can find DEFAULT PRIVILEGES by using \ddp in psql. If you haven't heard of psql, probably that'd be a good place to start.

Once you have the privileges, you need to understand how permissions are assigned. Some detail on the cryptic letters and their meanings is given here.
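
For a quick way to read those letters, here is a minimal Python sketch that decodes an entry as printed by \ddp. The letter meanings follow the PostgreSQL privileges documentation; the decode_acl helper itself is hypothetical and ignores quoted role names:

```python
# Privilege letters as documented for PostgreSQL ACL entries.
ACL_LETTERS = {
    "r": "SELECT", "w": "UPDATE", "a": "INSERT", "d": "DELETE",
    "D": "TRUNCATE", "x": "REFERENCES", "t": "TRIGGER",
    "X": "EXECUTE", "U": "USAGE", "C": "CREATE",
    "c": "CONNECT", "T": "TEMPORARY",
}

def decode_acl(item: str):
    """Split 'grantee=letters/grantor' into (grantee, [privileges], grantor)."""
    grantee, rest = item.split("=", 1)
    letters, grantor = rest.split("/", 1)
    # '*' after a letter marks a grant option; it is skipped in this sketch.
    # Letters not in the table above are passed through unchanged.
    privileges = [ACL_LETTERS.get(ch, ch) for ch in letters if ch != "*"]
    # An empty grantee means the entry applies to PUBLIC.
    return grantee or "PUBLIC", privileges, grantor

print(decode_acl("jacob=rU/dbuser"))
```

So "jacob=rU/dbuser" reads as: jacob was granted SELECT and USAGE by dbuser.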

Once you have that, you need to essentially revert the GRANTs (using REVOKE command) and remove those default privileges one by one.

A sample is given below:

pg postgres2@t3=> create group dbuser;
CREATE ROLE
pg postgres2@t3=> alter group dbuser add user jacob;
ALTER ROLE
pg postgres2@t3=> alter group dbuser add user postgres2;
ALTER ROLE
pg postgres2@t3=> alter default privileges for user dbuser grant select on tables to jacob;
ALTER DEFAULT PRIVILEGES
pg postgres2@t3=> \ddp
         Default access privileges
┌────────┬────────┬───────┬───────────────────────┐
│ Owner  │ Schema │ Type  │   Access privileges   │
├────────┼────────┼───────┼───────────────────────┤
│ dbuser │        │ table │ jacob=r/dbuser        │
│        │        │       │ dbuser=arwdDxt/dbuser │
└────────┴────────┴───────┴───────────────────────┘
(1 row)

pg postgres2@t3=> alter default privileges for user dbuser grant select on sequences to jacob;
ALTER DEFAULT PRIVILEGES
pg postgres2@t3=> alter default privileges for user dbuser grant usage on sequences to jacob;
ALTER DEFAULT PRIVILEGES
pg postgres2@t3=> \ddp
           Default access privileges
┌────────┬────────┬──────────┬───────────────────────┐
│ Owner  │ Schema │   Type   │   Access privileges   │
├────────┼────────┼──────────┼───────────────────────┤
│ dbuser │        │ sequence │ jacob=rU/dbuser       │
│        │        │          │ dbuser=rwU/dbuser     │
│ dbuser │        │ table    │ jacob=r/dbuser        │
│        │        │          │ dbuser=arwdDxt/dbuser │
└────────┴────────┴──────────┴───────────────────────┘
(2 rows)

pg postgres2@t3=> drop user jacob;
ERROR: 2BP01: role "jacob" cannot be dropped because some objects depend on it
DETAIL: privileges for default privileges on new sequences belonging to role dbuser
privileges for default privileges on new relations belonging to role dbuser
LOCATION: DropRole, user.c:1045
pg postgres2@t3=> ALTER DEFAULT PRIVILEGES FOR USER dbuser REVOKE SELECT ON tables FROM jacob;
ALTER DEFAULT PRIVILEGES
pg postgres2@t3=> ALTER DEFAULT PRIVILEGES FOR USER dbuser REVOKE SELECT ON sequences FROM jacob;
ALTER DEFAULT PRIVILEGES
pg postgres2@t3=> ALTER DEFAULT PRIVILEGES FOR USER dbuser REVOKE USAGE ON sequences FROM jacob;
ALTER DEFAULT PRIVILEGES
pg postgres2@t3=> \du
List of roles
┌─────────────────┬────────────────────────────────────────────────────────────┬─────────────────────────────────────┐
│ Role name │ Attributes │ Member of │
├─────────────────┼────────────────────────────────────────────────────────────┼─────────────────────────────────────┤
│ dbuser │ Cannot login │ {} │
│ jacob │ │ {dbuser} │
│ postgres2 │ Create role, Create DB │ {rds_superuser,dbuser} │
│ rds_superuser │ Cannot login │ {rds_replication} │
└─────────────────┴────────────────────────────────────────────────────────────┴─────────────────────────────────────┘

pg postgres2@t3=> \ddp
Default access privileges
┌───────┬────────┬──────┬───────────────────┐
│ Owner │ Schema │ Type │ Access privileges │
├───────┼────────┼──────┼───────────────────┤
└───────┴────────┴──────┴───────────────────┘
(0 rows)

pg postgres2@t3=> drop user jacob;
DROP ROLE

This method should allow you to remove all DEFAULT PRIVILEGEs (only) for this User.

Note: Importantly, you'd need to repeat the above step for *each* database!
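
When there are many entries, typing the REVOKEs by hand gets tedious. As a rough sketch, the statements from the session above can be generated from what \ddp reports. The rows are copied from that session; the revoke_statements helper is hypothetical, does no identifier quoting, and only covers global (schema-less) defaults:

```python
# Rows as reported by \ddp: (owner, object type, grantee, privileges).
# Values copied from the session above; the helper itself is hypothetical.
default_privs = [
    ("dbuser", "sequence", "jacob", ["SELECT", "USAGE"]),
    ("dbuser", "table", "jacob", ["SELECT"]),
]

# \ddp prints singular type names; ALTER DEFAULT PRIVILEGES expects plurals.
PLURAL = {"table": "TABLES", "sequence": "SEQUENCES",
          "function": "FUNCTIONS", "type": "TYPES", "schema": "SCHEMAS"}

def revoke_statements(rows, grantee):
    """Build one REVOKE per default-privilege row that mentions grantee."""
    for owner, objtype, who, privileges in rows:
        if who == grantee:
            yield (f"ALTER DEFAULT PRIVILEGES FOR ROLE {owner} "
                   f"REVOKE {', '.join(privileges)} "
                   f"ON {PLURAL[objtype]} FROM {grantee};")

statements = list(revoke_statements(default_privs, "jacob"))
print("\n".join(statements))
```

The output is meant to be reviewed and then pasted into psql, once per database.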

20 Nov 2017

Update: RDS Prewarm script updated to fetch FSM / VM chunks

(This post is in continuation to my previous post regarding Initializing RDS Postgres Instance)

This simple SQL "initializes" the EBS volume linked to an RDS Instance, something that otherwise isn't possible without sending real workload to it (and experiencing high latency on the first run).

Key scenarios where this is really helpful:

Create a Read-Replica (or Hot Standby in Postgres terms)
Restore a new RDS Instance from a Snapshot

Update: The script now also does the following:

Fetches disk blocks related to FSM / VM of all tables
Fetches all Indexes

Limitations that still exist:

~~TOAST tables are still directly inaccessible in RDS~~

~~Indexes for TOAST columns also fall under this category~~
~~Trying hard to see if this last hurdle can be worked around~~

~~Anyone with any ideas?!~~

Script needs to be run once per Database Owner

Not sure if there is any magic around this

Object ownership is a Postgres property

RDS Postgres does not give Superuser access

I'll try to ease this in the future

By creating a script to list the Users that this needs to run as
The other possibility is to use DBLink to run this for separate Users in a single run

I'll update here, in case I make any significant changes.

Sample Run

-[ RECORD 1 ]-------+------------------------------
clock_timestamp     | 2017-11-19 15:40:08.291891-05
table_size          | 13 GB
freespace_map_size  | 3240 kB
visibility_map_size | 408 kB
blocks_prefetched   | 1639801
current_database    | pgbench
schema_name         | public
table_name          | pgbench_accounts
-[ RECORD 2 ]-------+------------------------------
clock_timestamp     | 2017-11-19 15:43:37.703711-05
table_size          | 2142 MB
freespace_map_size  | 0 bytes
visibility_map_size | 0 bytes
blocks_prefetched   | 274194
current_database    | pgbench
schema_name         | public
table_name          | pgbench_accounts_pkey
-[ RECORD 3 ]-------+------------------------------
clock_timestamp     | 2017-11-19 15:44:12.899115-05
table_size          | 440 kB
freespace_map_size  | 24 kB
visibility_map_size | 8192 bytes
blocks_prefetched   | 59
current_database    | pgbench
schema_name         | public
table_name          | pgbench_tellers
-[ RECORD 4 ]-------+------------------------------
clock_timestamp     | 2017-11-19 15:44:12.901088-05
table_size          | 240 kB
freespace_map_size  | 0 bytes
visibility_map_size | 0 bytes
blocks_prefetched   | 30
current_database    | pgbench
schema_name         | public
table_name          | pgbench_tellers_pkey
-[ RECORD 5 ]-------+------------------------------
clock_timestamp     | 2017-11-19 15:44:12.905107-05
table_size          | 40 kB
freespace_map_size  | 0 bytes
visibility_map_size | 0 bytes
blocks_prefetched   | 5
current_database    | pgbench
schema_name         | public
table_name          | pgbench_branches_pkey
-[ RECORD 6 ]-------+------------------------------
clock_timestamp     | 2017-11-19 15:44:12.907089-05
table_size          | 40 kB
freespace_map_size  | 24 kB
visibility_map_size | 8192 bytes
blocks_prefetched   | 9
current_database    | pgbench
schema_name         | public
table_name          | pgbench_branches
-[ RECORD 7 ]-------+------------------------------
clock_timestamp     | 2017-11-19 15:44:12.907142-05
table_size          | 0 bytes
freespace_map_size  | 0 bytes
visibility_map_size | 0 bytes
blocks_prefetched   | 0
current_database    | pgbench
schema_name         | public
table_name          | pgbench_history
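
The blocks_prefetched figures in the sample run can be cross-checked against the reported sizes. A small Python sketch, assuming the default 8 kB Postgres block size (the blocks helper is just for illustration):

```python
BLOCK_SIZE = 8192  # assumption: default PostgreSQL block size (8 kB)
KB = 1024

def blocks(table_bytes, fsm_bytes, vm_bytes):
    """Total 8 kB blocks across the table itself, its FSM and its VM."""
    return (table_bytes + fsm_bytes + vm_bytes) // BLOCK_SIZE

# pgbench_tellers: 440 kB table, 24 kB FSM, 8192 bytes VM
tellers = blocks(440 * KB, 24 * KB, 8192)
# pgbench_branches: 40 kB table, 24 kB FSM, 8192 bytes VM
branches = blocks(40 * KB, 24 * KB, 8192)
print(tellers, branches)
```

These match the 59 and 9 blocks reported for the two small tables. The larger relations only match approximately, since their sizes are rounded to MB / GB in the output.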

7 Nov 2017

Prewarming / Initializing an RDS Postgres instance (from S3)

UPDATE: Read this for recent updates. Now the SQL successfully fetches *all* disk blocks on most RDS PostgreSQL (read post for the rare exceptions).

As many of you know, AWS RDS Postgres uses EBS, which has an interesting feature called Lazy Loading: it can instantiate a disk (of mostly any size from 10GB to 6TB) and bring it online within a matter of minutes. Although a fantastic feature, this can lead to unexpected outcomes when high-end production load is thrown at a newly launched RDS Postgres instance immediately after restoring from a Snapshot.

One possible solution is to use the pg_prewarm Postgres Extension that is well supported in RDS Postgres, immediately after Restoring from a Snapshot, thereby reducing the side-effects of Lazy Loading.

Although pg_prewarm was originally meant for populating the buffer-cache, in this specific use-case it is heaven-sent for initializing (fetching) (almost) the entire snapshot from S3 on to the RDS EBS volume in question. Even if running pg_prewarm through all tables ends up evicting the previously warmed table from the buffer-cache, it still does the job of initializing all disk-blocks on the EBS volume.

I've just checked in the SQL to this repository that seems to do this magic pretty well. It also lists why this would only take you ~70% of the way, owing to restrictions / limitations (as per my current understanding).

In the Sample below, I restored a new RDS Postgres instance from a Snapshot and immediately thereafter ran this SQL on it.

Notice that the first table (pgbench_accounts) takes about 22 seconds to load the first time, and less than a second to load the second time.
Similarly, the second table (pgbench_history) takes about 11 seconds to load the first time (going by the timestamps below) and less than a second the second time :) !

pgbench=> SELECT clock_timestamp(), pg_prewarm(c.oid::regclass),
pgbench-> relkind, c.relname
pgbench-> FROM pg_class c
pgbench-> JOIN pg_namespace n
pgbench-> ON n.oid = c.relnamespace
pgbench-> JOIN pg_user u
pgbench-> ON u.usesysid = c.relowner
pgbench-> WHERE u.usename NOT IN ('rdsadmin', 'rdsrepladmin', 'pg_signal_backend', 'rds_superuser', 'rds_replication')
pgbench-> ORDER BY c.relpages DESC;
clock_timestamp | pg_prewarm | relkind | relname
-------------------------------+------------+---------+-----------------------
2017-11-07 11:41:44.341724+00 | 17903 | r | pgbench_accounts
2017-11-07 11:42:06.059177+00 | 6518 | r | pgbench_history
2017-11-07 11:42:17.126768+00 | 2745 | i | pgbench_accounts_pkey
2017-11-07 11:42:21.406054+00 | 45 | r | pgbench_tellers
2017-11-07 11:42:21.645859+00 | 24 | r | pgbench_branches
2017-11-07 11:42:21.757086+00 | 2 | i | pgbench_branches_pkey
2017-11-07 11:42:21.757653+00 | 2 | i | pgbench_tellers_pkey
(7 rows)

pgbench=>
pgbench=> SELECT clock_timestamp(), pg_prewarm(c.oid::regclass),
pgbench-> relkind, c.relname
pgbench-> FROM pg_class c
pgbench-> JOIN pg_namespace n
pgbench-> ON n.oid = c.relnamespace
pgbench-> JOIN pg_user u
pgbench-> ON u.usesysid = c.relowner
pgbench-> WHERE u.usename NOT IN ('rdsadmin', 'rdsrepladmin', 'pg_signal_backend', 'rds_superuser', 'rds_replication')
pgbench-> ORDER BY c.relpages DESC;
clock_timestamp | pg_prewarm | relkind | relname
-------------------------------+------------+---------+-----------------------
2017-11-07 11:42:33.914195+00 | 17903 | r | pgbench_accounts
2017-11-07 11:42:33.917725+00 | 6518 | r | pgbench_history
2017-11-07 11:42:33.918919+00 | 2745 | i | pgbench_accounts_pkey
2017-11-07 11:42:33.919412+00 | 45 | r | pgbench_tellers
2017-11-07 11:42:33.919427+00 | 24 | r | pgbench_branches
2017-11-07 11:42:33.919438+00 | 2 | i | pgbench_branches_pkey
2017-11-07 11:42:33.919443+00 | 2 | i | pgbench_tellers_pkey
(7 rows)
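
The per-table load times can be read off the clock_timestamp column. A minimal Python sketch, assuming each row's timestamp is taken just before that row's pg_prewarm call, so the gap to the next row approximates the prewarm time (timestamps copied from the first run, with the +00 offset dropped):

```python
from datetime import datetime

# Timestamps copied from the first three rows of the first run above.
first_run = [
    ("pgbench_accounts", "2017-11-07 11:41:44.341724"),
    ("pgbench_history", "2017-11-07 11:42:06.059177"),
    ("pgbench_accounts_pkey", "2017-11-07 11:42:17.126768"),
]

def elapsed(rows):
    """Seconds between each row's timestamp and the next row's."""
    times = [datetime.strptime(ts, "%Y-%m-%d %H:%M:%S.%f") for _, ts in rows]
    return {
        name: (later - earlier).total_seconds()
        for (name, _), earlier, later in zip(rows, times, times[1:])
    }

durations = elapsed(first_run)
print(durations)
```

The last row in the list gets no entry, since there is no following timestamp to compare against.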

14 Oct 2017

First alpha release of PsqlForks - Menon

Primer: PsqlForks aims to support all DB Engines that (even partially) speak Postgres (psqlforks = psql for Postgres forks).

Given that PsqlForks has been in development for a few weeks, it's time to stabilize a bit, and towards that we finally have Menon, PsqlForks' first Alpha Release. Being an alpha, by definition it isn't ready for production, but it feels stable enough ... feel free to test it out!

Importantly, this fork is synced with postgres/master regularly, and should ideally sport all recent psql developments. Further, I am not a C expert and am just barely comprehending Postgres, so let me know of any 18-wheelers that I didn't see.

The release title, 'Menon', is a common sub-Caste in the South Indian state of Kerala. This nomenclature comes from the idea of popularizing (heh!) common names and places from Kerala... and it doesn't hurt to have an identifiable name (and while at it, add character) for a Release :)

This release includes:

Decent support for Redshift:

SQL tab completion for Redshift related variations
\d etc. now support Redshift specifics - ENCODINGs / SORTKEYs / DISTKEY / COMPRESSION etc.
Support Temporary Credentials using IAM Authentication (via AWS CLI)
View detailed progress here.

Basic support / Recognition semantics for:

CockroachDB - view progress here
PipelineDB
PgBouncer
RDS PostgreSQL

You could read more here:

For the interested:

Upcoming Milestones: https://github.com/robins/postgres/milestones
Existing open issues: https://github.com/robins/postgres/issues
Feedback / Bugs? Post them here