(This came out of something I was trying out and discussing with Postgres enthusiasts - thanks to all for clarifying my doubts.)
This article highlights one aspect of how the query planner handles SELECT ... GROUP BY differently from SELECT DISTINCT.
For example:
SELECT b,c,d FROM a GROUP BY b,c,d;
vs
SELECT DISTINCT b,c,d FROM a;
We'll see a few scenarios where Postgres optimizes by removing unnecessary columns from the GROUP BY list (when a subset of them is already known to be unique), and one where Postgres could do even better. To highlight this difference, here I have an empty table with 3 columns:
postgres=# create table a (b integer, c text, d bigint);
CREATE TABLE
postgres=# \d a
Table "public.a"
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+---------
b | integer | | |
c | text | | |
d | bigint | | |
On this table, we can see that SELECT ... GROUP BY generates the exact same plan as SELECT DISTINCT. In particular, we're interested in the "Group Key", which is the same for both queries:
postgres=# explain select distinct b,c,d from a;
QUERY PLAN
------------------------------------------------------------
HashAggregate (cost=29.78..31.78 rows=200 width=44)
Group Key: b, c, d
-> Seq Scan on a (cost=0.00..21.30 rows=1130 width=44)
(3 rows)
postgres=# explain select b,c,d from a group by b,c,d;
QUERY PLAN
------------------------------------------------------------
HashAggregate (cost=29.78..31.78 rows=200 width=44)
Group Key: b, c, d
-> Seq Scan on a (cost=0.00..21.30 rows=1130 width=44)
(3 rows)
Having said that, if the same table is created with a PRIMARY KEY, we see that GROUP BY becomes smarter: the "Group Key" uses only the Primary Key (here it is 'b') and correctly discards columns 'c' and 'd'. Nice 😄!
postgres=# create table a (b integer PRIMARY KEY, c text, d bigint);
CREATE TABLE
postgres=# explain select distinct b,c,d from a;
QUERY PLAN
------------------------------------------------------------
HashAggregate (cost=29.78..41.08 rows=1130 width=44)
Group Key: b, c, d
-> Seq Scan on a (cost=0.00..21.30 rows=1130 width=44)
(3 rows)
postgres=# explain select b,c,d from a group by b,c,d;
QUERY PLAN
------------------------------------------------------------
HashAggregate (cost=24.12..35.42 rows=1130 width=44)
Group Key: b
-> Seq Scan on a (cost=0.00..21.30 rows=1130 width=44)
(3 rows)
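Incidentally, with the PRIMARY KEY in place, Postgres also recognizes functional dependency at the SQL level: the other columns may be dropped from the GROUP BY clause entirely and the query is still legal. A quick sketch against the same table (this is a standard Postgres feature, recognized only for primary keys):

```sql
-- With PRIMARY KEY (b), columns c and d are functionally
-- dependent on b, so they may appear in the SELECT list
-- without being listed in GROUP BY:
SELECT b, c, d FROM a GROUP BY b;
```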
Let's check whether we get the same optimization if we instead create a UNIQUE index on the column. The answer? Sadly, no! I went ahead and added a NOT NULL constraint as well, but that didn't change anything either. (Do note that, unlike a PRIMARY KEY, a UNIQUE column by itself can have multiple rows with NULLs.)
postgres=# create table a (b integer unique not null, c text, d bigint);
CREATE TABLE
postgres=# explain select b,c,d from a group by b,c,d;
QUERY PLAN
------------------------------------------------------------
HashAggregate (cost=29.78..41.08 rows=1130 width=44)
Group Key: b, c, d
-> Seq Scan on a (cost=0.00..21.30 rows=1130 width=44)
(3 rows)
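Hand-reducing the GROUP BY list isn't an option here either, because Postgres recognizes functional dependency only for primary keys, not for unique indexes. A sketch of what happens on the same unique-not-null table:

```sql
-- Unlike the PRIMARY KEY case, a UNIQUE NOT NULL column does not
-- establish a functional dependency, so this variant is rejected,
-- roughly with:
--   ERROR: column "a.c" must appear in the GROUP BY clause
--          or be used in an aggregate function
SELECT b, c, d FROM a GROUP BY b;
```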
Regarding the above, if I understand correctly, this is an obvious performance optimization that Postgres is still leaving on the table (as of the 13devel branch I tested):
postgres=# select version();
version
------------------------------------------------------------------------------------------------------------------
PostgreSQL 13devel on i686-pc-linux-gnu, compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609, 32-bit
(1 row)
Next, does it still optimize this if the PRIMARY KEY is not the first column in the GROUP BY? The answer? Yes! The planner can apply the optimization if any of the GROUP BY columns is the Primary Key. Noice!
postgres=# create table a (b integer, c text primary key, d bigint);
CREATE TABLE
postgres=# explain select b,c,d from a group by b,c,d;
QUERY PLAN
------------------------------------------------------------
HashAggregate (cost=24.12..35.42 rows=1130 width=44)
Group Key: c
-> Seq Scan on a (cost=0.00..21.30 rows=1130 width=44)
(3 rows)
... and what if the PRIMARY KEY is a composite key over some of the columns in the GROUP BY list? YES again 😄!
postgres=# create table a (b int, c text, d bigint, primary key (c,d)) ;
CREATE TABLE
postgres=# explain select b,c,d from a group by b,c,d;
QUERY PLAN
------------------------------------------------------------
HashAggregate (cost=26.95..28.95 rows=200 width=44)
Group Key: c, d
-> Seq Scan on a (cost=0.00..21.30 rows=1130 width=44)
(3 rows)
Lastly, although redundant GROUP BY columns are something to avoid when hand-writing good SQL, the reality is that ORM-generated SQL isn't that smart yet, so it's great that Postgres already implements these obvious optimizations.
29 Feb 2020
20 Nov 2017
Update: RDS Prewarm script updated to fetch FSM / VM chunks
(This post is in continuation to my previous post regarding Initializing RDS Postgres Instance)
This simple SQL "Initializes" the EBS volume linked to an RDS Instance, something which isn't possible without sending a workload (and experiencing high latency on that first run).
Key scenarios, where this is really helpful are:
- Create a Read-Replica (or Hot Standby in Postgres terms)
- Restore a new RDS Instance from a Snapshot
Update: The Script, now also does the following:
Limitations that still exist:
- TOAST tables are still directly inaccessible in RDS
- Indexes for TOAST columns also fall under this category
- Trying hard to see if this last hurdle can be worked around
- Anyone with any ideas?!
- Script needs to be run once per Database Owner
- Not sure if there is any magic around this
- Object ownership is a Postgres property
- RDS Postgres does not give Superuser access
- I'll try to ease this in the future
- By creating a script to list the Users that this needs to run as
- The other possibility is to use DBLink to run this for separate Users in a single run
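A minimal sketch of that owner listing, reusing the pg_class / pg_user join from the prewarm SQL (the role exclusions are illustrative; this isn't part of the checked-in script):

```sql
-- Distinct owners of ordinary tables and indexes, i.e. the
-- users the prewarm script would need to be run as:
SELECT DISTINCT u.usename
FROM pg_class c
JOIN pg_user u ON u.usesysid = c.relowner
WHERE c.relkind IN ('r', 'i')
  AND u.usename NOT IN ('rdsadmin', 'rdsrepladmin');
```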
I'll update here, in case I make any significant changes.
Sample Run
-[ RECORD 1 ]-------+------------------------------
clock_timestamp | 2017-11-19 15:40:08.291891-05
table_size | 13 GB
freespace_map_size | 3240 kB
visibility_map_size | 408 kB
blocks_prefetched | 1639801
current_database | pgbench
schema_name | public
table_name | pgbench_accounts
-[ RECORD 2 ]-------+------------------------------
clock_timestamp | 2017-11-19 15:43:37.703711-05
table_size | 2142 MB
freespace_map_size | 0 bytes
visibility_map_size | 0 bytes
blocks_prefetched | 274194
current_database | pgbench
schema_name | public
table_name | pgbench_accounts_pkey
-[ RECORD 3 ]-------+------------------------------
clock_timestamp | 2017-11-19 15:44:12.899115-05
table_size | 440 kB
freespace_map_size | 24 kB
visibility_map_size | 8192 bytes
blocks_prefetched | 59
current_database | pgbench
schema_name | public
table_name | pgbench_tellers
-[ RECORD 4 ]-------+------------------------------
clock_timestamp | 2017-11-19 15:44:12.901088-05
table_size | 240 kB
freespace_map_size | 0 bytes
visibility_map_size | 0 bytes
blocks_prefetched | 30
current_database | pgbench
schema_name | public
table_name | pgbench_tellers_pkey
-[ RECORD 5 ]-------+------------------------------
clock_timestamp | 2017-11-19 15:44:12.905107-05
table_size | 40 kB
freespace_map_size | 0 bytes
visibility_map_size | 0 bytes
blocks_prefetched | 5
current_database | pgbench
schema_name | public
table_name | pgbench_branches_pkey
-[ RECORD 6 ]-------+------------------------------
clock_timestamp | 2017-11-19 15:44:12.907089-05
table_size | 40 kB
freespace_map_size | 24 kB
visibility_map_size | 8192 bytes
blocks_prefetched | 9
current_database | pgbench
schema_name | public
table_name | pgbench_branches
-[ RECORD 7 ]-------+------------------------------
clock_timestamp | 2017-11-19 15:44:12.907142-05
table_size | 0 bytes
freespace_map_size | 0 bytes
visibility_map_size | 0 bytes
blocks_prefetched | 0
current_database | pgbench
schema_name | public
table_name | pgbench_history
7 Nov 2017
Prewarming / Initializing an RDS Postgres instance (from S3)
UPDATE: Read this for recent updates. The SQL now successfully fetches *all* disk blocks on most RDS PostgreSQL instances (read the post for the rare exceptions).
As many of you know, AWS RDS Postgres uses EBS, which has an interesting feature called Lazy Loading that allows it to instantiate a disk (anywhere from 10GB to 6TB in size) and bring it online within a matter of minutes. Although a fantastic feature, it can lead to unexpected outcomes when high-end production load is thrown at a newly launched RDS Postgres instance immediately after restoring from a Snapshot.
One possible solution is to use the pg_prewarm Postgres Extension that is well supported in RDS Postgres, immediately after Restoring from a Snapshot, thereby reducing the side-effects of Lazy Loading.
Although pg_prewarm was originally meant for populating the buffer-cache, in this specific use-case the extension is heaven-sent for initializing (fetching) almost the entire snapshot from S3 onto the RDS EBS volume in question. So even though running pg_prewarm through all the tables effectively evicts the earlier tables from the buffer-cache, it still does the job of initializing all disk-blocks on the EBS volume.
I've just checked in the SQL to this repository that seems to do this magic pretty well. It also explains why this would only take you ~70% of the way, owing to restrictions / limitations (as per my current understanding).
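For reference, the core building block of that SQL is just the pg_prewarm extension and its single-argument form; a minimal sketch (the table name is illustrative):

```sql
-- pg_prewarm ships as an extension; enable it once per database:
CREATE EXTENSION IF NOT EXISTS pg_prewarm;

-- Read every block of one relation and return the number of
-- blocks touched; the default 'buffer' mode reads the blocks in,
-- which is enough to force EBS to fetch them from S3:
SELECT pg_prewarm('pgbench_accounts');
```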
In the Sample below, I restored a new RDS Postgres instance from a Snapshot and immediately thereafter ran this SQL on it.
pgbench=> SELECT clock_timestamp(), pg_prewarm(c.oid::regclass),
pgbench-> relkind, c.relname
pgbench-> FROM pg_class c
pgbench-> JOIN pg_namespace n
pgbench-> ON n.oid = c.relnamespace
pgbench-> JOIN pg_user u
pgbench-> ON u.usesysid = c.relowner
pgbench-> WHERE u.usename NOT IN ('rdsadmin', 'rdsrepladmin', 'pg_signal_backend', 'rds_superuser', 'rds_replication')
pgbench-> ORDER BY c.relpages DESC;
clock_timestamp | pg_prewarm | relkind | relname
-------------------------------+------------+---------+-----------------------
2017-11-07 11:41:44.341724+00 | 17903 | r | pgbench_accounts
2017-11-07 11:42:06.059177+00 | 6518 | r | pgbench_history
2017-11-07 11:42:17.126768+00 | 2745 | i | pgbench_accounts_pkey
2017-11-07 11:42:21.406054+00 | 45 | r | pgbench_tellers
2017-11-07 11:42:21.645859+00 | 24 | r | pgbench_branches
2017-11-07 11:42:21.757086+00 | 2 | i | pgbench_branches_pkey
2017-11-07 11:42:21.757653+00 | 2 | i | pgbench_tellers_pkey
(7 rows)
pgbench=>
pgbench=> SELECT clock_timestamp(), pg_prewarm(c.oid::regclass),
pgbench-> relkind, c.relname
pgbench-> FROM pg_class c
pgbench-> JOIN pg_namespace n
pgbench-> ON n.oid = c.relnamespace
pgbench-> JOIN pg_user u
pgbench-> ON u.usesysid = c.relowner
pgbench-> WHERE u.usename NOT IN ('rdsadmin', 'rdsrepladmin', 'pg_signal_backend', 'rds_superuser', 'rds_replication')
pgbench-> ORDER BY c.relpages DESC;
clock_timestamp | pg_prewarm | relkind | relname
-------------------------------+------------+---------+-----------------------
2017-11-07 11:42:33.914195+00 | 17903 | r | pgbench_accounts
2017-11-07 11:42:33.917725+00 | 6518 | r | pgbench_history
2017-11-07 11:42:33.918919+00 | 2745 | i | pgbench_accounts_pkey
2017-11-07 11:42:33.919412+00 | 45 | r | pgbench_tellers
2017-11-07 11:42:33.919427+00 | 24 | r | pgbench_branches
2017-11-07 11:42:33.919438+00 | 2 | i | pgbench_branches_pkey
2017-11-07 11:42:33.919443+00 | 2 | i | pgbench_tellers_pkey
(7 rows)
- Notice that the first table (pgbench_accounts) takes about 22 seconds to load the first time, and less than a second to load the second time.
- Similarly the second table (pgbench_history) takes 15 seconds to load the first time and less than a second, the second time :) !
13 Apr 2016
Postgres Performance - New Connections
Last in the PostgreSQL Performance series:
Please read more about Test Particulars / Chart Naming methodology from the previous post in the series.
Takeaway:
- New Connection performance has been constant (and at times has mildly deteriorated)
- i.e. Pgbench with -C
- I tried to use all possible combinations to find whether a corner case got better over the releases, but almost every test gave the same result.
- Possibly Tom and others don't consider this a practical use-case and therefore a non-priority.
- Unrelated, in the future, I'd club / write about such tests in a better fashion, rather than posting them as separate posts. I guess I got carried away by the results! Apologies about that.
Postgres Performance - Default Pgbench configurations
Continuing the PostgreSQL Performance series:
Please read more about Test Particulars / Chart Naming methodology from the previous post in the series.
Takeaway:
- Unlike the Read-Only performance mentioned earlier, which grew slowly with each Major release, these numbers have grown considerably, specifically in 9.5
- 9.5 Branch is at times 35-70% faster than 9.4 Branch
- To reiterate, this test had no Pgbench flags enabled (no Prepared / no Read-Only / etc.) besides 4 Connections & 4 Threads.
Postgres Performance - Read Only
Continuing the PostgreSQL Performance series:
Please read more about Test Particulars / Chart Naming methodology from the previous post in the series.
Takeaway:
- Read-Only Performance numbers have consistently grown from (at least) 9.1 onwards
- 9.5 Branch is 35%-50% faster than 9.1 Branch
Postgres performance - File + ReadOnly
Just wanted to compare how Postgres performed across the different Major releases. I had a spare Pi2 lying around and found it useful for such a performance test.
More inferences to follow, in future posts.
As always, any feedback is more than welcome.
TLDR:
- Despite a regression from 9.3 onwards, the combination of File + Read-Only has improved drastically in the 9.6dev branch
- 9.6dev branch is
- 2x faster than 9.1 on some tests
- 50% faster than 9.5.2 (currently, the latest stable release!) on some tests
- Yay!!
Hardware:
- Raspberry Pi Model 2B
- 1GB RAM
- 900 MHz Quad-Core-ARM Cortex-A7 (ARMv7 Processor rev 5 (v7l))
- 32GB Class 10 SD Card
Software: Source used for this test
Note:
- All configurations were run 10 times each
- All configurations were run for 100 secs each
- The Phrases in the name of each chart tells what Pgbench was run with:
- ConnX: with -cX (For e.g. Conn64: with -c64)
- Thread4: with -j4
- Prepared: with -M Prepared
- File: External SQL file with one SQL command "SELECT 1;"
- Readonly: with -S
- 100secs: with -T 100
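Putting that naming together, a chart labelled e.g. "Conn64 Thread4 Prepared File 100secs" corresponds to an invocation roughly like the following (the database name and script file name are illustrative):

```shell
# "File": an external script containing the single SQL command
echo "SELECT 1;" > select1.sql

# Conn64 -> -c 64, Thread4 -> -j 4, Prepared -> -M prepared,
# File -> -f select1.sql, 100secs -> -T 100
pgbench -c 64 -j 4 -M prepared -T 100 -f select1.sql pgbench
```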
5 Nov 2014
PostgreSQL Explain Plan-Nodes Grid
While reading up on the various PostgreSQL EXPLAIN Plan nodes from multiple sources, I realized there was a clear lack of a consolidated grid / cheat-sheet from which I could (in one view) cross-check how the plan nodes compare. While at it, I also tried to attribute their (good / bad) characteristics.
Hope this (Work-In-Progress) lookup table helps others on the same path. Any additions / updates are more than welcome.
Update:
The image below is an old-snapshot of the document. The updated web document is available here.