Showing posts with label psql. Show all posts
Showing posts with label psql. Show all posts

27 Oct 2024

What's in an empty table?


How much storage does an empty table in Postgres take?

This is a post about Postgres tables that store ... well basically ... Nothing.


The idea for this post came from this tweet that hinted that an empty table on most databases today takes 16Kb of storage. Now admittedly Franck was probably reminiscing the good-old days so this is probably quite out of context, but it did get me thinking, and thus this post.


NB: Here's a video showing this in action ! - See video .

A "regular" empty table in Production

Here's a regular small table that could be found in Production. It has a Primary Key, a text column and a JSONB column. Let's check the table size using the pg_total_relation_size() postgres function (you can read more about that function here).

db1=# create table t(id bigint primary key, b text, c jsonb);

CREATE TABLE


db1=# select pg_total_relation_size('t');

 pg_total_relation_size

------------------------

                  16384

(1 row)



Hmmm, so that tweet did have a point. Given how low-cost memory has become over the decades, it is easy to understand why databases today chose to optimize speed over memory efficiency (more on this later) and so even an empty table in Postgres, does consume 16 kb. 

But "where" is the 16kb being used?


db1=# select pg_relation_size('t');
 pg_relation_size
------------------
                0
(1 row)

The relation itself isn't consuming any space! That is good (again more on this later) but then where is the space being used then?

db1=# \d t
                 Table "public.t"
 Column |  Type  | Collation | Nullable | Default
--------+--------+-----------+----------+---------
 id     | bigint |           | not null |
 b      | text   |           |          |
 c      | jsonb  |           |          |
Indexes:
    "t_pkey" PRIMARY KEY, btree (id)


We see that the table has a Primary Key - and thus an index - t_pkey

Does the index is consuming 16kb?

db1=# select pg_relation_size('t_pkey');
 pg_relation_size
------------------
             8192
(1 row)

So the index is using some of it - 8kb - so that is progress - but who's using the other 8kb?


Let's start cutting the table down

Let's start cutting down the columns and see if the disk-usage goes down. 

db1=# DROP TABLE t; CREATE TABLE t (id BIGINT PRIMARY KEY, b JSONB); select pg_total_relation_size('t');
DROP TABLE
CREATE TABLE
 pg_total_relation_size
------------------------
                  16384
(1 row)

db1=# DROP TABLE t; CREATE TABLE t (id BIGINT PRIMARY KEY, b TEXT); select pg_total_relation_size('t');
DROP TABLE
CREATE TABLE
 pg_total_relation_size
------------------------
                  16384
(1 row)


Hmmm, that didn't help at all. Dropping either of the TEXT or JSONB column didn't help. Let's look at the expanded version of this table to see if there's any similarity in the two columns. (I've clipped the output to make it easier to read)


db1=# \d+ t
                                            Table "public.t"
 Column |  Type  | Collation | Nullable | Default | Storage  | ...
--------+--------+-----------+----------+---------+----------+-...
 id     | bigint |           | not null |         | plain    | ...
 b      | text   |           |          |         | extended | ...
 c      | jsonb  |           |          |         | extended | ...
Indexes:
    "t_pkey" PRIMARY KEY, btree (id)
Access method: heap


Clearly, the two column's are "extended" (you can read more about it here), but basically what happened here is that an extended column type resulted in the creation of a TOAST table (you can read more about TOAST here). Let's find out "how" to find the TOAST table for the table "t", and check if that could be consuming the remaining 8kb?


db1=# select oid, relname, reltoastrelid, reltoastrelid::regclass from pg_class where relname = 't';
  oid  | relname | reltoastrelid |      reltoastrelid
-------+---------+---------------+-------------------------
 18300 | t       |         18303 | pg_toast.pg_toast_18300
(1 row)

db1=# select pg_relation_size('pg_toast.pg_toast_18300');
 pg_relation_size
------------------
                0
(1 row)

Hmmm, it's not the TOAST table. But just like the main table "t", could it be that the TOAST table has supporting relations that are to blame?


db1=# select pg_total_relation_size('pg_toast.pg_toast_18300');
 pg_total_relation_size
------------------------
                   8192
(1 row)

db1=# \d pg_toast.pg_toast_18300
TOAST table "pg_toast.pg_toast_18300"
   Column   |  Type
------------+---------
 chunk_id   | oid
 chunk_seq  | integer
 chunk_data | bytea
Owning table: "public.t"
Indexes:
    "pg_toast_18300_index" PRIMARY KEY, btree (chunk_id, chunk_seq)

db1=# select pg_relation_size('pg_toast.pg_toast_18300_index');
 pg_relation_size
------------------
             8192
(1 row)

Yes! So we see above, that the TOAST table implicitly has a primary key (of it's own) that uses an index - which has 1 page (8kb) assigned to it.

In a nutshell, the empty table 't' above, consumes 16kb and the storage allocation takes this shape:

  • Main relation - 0 bytes - 't'
  • Main relation  Index - 8kb - 't_pkey'
  • Toast relation - 0 bytes - 'pg_toast_18300'
  • Toast relation Index - 8kb - 'pg_toast_18300_index'

Cut Cut Cut

Okay, lets see if we can reduce the table size further, by dropping both the Extended columns.


db1=# DROP TABLE t; CREATE TABLE t (id BIGINT PRIMARY KEY); select pg_total_relation_size('t');
DROP TABLE
CREATE TABLE
 pg_total_relation_size
------------------------
                   8192
(1 row)


Okay, that makes sense. Now the main table (heap) is still not consuming anything, but the index still does. 

Let's reduce further.

db1=# DROP TABLE t; CREATE TABLE t (id BIGINT); select pg_total_relation_size('t');
DROP TABLE
CREATE TABLE
 pg_total_relation_size
------------------------

                      0
(1 row)





0 bytes!! 


Nice! But seriously - 0 bytes?

It kind of makes sense, that since Postgres doesn't yet have anything to store - besides the metadata of the table (and since the metadata is stored in the system catalogs - for e.g. pg_catalog schema) - there isn't anything to store in the main relation (heap) as yet. 

Disk Usage - Check filesystem

Hmmm - Nah, what if I don't trust Postgres?

Let's skip Postgres functions and ask the filesystem directly - and see if the table is actually 0 bytes. Here we first find the file path of the table in question using the postgres function pg_relation_filepath() and then ask the file-system for the file-size.


db1=# select pg_relation_filepath('t');
 pg_relation_filepath
----------------------
 base/17727/18356
(1 row)

db1=# \! ls -la /home/robins/proj/localpg/data/base/17727/18356

-rw------- 1 robins robins 0 Sep 23 10:19 /home/robins/proj/localpg/data/base/17727/18356


So the file corresponding to the table, actually is using 0 bytes. Nice!

Now, when a table is created, some entries are added to the system catalog. Let see if the database size is signficantly more than a blank database?


postgres=# create database db1;
CREATE DATABASE

postgres=# \c db1
You are now connected to database "db1" as user "robins".

db1=# CREATE TABLE t (id BIGINT);
CREATE TABLE

db1=# select pg_database_size('db1');
 pg_database_size
------------------
          7482515
(1 row)

db1=# create database db2;
CREATE DATABASE

db1=# \c db2
You are now connected to database "db2" as user "robins".

db2=# select pg_database_size('db2');
 pg_database_size
------------------
          7482515
(1 row)


Good. So this somewhat confirms that a blank database and a database with an "empty" table use the same disk-space. 

Postgres is hiding something

Technically though, I'm lying. Well actually Postgres is lying hiding something from the file-system (i.e. every new table does make the database grow logically - a tad little - just that most of the times, the filesystem doesn't get the memo).

Under the cover, the way postgres stores data in a table (catalog table, or any user table), although it consumes a page of disk-space (often 8Kb), logically it may be consuming only a small part of that page. This is very helpful when more rows need to be stored in the table. When more rows come in, Postgres is able to reuse the same (first) page to now logically store more data - although for the filesystem - no extra pages were requested. This is what I was hinting at earlier that today's database use disk space (and thus memory cache) with larger (8kb) chunks and continue to keep using that page (until a new page is needed). Further below, I show a brief example of how all of this works.

But to summarize, it is then unfair to say that the system catalog did not grow at all (when a new table was added) - since some catalogs are guaranteed to have grown (for e.g. metadata of the new table is stored as an extra row in pg_class etc.) within the disk blocks already allocated as in-use for that catalog.

Let's squeeze a little more? 


db1=# create table a();

CREATE TABLE


Right off the bat, that might seem completely wrong. Does Postgres allow a table with no Columns?

Yes !! 😎

All databases allow creation of an empty table (obviously), but Postgres allows a new table even if there are no columns! Let's verify this from the Postgres Documentation. Although suttle, we can see that in the syntax section of the CREATE TABLE page, the column_name data_type is enclosed with a square braces [] - which implies that columns are in fact, optional. What's more, this "feature" is a part of Postgres at least for the past 20 years!

Now the utility of this table is arguable (we'll explore that below), but it is now clear that this syntax is legal and works just fine. 

Let's see how such a table looks like with psql \d 

db1=# \d a
                Table "public.a"
 Column | Type | Collation | Nullable | Default
--------+------+-----------+----------+---------


That's it! That's the complete output - Since the table has no columns, the output above (rightly) doesn't show anything.


Let's go a little deeper

The obvious next question is - What on earth could a table like this be used for? That is a perfectly good question, and the answer is probably not much. However if you instead ask whether "Squeeze a little more" mean that a no column table takes less storage space? The answer (depends on input data but) is most probably yes. Let's see how does it help Postgres if it knows that you don't want to store any column in the table.


db1=# create table a();
CREATE TABLE
db1=# \dt+ a
                                  List of relations
 Schema | Name | Type  | Owner  | Persistence | Access method |  Size   | Description
--------+------+-------+--------+-------------+---------------+---------+-------------
 public | a    | table | robins | permanent   | heap          | 0 bytes |
(1 row)

db1=# insert into a select;
INSERT 0 1

db1=# \dt+ a
                                    List of relations
 Schema | Name | Type  | Owner  | Persistence | Access method |    Size    | Description
--------+------+-------+--------+-------------+---------------+------------+-------------
 public | a    | table | robins | permanent   | heap          | 8192 bytes |
(1 row)


Nothing new here. We see that although an empty (0 column) table consumes 0 byte for storage. And like a regular table, as soon as the first row is inserted, the table uses 1 page - which in my test database (and probably 99.99% of postgres databases world-wide) consumes 8192 bytes. This is expected, but do note that the storage of logical rows in a postgres page, is a little oddly done (and for good reason). There is lllllooooottttt of detail here - but I wouldn't blame you if you'd want to keep that aside for a cold winter morning - when armed with a cup of hot coffee.

For now, we see below that each row that is inserted into the table, consumes 24 bytes - in that 8kb page.

db1=# create extension pageinspect ;
CREATE EXTENSION

db1=# truncate table a;
TRUNCATE TABLE

db1=# insert into a select FROM generate_series(1,3);
INSERT 0 3

db1=# SELECT * FROM heap_page_items(get_raw_page('a', 0));
 lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff | t_bits
| t_oid | t_data
----+--------+----------+--------+--------+--------+----------+--------+-------------+------------+--------+--------
+-------+--------
  1 |   8168 |        1 |     24 |  85105 |      0 |        0 | (0,1)  |           0 |       2048 |     24 |
|       | \x
  2 |   8144 |        1 |     24 |  85105 |      0 |        0 | (0,2)  |           0 |       2048 |     24 |
|       | \x
  3 |   8120 |        1 |     24 |  85105 |      0 |        0 | (0,3)  |           0 |       2048 |     24 |
|       | \x
(3 rows)



Still going Deeper - but on a Tangent

So does the above imply that adding more columns to a table would mean Postgres consumes more bytres-per-row? Let's verify:

db1=# drop table t;
DROP TABLE

db1=# create table t(id bigint);
CREATE TABLE

db1=# truncate table t; insert into t select generate_series(1,3); vacuum full t; \dt+ t
TRUNCATE TABLE
INSERT 0 3
VACUUM
                                    List of relations
 Schema | Name | Type  | Owner  | Persistence | Access method |    Size    | Description
--------+------+-------+--------+-------------+---------------+------------+-------------
 public | t    | table | robins | permanent   | heap          | 8192 bytes |
(1 row)

db1=# SELECT * FROM heap_page_items(get_raw_page('t', 0));
 lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff | t_bits
| t_oid |       t_data
----+--------+----------+--------+--------+--------+----------+--------+-------------+------------+--------+--------
+-------+--------------------
  1 |   8160 |        1 |     32 |  85100 |      0 |        0 | (0,1)  |           1 |       2816 |     24 |
|       | \x0100000000000000
  2 |   8128 |        1 |     32 |  85100 |      0 |        0 | (0,2)  |           1 |       2816 |     24 |
|       | \x0200000000000000
  3 |   8096 |        1 |     32 |  85100 |      0 |        0 | (0,3)  |           1 |       2816 |     24 |
|       | \x0300000000000000
(3 rows)


Here we see that each row is now consuming 32 bytes - which is an extra 8 bytes from earlier. Good chances the only column we've added is the reason for the extra 8 bytes. Let's verify that using the pg_column_size() function (you can read more about it here):


db1=# select pg_column_size(1::bigint);
 pg_column_size
----------------
              8
(1 row)


But wait, there's one more twist here:

db1=# INSERT INTO t SELECT;
INSERT 0 1

db1=# SELECT * FROM heap_page_items(get_raw_page('t', 0));
 lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff |  t_bits
  | t_oid |       t_data
----+--------+----------+--------+--------+--------+----------+--------+-------------+------------+--------+--------
--+-------+--------------------
  1 |   8160 |        1 |     32 |  85111 |      0 |        0 | (0,1)  |           1 |       2816 |     24 |
  |       | \x0100000000000000
  2 |   8128 |        1 |     32 |  85111 |      0 |        0 | (0,2)  |           1 |       2816 |     24 |
  |       | \x0200000000000000
  3 |   8096 |        1 |     32 |  85111 |      0 |        0 | (0,3)  |           1 |       2816 |     24 |
  |       | \x0300000000000000
  4 |   8072 |        1 |     24 |  85113 |      0 |        0 | (0,4)  |           1 |       2049 |     24 | 0000000
0 |       | \x
(4 rows)

So wait, see row 4. Although the table has a column, just because the column didn't have a value, the row actually consumed only 24 bytes (the minimum)?

Is this scalable? I mean can I have a 5 column table and still Postgres stores a row, but only consume the bare minimum 24 bytes? Let's see:


db1=# drop table h;
DROP TABLE

db1=# create table h(c1 bigint, c2 bigint, c3 bigint, c4 bigint, c5 bigint);
CREATE TABLE

db1=# insert into h select;
INSERT 0 1

db1=# SELECT * FROM heap_page_items(get_raw_page('h', 0));
 lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff |  t_bits
  | t_oid | t_data
----+--------+----------+--------+--------+--------+----------+--------+-------------+------------+--------+--------
--+-------+--------
  1 |   8168 |        1 |     24 |  86262 |      0 |        0 | (0,1)  |           5 |       2049 |     24 | 0000000
0 |       | \x
(1 row)


So yes, that does work and it does scale for "many" columns - but with a minor variation. There's more detail in code, but basically for regular columns the header contains 1 bit per column which expands in 8 byte chunks - and so say for 100 column table (with no data) - Postgres consumes ~40 bytes per row. 


db1=# drop table h; create table h(); select 'alter table h add column c' || n || ' bigint;' from generate_series(1,100) e(n); \gexec
.
.
ALTER TABLE
ALTER TABLE
ALTER TABLE
ALTER TABLE

db1=# select count(*) from pg_attribute where attrelid = 'h'::regclass;
 count
-------
   106
(1 row)

db1=# insert into h select; vacuum full h; SELECT * FROM heap_page_items(get_raw_page('h', 0));
INSERT 0 1
VACUUM
 lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff |
                                          t_bits                                                  | t_oid | t_data
----+--------+----------+--------+--------+--------+----------+--------+-------------+------------+--------+--------
--------------------------------------------------------------------------------------------------+-------+--------
  1 |   8152 |        1 |     40 |  88376 |      0 |        0 | (0,1)  |         100 |       2817 |     40 | 0000000
0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 |       | \x
(1 row)

Given the task at hand, I'd say that's still decently crisp.

So does a zero column table, squeeze max rows per page?


Yes, and No. So going back to a 0 column table - let's try to fill the whole page with rows and see how many can be stuffed on a single page. 

If you're running low on coffee - the page-header is 24 bytes and so back-of-the-envelope math suggests that number of rows possible to squeeze into a page should be - ( 8192 bytes in a page - some bytes for page header & footer ) / 24 bytes per row = ~340 rows.


db1=# truncate table a; insert into a select FROM generate_series(1,340); vacuum full a; \dt+ a
TRUNCATE TABLE
INSERT 0 340
VACUUM
                                 List of relations
 Schema | Name | Type  | Owner  | Persistence | Access method | Size  | Description
--------+------+-------+--------+-------------+---------------+-------+-------------
 public | a    | table | robins | permanent   | heap          | 16 kB |
(1 row)

Something's not right! That size should have stayed 8Kb. Why did Postgres use 16Kb (two pages - instead of one)?

That's because of this tiny bit of trivia that the maximum number of tuples that Postgres can squeeze onto a page, is hard-coded to 291 (for an 8kb page) - which was interesting to know - but to clarify, Heap-Only-Tuples (HOT feature) can effectively force a few more rows on a running database, but we'll go deeper into that some other day.

So let's go back and confirm if that understanding is correct - that Postgres can in fact squeeze at max only 291 rows onto the same page. 


db1=# truncate table a; insert into a select FROM generate_series(1,291); vacuum full a; \dt+ a
TRUNCATE TABLE
INSERT 0 291
VACUUM
                                    List of relations
 Schema | Name | Type  | Owner  | Persistence | Access method |    Size    | Description
--------+------+-------+--------+-------------+---------------+------------+-------------
 public | a    | table | robins | permanent   | heap          | 8192 bytes |
(1 row)


db1=# truncate table a; insert into a select FROM generate_series(1,292); vacuum full a; \dt+ a
TRUNCATE TABLE
INSERT 0 292
VACUUM
                                 List of relations
 Schema | Name | Type  | Owner  | Persistence | Access method | Size  | Description
--------+------+-------+--------+-------------+---------------+-------+-------------
 public | a    | table | robins | permanent   | heap          | 16 kB |
(1 row)

Here we see that:
  • When 291 rows are inserted (blue), the table stays at 1 page (8kb)
  • Whereas when 292 (291+1) rows are inserted (green), the table expands to 2 pages (16 kb)

Utility

So, that's all fine, but what's this table good for?

Well beyond understanding Postgres :) not much. This table is unhelpful for most database tasks. It can't store values - it wouldn't allow columns / indexes / selective deletes (let's avoid ctid hacks for now) etc. 

But if I was forced to conjure an idea, a high-contention ticker app (that only needs to store +1s) via postgres functions - may be (and that's a BIG may be) this table could be used to store +1s - with a back-off algorithm on the application side. Good chances if there's a good DBA - this is done much better in many other ways (simplest of which is at the application end, or as a value in a column etc.) but it'd be better than nothing.

There are few other possible use-cases discussed here - for e.g. if (for some reason) you'd want to add columns to a table in a programmatic fashion after a base table is created - like we did above in the 100 column table test, OR, if you want to reserve a table name (in a multi-user setup) months / years in advance etc.

Finally

Much Ado About Nothing... This was a good exercise where we learnt something new about how Postgres stores table data - when ironically - there's nothing to store :) 

Hope you had fun! Comments more than welcome.

23 Apr 2024

Installing pg_tle on Ubuntu: A Quick Guide

Compile & Install pg_tle on Postgres

PostgreSQL is a powerful database, but sometimes you want to extend its functionality with custom features, and that's where extensions like pg_tle (Trusted Language Extensions) come into play.

If you're new to pg_tle, here's a quick round-up of why it makes sense for you - See Unlock PostgreSQL Superpowers with pg_tle.

Given the power of pg_tle, you may want to install it locally (on your laptop or an EC2 instance) before deploying to environments with restricted access (such as Production, or PostgreSQL services offered by major cloud providers). This is not only helpful to thoroughly test your code, but also to save on cost given that all development can then happen on-premise.

In this blog post, we'll go through the process of compiling and installing pg_tle for postgres on your Ubuntu system.

Prerequisites

An operating system running Ubuntu (this guide assumes Ubuntu 20.04 or similar).

  • A PostgreSQL database server, installed and running.
  • Basic familiarity with the command line and postgresql.conf.
  • Some development tools (we'll install these as we go).

Steps

Install Build Tools and Dependencies:

Start by updating your Ubuntu package list & install the necessary tools and libraries (This is required, since we would be compiling the pg_tle extension by source).

$ sudo apt update
$ sudo apt install build-essential make git postgresql-server-dev-all 

The postgresql-server-dev-all package version may need to be adjusted to match your specific PostgreSQL version. If this doesn't work for you, you can instead read more about setting up your Ubuntu operating system (albeit in a dated post) here - See Setup Linux for PostgreSQL development [3].

Download pg_tle Source Code:

Get the pg_tle source code from the GitHub project repository:

$ git clone https://github.com/aws/pg_tle.git

Compile pg_tle:

Compile the source code to create the extension files:

$ cd pg_tle
$ make

Although rare, make may fail if it can't find pg_config. As in the example below, a quick hack could be to help by pointing make to the pg_config binary location:

$ make                                     <<===== Fails
Makefile:24: /usr/lib/postgresql/15/lib/pgxs/src/makefiles/pgxs.mk: No such file or directory
make: *** No rule to make target '/usr/lib/postgresql/15/lib/pgxs/src/makefiles/pgxs.mk'.  Stop.

$ PG_CONFIG="`type -P -a pg_config`" make  <<===== Works successfully
gcc -Wall -Wmissing-prototypes ...
.
.
. (compiling starts successfully)

Install pg_tle:

Install the compiled extension into your PostgreSQL database. This command would install the extension related files to the postgres binaries folder, pointed to by PG_CONFIG:

$ sudo make install

Enable pg_tle in Your Database:

Connect to your PostgreSQL database using your preferred tool (e.g., psql) and run the following SQL command:

test_pgtle=# CREATE EXTENSION pg_tle;
CREATE EXTENSION

Verification:

To confirm pg_tle is installed correctly, run this SQL query:

test_pgtle=# SELECT * FROM pg_available_extensions WHERE name = 'pg_tle';
-[ RECORD 1 ]-----+-------------------------------------------
name              | pg_tle
default_version   | 1.4.0
installed_version | 1.4.0
comment           | Trusted Language Extensions for PostgreSQL

You should see a result similar to the above, where installed_version confirms the pg_tle version that's installed successfully.

Conclusion

You've now successfully compiled and installed the pg_tle extension on your Ubuntu system! This opens up the possibility to create and deploy custom extensions to enhance your PostgreSQL database.

pg_tle is a powerful tool that allows you to develop more advanced extensions. You can find more information and examples in the official pg_tle documentation at https://github.com/aws/pg_tle.

If you're intrigued, keep an eye out for a follow-up post where I'll show a simple example of how to use pg_tle extension for a real-world need!

References

  1. Unlock PostgreSQL Superpowers with pg_tle - https://www.thatguyfromdelhi.com/2024/04/unlock-postgresql-superpowers-with-pgtle.html
  2. Setup Linux for PostgreSQL development - https://www.thatguyfromdelhi.com/2011/12/setup-ubuntu-for-postgresql-development.html

21 Dec 2021

Find Database DNS / Endpoint via SQL

How to get Database identifier using SQL

Often there is a need for client programs to find "where am I logged into?". This blog post provides a few ways to know more to answer this question. 


When more and more applications are breaking out of the regular (database + application) architecture, this gets more and more important in scenarios when you're not 100% sure "where" you just logged into.

Some common scenarios, where this is helpful are:


- Proxy between application and Database

- Load Balancer (application Load balancers & Database Load Balancers)

- Serverless Database Clusters

- Auto Scaling Instances (for e.g. EC2 based)

- To confirm whether the Database server quietly changed since last invocation (in the current session)


PSQL only - :HOST variable

When connected via PSQL, the HOST variable can tell which server you're currently connected to.  

postgres10@postgres=> SELECT :'HOST' db;

                            db

----------------------------------------------------------

 postgres10.xyz.ap-southeast-2.rds.amazonaws.com

(1 row)


Caveats:

  • This variable can be set during session, so probably it is a good idea to use it only in trusted environments.
  • This would need good testing with Proxy based architectures, since the value returned is what the client 'thinks' it connected to.
  • Also note that if the client was connected via IP address, this method doesn't do a reverse DNS lookup, so what you'd get is the IP address.


→ psql -U postgres10 -h 172.31.45.165
psql (15devel, server 10.13)
Type "help" for help.

postgres10@postgres=> SELECT :'HOST' db;
      db
---------------
 172.31.45.165
(1 row)

PSQL only - \conninfo

Similar to the method above, another psql only method is to get Connection Info via the \conninfo command. This is more descriptive (and so may be not very well suited for some needs) but additionally doesn't work very well with SELECTs.

postgres10@postgres=> \conninfo

You are connected to database "postgres" as user "postgres10" on host "postgres10.xyz.ap-southeast-2.rds.amazonaws.com" (address "1.2.3.4") at port "54321".



RDS Only - Lambda invocation

Another way to get the Server Endpoint is to use a Lambda function and let that return the Server endpoint. You can read more about how to use Lambda with RDS servers here.


Aurora Postgres only - aurora_replica_status() function

When connected to Aurora Postgres, one additional way for a client to know which of the servers it is connected to (in the cluster), it can use the Aurora function aurora_replica_status() that tells which server you're currently connected to. One additional benefit with this mechanism is that it is executed on the server, and thus returns a true picture of the servers present (and their role) in the cluster.


apg2@pg=> select server_id from aurora_replica_status();

server_id
-----------
apg2b
apg2a
(2 rows)


29 Dec 2020

Which SQL causes a Table Rewrite in Postgres?

EDIT: Updated to v17 (devel) - (Jan 2024).


While developing SQL based applications, it is commonplace to stumble on these 2 questions:

  1. What DDLs would block concurrent workload?
  2. Whether a DDL is going to rewrite the table (and in some cases may need ~ 2x disk space)?

Although completely answering Question 1 is beyond the scope of this post, one of the important pieces that helps answering both of these questions is whether a DDL is going to cause a relfilenode change..

For a brief background, each regular table in Postgres stores data in one or more files, each of which is referenced in the postgres catalog with a relfilenode. A simple way to check whether the current implementation is going to create / refer to another copy (file) is whether the relfilenode changes. (TRUNCATE is a standout here, which by design is going to purge the table data, so although the relfilenode would change here, in total it obviously wouldn't consume anywhere close to 2x disk-space)

The table below shows which DDLs would cause a table rewrite. As has been discussed here, we need some more info to completely answer Question 1, however meanwhile this table helps in making some concurrency / disk-usage related decisions for all Postgres versions supported today.



29 Feb 2020

Optimizations in GROUP BY vs SELECT DISTINCT

(This came out of something I was trying out + discussing with Postgres enthusiasts - thanks to all for clarifying doubts)

This article aims at highlighting one aspect of how the query planner implementation of SELECT * GROUP BY differs from SELECT DISTINCT.

For example:

SELECT b,c,d FROM a GROUP BY b,c,d;
vs
SELECT DISTINCT b,c,d FROM a;


We see a few scenarios where Postgres optimizes by removing unnecessary columns from the GROUP BY list (if a subset is already known to be Unique) and where Postgres could do even better. To highlight this difference, here I have an empty table with 3 columns:

postgres=# create table a (b integer, c text, d bigint);
CREATE TABLE


postgres=# \d a
                 Table "public.a"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 b      | integer |           |          |
 c      | text    |           |          |
 d      | bigint  |           |          |


On this table, we can see that SELECT * GROUP BY generates the exact same plan as SELECT DISTINCT. In particular, we're interested in the "Group Key" which is the same for both SQLs:

postgres=# explain select distinct b,c,d from a;
                         QUERY PLAN                        
------------------------------------------------------------
 HashAggregate  (cost=29.78..31.78 rows=200 width=44)
   Group Key: b, c, d
   ->  Seq Scan on a  (cost=0.00..21.30 rows=1130 width=44)
(3 rows)

postgres=# explain select b,c,d from a group by b,c,d;
                         QUERY PLAN                        
------------------------------------------------------------
 HashAggregate  (cost=29.78..31.78 rows=200 width=44)
   Group Key: b, c, d
   ->  Seq Scan on a  (cost=0.00..21.30 rows=1130 width=44)
(3 rows)



Having said that, if the same table is created with a PRIMARY KEY, we see that GROUP BY becomes smarter, in that we can see that the "Group Key" uses the Primary Key (here it is 'b') and correcty discards columns 'c' and 'd'. Nice 😄!

postgres=# create table a (b integer PRIMARY KEY, c text, d bigint);
CREATE TABLE
postgres=# explain select distinct b,c,d from a;
                         QUERY PLAN                        
------------------------------------------------------------
 HashAggregate  (cost=29.78..41.08 rows=1130 width=44)
   Group Key: b, c, d
   ->  Seq Scan on a  (cost=0.00..21.30 rows=1130 width=44)
(3 rows)

postgres=# explain select b,c,d from a group by b,c,d;
                         QUERY PLAN                        
------------------------------------------------------------
 HashAggregate  (cost=24.12..35.42 rows=1130 width=44)
   Group Key: b
   ->  Seq Scan on a  (cost=0.00..21.30 rows=1130 width=44)
(3 rows)


Let's check if we get the same optimization if we create a UNIQUE index on the column. The answer? Sadly No! Furthermore, I went ahead and created a NOT NULL constraint, but that didn't change anything either. (Do note that UNIQUE columns can have multiple rows with NULLs).

postgres=# create table a (b integer unique not null, c text, d bigint);
CREATE TABLE


postgres=# explain select b,c,d from a group by b,c,d;
                         QUERY PLAN                        
------------------------------------------------------------
 HashAggregate  (cost=29.78..41.08 rows=1130 width=44)
   Group Key: b, c, d
   ->  Seq Scan on a  (cost=0.00..21.30 rows=1130 width=44)
(3 rows)



Regarding the above, IIUC this is an obvious performance optimization that Postgres is still leaving on the table (as of v13+):

postgres=# select version();
                                                     version                                                     
------------------------------------------------------------------------------------------------------------------
 PostgreSQL 13devel on i686-pc-linux-gnu, compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609, 32-bit
(1 row)


Next, does it still optimize this, if the PRIMARY KEY is not the first column in the GROUP BY? Answer? Yes! (The engine can optimize if any of the GROUPed BY column is a Primary Key! Noice !


postgres=# create table a (b integer, c text primary key, d bigint);
CREATE TABLE


postgres=# explain select b,c,d from a group by b,c,d;
                         QUERY PLAN                        
------------------------------------------------------------
 HashAggregate  (cost=24.12..35.42 rows=1130 width=44)
   Group Key: c
   ->  Seq Scan on a  (cost=0.00..21.30 rows=1130 width=44)
(3 rows)


... and what if the PRIMARY KEY is a composite key of any of the columns in the GROUP BY column list? YES again 😄 !

postgres=# create table a (b int, c text, d bigint, primary key (c,d)) ;
CREATE TABLE

postgres=# explain select b,c,d from a group by b,c,d;
                         QUERY PLAN                        
------------------------------------------------------------
 HashAggregate  (cost=26.95..28.95 rows=200 width=44)
   Group Key: c, d
   ->  Seq Scan on a  (cost=0.00..21.30 rows=1130 width=44)
(3 rows)


Lastly, although some of these "optimizations" are things-to-avoid when writing good SQL, the reality is that ORM generated SQLs aren't that smart yet and then it's great that Postgres already implements these obvious optimizations.

What's in an empty table?

How much storage does an empty table in Postgres take? This is a post about Postgres tables that store ... well basically ...  Nothing . The...