
27 Oct 2024

What's in an empty table?


How much storage does an empty table in Postgres take?

This is a post about Postgres tables that store ... well basically ... Nothing.


The idea for this post came from this tweet, which hinted that an empty table on most databases today takes 16kB of storage. Admittedly Franck was probably reminiscing about the good old days, so this is probably quite out of context, but it did get me thinking, and thus this post.


NB: Here's a video showing this in action! - See video.

A "regular" empty table in Production

Here's a regular small table that could be found in Production. It has a Primary Key, a text column and a JSONB column. Let's check the table size using the pg_total_relation_size() postgres function (you can read more about that function here).

db1=# create table t(id bigint primary key, b text, c jsonb);

CREATE TABLE


db1=# select pg_total_relation_size('t');

 pg_total_relation_size

------------------------

                  16384

(1 row)



Hmmm, so that tweet did have a point. Given how cheap memory and storage have become over the decades, it is easy to understand why databases today choose to optimize for speed over space efficiency (more on this later), and so even an empty table in Postgres consumes 16kB.

But "where" is that 16kB being used?


db1=# select pg_relation_size('t');
 pg_relation_size
------------------
                0
(1 row)

The relation itself isn't consuming any space! That is good (again, more on this later), but then where is the space being used?

db1=# \d t
                 Table "public.t"
 Column |  Type  | Collation | Nullable | Default
--------+--------+-----------+----------+---------
 id     | bigint |           | not null |
 b      | text   |           |          |
 c      | jsonb  |           |          |
Indexes:
    "t_pkey" PRIMARY KEY, btree (id)


We see that the table has a Primary Key - and thus an index - t_pkey

Could the index be consuming the 16kB?

db1=# select pg_relation_size('t_pkey');
 pg_relation_size
------------------
             8192
(1 row)

So the index is using some of it - 8kB - that's progress - but what's using the other 8kB?


Let's start cutting the table down

Let's start cutting down the columns and see if the disk-usage goes down. 

db1=# DROP TABLE t; CREATE TABLE t (id BIGINT PRIMARY KEY, b JSONB); select pg_total_relation_size('t');
DROP TABLE
CREATE TABLE
 pg_total_relation_size
------------------------
                  16384
(1 row)

db1=# DROP TABLE t; CREATE TABLE t (id BIGINT PRIMARY KEY, b TEXT); select pg_total_relation_size('t');
DROP TABLE
CREATE TABLE
 pg_total_relation_size
------------------------
                  16384
(1 row)


Hmmm, that didn't help at all. Dropping either the TEXT or the JSONB column made no difference. Let's look at the expanded description of this table to see if there's any similarity between the two columns. (I've clipped the output to make it easier to read.)


db1=# \d+ t
                                            Table "public.t"
 Column |  Type  | Collation | Nullable | Default | Storage  | ...
--------+--------+-----------+----------+---------+----------+-...
 id     | bigint |           | not null |         | plain    | ...
 b      | text   |           |          |         | extended | ...
 c      | jsonb  |           |          |         | extended | ...
Indexes:
    "t_pkey" PRIMARY KEY, btree (id)
Access method: heap


Clearly, both columns have "extended" storage (you can read more about it here); what happened here is that an extended column type resulted in the creation of a TOAST table (you can read more about TOAST here). Let's find the TOAST table for the table "t" and check whether it could be consuming the remaining 8kB.


db1=# select oid, relname, reltoastrelid, reltoastrelid::regclass from pg_class where relname = 't';
  oid  | relname | reltoastrelid |      reltoastrelid
-------+---------+---------------+-------------------------
 18300 | t       |         18303 | pg_toast.pg_toast_18300
(1 row)

db1=# select pg_relation_size('pg_toast.pg_toast_18300');
 pg_relation_size
------------------
                0
(1 row)

Hmmm, it's not the TOAST table. But just like the main table "t", could it be that the TOAST table has supporting relations that are to blame?


db1=# select pg_total_relation_size('pg_toast.pg_toast_18300');
 pg_total_relation_size
------------------------
                   8192
(1 row)

db1=# \d pg_toast.pg_toast_18300
TOAST table "pg_toast.pg_toast_18300"
   Column   |  Type
------------+---------
 chunk_id   | oid
 chunk_seq  | integer
 chunk_data | bytea
Owning table: "public.t"
Indexes:
    "pg_toast_18300_index" PRIMARY KEY, btree (chunk_id, chunk_seq)

db1=# select pg_relation_size('pg_toast.pg_toast_18300_index');
 pg_relation_size
------------------
             8192
(1 row)

Yes! So we see above that the TOAST table implicitly has a primary key (of its own) backed by an index - which has 1 page (8kB) assigned to it.

In a nutshell, the empty table 't' above consumes 16kB, and the storage allocation takes this shape:

  • Main relation - 0 bytes - 't'
  • Main relation index - 8kB - 't_pkey'
  • TOAST relation - 0 bytes - 'pg_toast_18300'
  • TOAST relation index - 8kB - 'pg_toast_18300_index'
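
As a quick cross-check, here's one way to see all four relations and their sizes in a single query (a sketch; the TOAST relation name and OIDs will differ on your system, and the sizes simply restate the numbers we just found):

db1=# select relname, pg_relation_size(oid) from pg_class where relname in ('t', 't_pkey', 'pg_toast_18300', 'pg_toast_18300_index') order by relname;
       relname        | pg_relation_size
----------------------+------------------
 pg_toast_18300       |                0
 pg_toast_18300_index |             8192
 t                    |                0
 t_pkey               |             8192
(4 rows)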

Cut Cut Cut

Okay, let's see if we can reduce the table size further by dropping both of the extended columns.


db1=# DROP TABLE t; CREATE TABLE t (id BIGINT PRIMARY KEY); select pg_total_relation_size('t');
DROP TABLE
CREATE TABLE
 pg_total_relation_size
------------------------
                   8192
(1 row)


Okay, that makes sense. Now the main table (heap) is still not consuming anything, but the index still is. 

Let's reduce further.

db1=# DROP TABLE t; CREATE TABLE t (id BIGINT); select pg_total_relation_size('t');
DROP TABLE
CREATE TABLE
 pg_total_relation_size
------------------------

                      0
(1 row)





0 bytes!! 


Nice! But seriously - 0 bytes?

It kind of makes sense: since Postgres doesn't yet have anything to store besides the metadata of the table (and that metadata lives in the system catalogs, e.g. the pg_catalog schema), there isn't anything to store in the main relation (heap) as yet. 
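
If you're curious where that metadata lives, here's a quick peek at the catalog rows created for the table 't' (a sketch; the OIDs and filenode values will differ on your system). The first query shows the table's pg_class row, the second its single user column in pg_attribute:

db1=# select relname, relkind, relpages, reltuples, relfilenode from pg_class where relname = 't';

db1=# select attname, atttypid::regtype from pg_attribute where attrelid = 't'::regclass and attnum > 0;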

Disk Usage - Check filesystem

Hmmm - Nah, what if I don't trust Postgres?

Let's skip Postgres functions and ask the filesystem directly - and see if the table is actually 0 bytes. Here we first find the file path of the table in question using the postgres function pg_relation_filepath() and then ask the file-system for the file-size.


db1=# select pg_relation_filepath('t');
 pg_relation_filepath
----------------------
 base/17727/18356
(1 row)

db1=# \! ls -la /home/robins/proj/localpg/data/base/17727/18356

-rw------- 1 robins robins 0 Sep 23 10:19 /home/robins/proj/localpg/data/base/17727/18356


So the file corresponding to the table is, in fact, 0 bytes. Nice!

Now, when a table is created, some entries are added to the system catalogs. Let's see whether the database size is significantly more than that of a blank database.


postgres=# create database db1;
CREATE DATABASE

postgres=# \c db1
You are now connected to database "db1" as user "robins".

db1=# CREATE TABLE t (id BIGINT);
CREATE TABLE

db1=# select pg_database_size('db1');
 pg_database_size
------------------
          7482515
(1 row)

db1=# create database db2;
CREATE DATABASE

db1=# \c db2
You are now connected to database "db2" as user "robins".

db2=# select pg_database_size('db2');
 pg_database_size
------------------
          7482515
(1 row)


Good. So this somewhat confirms that a blank database and a database with an "empty" table use the same disk-space. 

Postgres is hiding something

Technically though, I'm lying. Well, actually Postgres is hiding something from the filesystem: every new table does make the database grow logically - a tad - it's just that most of the time the filesystem doesn't get the memo.

Under the covers, when Postgres stores data in a table (a catalog table or any user table), it allocates a whole page of disk space (usually 8kB), even though logically it may be using only a small part of that page. This is very helpful when more rows need to be stored: as new rows come in, Postgres can reuse the same (first) page to logically store more data, while from the filesystem's point of view no extra pages were requested. This is what I was hinting at earlier - today's databases use disk space (and thus memory cache) in larger (8kB) chunks and keep filling a page until a new one is needed. Further below, I show a brief example of how all of this works.
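
Here's a minimal sketch of that page-reuse behaviour, using a throwaway single-column table of my own (how many rows fit in one page depends on the row width):

db1=# create table page_reuse_demo(id int);
CREATE TABLE

db1=# insert into page_reuse_demo values (1);
INSERT 0 1

db1=# select pg_relation_size('page_reuse_demo');
 pg_relation_size
------------------
             8192
(1 row)

db1=# insert into page_reuse_demo select generate_series(2,100);
INSERT 0 99

db1=# select pg_relation_size('page_reuse_demo');
 pg_relation_size
------------------
             8192
(1 row)

One row was enough to allocate a full page, and the next 99 rows were packed into that same page - the filesystem never saw a second request.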

But to summarize, it is unfair to say that the system catalogs did not grow at all when the new table was added - some catalogs are guaranteed to have grown (e.g. the metadata of the new table is stored as an extra row in pg_class) within the disk blocks already allocated to those catalogs.

Let's squeeze a little more? 


db1=# create table a();

CREATE TABLE


Right off the bat, that might seem completely wrong. Does Postgres allow a table with no Columns?

Yes !! 😎

All databases allow creation of an empty table (obviously), but Postgres allows a new table even if there are no columns! Let's verify this from the Postgres documentation. Although subtle, we can see that in the syntax section of the CREATE TABLE page, column_name data_type is enclosed in square brackets [] - which means that columns are, in fact, optional. What's more, this "feature" has been part of Postgres for at least the past 20 years!

Now the utility of this table is arguable (we'll explore that below), but it is now clear that this syntax is legal and works just fine. 

Let's see what such a table looks like with psql \d 

db1=# \d a
                Table "public.a"
 Column | Type | Collation | Nullable | Default
--------+------+-----------+----------+---------


That's it! That's the complete output - Since the table has no columns, the output above (rightly) doesn't show anything.


Let's go a little deeper

The obvious next question is - what on earth could a table like this be used for? That is a perfectly good question, and the answer is probably "not much". However, if you instead ask whether "squeeze a little more" means that a zero-column table takes less storage space, the answer (depending on the input data) is most probably yes. Let's see how it helps Postgres to know that you don't want to store any columns in the table.


db1=# create table a();
CREATE TABLE
db1=# \dt+ a
                                  List of relations
 Schema | Name | Type  | Owner  | Persistence | Access method |  Size   | Description
--------+------+-------+--------+-------------+---------------+---------+-------------
 public | a    | table | robins | permanent   | heap          | 0 bytes |
(1 row)

db1=# insert into a select;
INSERT 0 1

db1=# \dt+ a
                                    List of relations
 Schema | Name | Type  | Owner  | Persistence | Access method |    Size    | Description
--------+------+-------+--------+-------------+---------------+------------+-------------
 public | a    | table | robins | permanent   | heap          | 8192 bytes |
(1 row)


Nothing new here. We see that an empty (zero-column) table consumes 0 bytes of storage, and, like a regular table, as soon as the first row is inserted the table uses 1 page - which in my test database (and probably 99.99% of Postgres databases world-wide) is 8192 bytes. This is expected, but do note that the layout of logical rows within a Postgres page is a little oddly done (and for good reason). There is a lllllooooottttt of detail here - but I wouldn't blame you if you'd want to keep that aside for a cold winter morning, armed with a cup of hot coffee.

For now, we see below that each row inserted into the table consumes 24 bytes within that 8kB page.

db1=# create extension pageinspect ;
CREATE EXTENSION

db1=# truncate table a;
TRUNCATE TABLE

db1=# insert into a select FROM generate_series(1,3);
INSERT 0 3

db1=# SELECT * FROM heap_page_items(get_raw_page('a', 0));
 lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff | t_bits
| t_oid | t_data
----+--------+----------+--------+--------+--------+----------+--------+-------------+------------+--------+--------
+-------+--------
  1 |   8168 |        1 |     24 |  85105 |      0 |        0 | (0,1)  |           0 |       2048 |     24 |
|       | \x
  2 |   8144 |        1 |     24 |  85105 |      0 |        0 | (0,2)  |           0 |       2048 |     24 |
|       | \x
  3 |   8120 |        1 |     24 |  85105 |      0 |        0 | (0,3)  |           0 |       2048 |     24 |
|       | \x
(3 rows)



Still going Deeper - but on a Tangent

So does the above imply that adding more columns to a table means Postgres consumes more bytes per row? Let's verify:

db1=# drop table t;
DROP TABLE

db1=# create table t(id bigint);
CREATE TABLE

db1=# truncate table t; insert into t select generate_series(1,3); vacuum full t; \dt+ t
TRUNCATE TABLE
INSERT 0 3
VACUUM
                                    List of relations
 Schema | Name | Type  | Owner  | Persistence | Access method |    Size    | Description
--------+------+-------+--------+-------------+---------------+------------+-------------
 public | t    | table | robins | permanent   | heap          | 8192 bytes |
(1 row)

db1=# SELECT * FROM heap_page_items(get_raw_page('t', 0));
 lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff | t_bits
| t_oid |       t_data
----+--------+----------+--------+--------+--------+----------+--------+-------------+------------+--------+--------
+-------+--------------------
  1 |   8160 |        1 |     32 |  85100 |      0 |        0 | (0,1)  |           1 |       2816 |     24 |
|       | \x0100000000000000
  2 |   8128 |        1 |     32 |  85100 |      0 |        0 | (0,2)  |           1 |       2816 |     24 |
|       | \x0200000000000000
  3 |   8096 |        1 |     32 |  85100 |      0 |        0 | (0,3)  |           1 |       2816 |     24 |
|       | \x0300000000000000
(3 rows)


Here we see that each row is now consuming 32 bytes - an extra 8 bytes compared to earlier. Chances are the one column we've added is the reason for those extra 8 bytes. Let's verify that using the pg_column_size() function (you can read more about it here):


db1=# select pg_column_size(1::bigint);
 pg_column_size
----------------
              8
(1 row)


But wait, there's one more twist here:

db1=# INSERT INTO t SELECT;
INSERT 0 1

db1=# SELECT * FROM heap_page_items(get_raw_page('t', 0));
 lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff |  t_bits
  | t_oid |       t_data
----+--------+----------+--------+--------+--------+----------+--------+-------------+------------+--------+--------
--+-------+--------------------
  1 |   8160 |        1 |     32 |  85111 |      0 |        0 | (0,1)  |           1 |       2816 |     24 |
  |       | \x0100000000000000
  2 |   8128 |        1 |     32 |  85111 |      0 |        0 | (0,2)  |           1 |       2816 |     24 |
  |       | \x0200000000000000
  3 |   8096 |        1 |     32 |  85111 |      0 |        0 | (0,3)  |           1 |       2816 |     24 |
  |       | \x0300000000000000
  4 |   8072 |        1 |     24 |  85113 |      0 |        0 | (0,4)  |           1 |       2049 |     24 | 0000000
0 |       | \x
(4 rows)

So wait, look at row 4. Although the table has a column, because that column has no value, the row consumed only 24 bytes (the minimum)?

Is this scalable? I mean, can I have a 5-column table where Postgres still stores a row while consuming only the bare minimum 24 bytes? Let's see:


db1=# drop table h;
DROP TABLE

db1=# create table h(c1 bigint, c2 bigint, c3 bigint, c4 bigint, c5 bigint);
CREATE TABLE

db1=# insert into h select;
INSERT 0 1

db1=# SELECT * FROM heap_page_items(get_raw_page('h', 0));
 lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff |  t_bits
  | t_oid | t_data
----+--------+----------+--------+--------+--------+----------+--------+-------------+------------+--------+--------
--+-------+--------
  1 |   8168 |        1 |     24 |  86262 |      0 |        0 | (0,1)  |           5 |       2049 |     24 | 0000000
0 |       | \x
(1 row)


So yes, that does work, and it does scale for "many" columns - but with a minor variation. There's more detail in the code, but basically the tuple header carries a null bitmap with 1 bit per column, and the header is then padded out to 8-byte alignment - so for a 100-column table (with no data) Postgres consumes ~40 bytes per row. 


db1=# drop table h; create table h(); select 'alter table h add column c' || n || ' bigint;' from generate_series(1,100) e(n); \gexec
.
.
ALTER TABLE
ALTER TABLE
ALTER TABLE
ALTER TABLE

db1=# select count(*) from pg_attribute where attrelid = 'h'::regclass;
 count
-------
   106
(1 row)

db1=# insert into h select; vacuum full h; SELECT * FROM heap_page_items(get_raw_page('h', 0));
INSERT 0 1
VACUUM
 lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff |
                                          t_bits                                                  | t_oid | t_data
----+--------+----------+--------+--------+--------+----------+--------+-------------+------------+--------+--------
--------------------------------------------------------------------------------------------------+-------+--------
  1 |   8152 |        1 |     40 |  88376 |      0 |        0 | (0,1)  |         100 |       2817 |     40 | 0000000
0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 |       | \x
(1 row)

Given the task at hand, I'd say that's still decently crisp.

So does a zero-column table squeeze the maximum rows per page?


Yes, and No. So going back to a 0 column table - let's try to fill the whole page with rows and see how many can be stuffed on a single page. 

If you're running low on coffee - the page header is 24 bytes, so back-of-the-envelope math suggests that the number of rows we could squeeze into a page should be roughly (8192 bytes in a page - some bytes for the page header) / 24 bytes per row = ~340 rows.


db1=# truncate table a; insert into a select FROM generate_series(1,340); vacuum full a; \dt+ a
TRUNCATE TABLE
INSERT 0 340
VACUUM
                                 List of relations
 Schema | Name | Type  | Owner  | Persistence | Access method | Size  | Description
--------+------+-------+--------+-------------+---------------+-------+-------------
 public | a    | table | robins | permanent   | heap          | 16 kB |
(1 row)

Something's not right! That size should have stayed 8kB. Why did Postgres use 16kB (two pages instead of one)?

That's because of a tiny bit of trivia: the maximum number of tuples that Postgres can squeeze onto a page is hard-coded to 291 (for an 8kB page). The naive estimate above forgot that each row also needs a 4-byte line pointer besides its 24-byte tuple header, and once you account for that, the limit works out to exactly 291 (see the quick arithmetic below). To clarify, Heap-Only Tuples (the HOT feature) can effectively force a few more rows on a running database, but we'll go deeper into that some other day.
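
Here's that back-of-the-envelope arithmetic: an 8kB page starts with a 24-byte page header, and each (data-less) row costs a 24-byte tuple header plus a 4-byte line pointer:

db1=# select (8192 - 24) / (24 + 4) as max_tuples_per_8kb_page;
 max_tuples_per_8kb_page
-------------------------
                     291
(1 row)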

So let's go back and confirm if that understanding is correct - that Postgres can in fact squeeze at max only 291 rows onto the same page. 


db1=# truncate table a; insert into a select FROM generate_series(1,291); vacuum full a; \dt+ a
TRUNCATE TABLE
INSERT 0 291
VACUUM
                                    List of relations
 Schema | Name | Type  | Owner  | Persistence | Access method |    Size    | Description
--------+------+-------+--------+-------------+---------------+------------+-------------
 public | a    | table | robins | permanent   | heap          | 8192 bytes |
(1 row)


db1=# truncate table a; insert into a select FROM generate_series(1,292); vacuum full a; \dt+ a
TRUNCATE TABLE
INSERT 0 292
VACUUM
                                 List of relations
 Schema | Name | Type  | Owner  | Persistence | Access method | Size  | Description
--------+------+-------+--------+-------------+---------------+-------+-------------
 public | a    | table | robins | permanent   | heap          | 16 kB |
(1 row)

Here we see that:
  • When 291 rows are inserted (first listing above), the table stays at 1 page (8kB)
  • Whereas when 292 (291+1) rows are inserted (second listing), the table expands to 2 pages (16kB)

Utility

So, that's all fine, but what's this table good for?

Well, beyond understanding Postgres :) not much. This table is unhelpful for most database tasks. It can't store values, and it doesn't allow columns / indexes / selective deletes (let's avoid ctid hacks for now), etc. 

But if I were forced to conjure up an idea: for a high-contention ticker app (one that only needs to record +1s), maybe (and that's a BIG maybe) this table could be used to store +1s, with a back-off algorithm on the application side. Chances are that with a good DBA this would be done much better in other ways (the simplest being at the application end, or as a value in a column), but it'd be better than nothing.
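
Purely as a sketch (and not a recommendation - a plain counter column is almost always the better tool), the ticker idea boils down to this:

db1=# create table ticks();
CREATE TABLE

db1=# insert into ticks select;
INSERT 0 1

db1=# select count(*) from ticks;
 count
-------
     1
(1 row)

Each "+1" is just another 24-byte tuple, and reading the ticker is simply a count(*) over the table.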

There are a few other possible use-cases discussed here - e.g. if (for some reason) you want to add columns to a table programmatically after a base table is created, like we did above in the 100-column test, or if you want to reserve a table name (in a multi-user setup) months or years in advance, etc.

Finally

Much Ado About Nothing... This was a good exercise where we learnt something new about how Postgres stores table data when, ironically, there's nothing to store :) 

Hope you had fun! Comments more than welcome.

29 Dec 2020

Which SQL causes a Table Rewrite in Postgres?

EDIT: Updated to v17 (devel) - (Jan 2024).


While developing SQL-based applications, it is commonplace to stumble upon these 2 questions:

  1. What DDLs would block concurrent workload?
  2. Whether a DDL is going to rewrite the table (and in some cases may need ~ 2x disk space)?

Although completely answering Question 1 is beyond the scope of this post, one of the important pieces that helps answer both of these questions is whether a DDL is going to cause a relfilenode change.

For some brief background: each regular table in Postgres stores data in one or more files, each of which is referenced in the Postgres catalog via a relfilenode. A simple way to check whether a DDL is going to create / refer to another copy (file) is to see whether the relfilenode changes. (TRUNCATE is a standout here: by design it purges the table data, so although the relfilenode changes, it obviously wouldn't consume anywhere close to 2x disk-space.)
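
If you want to test a specific DDL yourself, a simple way is to compare pg_relation_filenode() before and after running it. A sketch (the filenode numbers shown are purely illustrative; the example DDL is a column type change, which does force a rewrite):

postgres=# create table r(id bigint);
CREATE TABLE

postgres=# select pg_relation_filenode('r');
 pg_relation_filenode
----------------------
                24001
(1 row)

postgres=# alter table r alter column id type int;
ALTER TABLE

postgres=# select pg_relation_filenode('r');
 pg_relation_filenode
----------------------
                24004
(1 row)

The filenode changed, so the ALTER wrote out a brand new copy of the table; a DDL that leaves the filenode untouched did not rewrite the heap.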

The table below shows which DDLs would cause a table rewrite. As discussed here, we need some more information to completely answer Question 1, but meanwhile this table helps in making some concurrency / disk-usage decisions for all Postgres versions supported today.



11 Jun 2015

Using Pi as a home-based Media Server

This is among a series of articles on my experience with the Pi.

This article is about using a Pi as a primitive low-end File-Server for your home network:

Expectations:

  • Torrent-Server
    • Download Torrents
    • Store on a File-Share
  • Windows Share
    • Serve the File-Share as a Windows Share Drive
    • Allow Read / Write to this Windows Share Drive
  • Use File-Share as Media-Server
    • Using any Smartphone / Laptop
      • Play Movies using VLC
      • View all Photos / Home-Videos


Effectively:
  1. Always on
    1. Always accepting new Torrent requests
    2. Instantly start downloading 
  2. In Real-time
1. Allow users to view torrent download status
  3. Use any UPnP Phone App to play Video content over WiFi on a SmartPhone
  4. Use VLC (Network Streaming) to play any Video over WiFi on Laptop / Desktop
  5. Use Windows Share Drive and view all Photos / Home-Videos as needed

How-To:
  • On Server
    • Install Torrent-Daemon
    • Configure Torrent-Daemon to listen on RPC requests
      • Here's the howto for that
    • Configure Samba
      • Set the Download folder to be shared
  • On Windows
    • Install Transmission-GUI-Remote
    • Configure GUI to use the RPC based Torrent-Daemon server
    • Make this application the default for .torrent & magnet files
  • On Linux
    • Install Transmission-Remote
      • sudo apt-get install transmission-remote
    • Configure that to use Daemon (instead of downloading directly with transmission-cli)
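
For example, assuming the daemon runs on the Pi at 192.168.1.10 with the default RPC port 9091 (both illustrative - adjust to your setup), the Linux client side boils down to:

# add a torrent via the daemon running on the Pi
$ transmission-remote 192.168.1.10:9091 --add ~/Downloads/some-file.torrent

# list all torrents and their download status
$ transmission-remote 192.168.1.10:9091 --list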

Pros/Cons:
  • Pros
    • Once torrent download has begun, client can disconnect / shutdown client computer
    • Server continues to get torrents, after a restart
    • miniDLNA serving speed pretty decent
      • Watching a movie (over WiFi) on VLC
        • CPU ratio barely 0.01 which is pretty decent
          • Should be able to easily serve a small army :)
            • If no other IO is happening
            • If WiFi isn't the bottleneck 
  • Cons
    • Storage on Pi needs to be managed from time-to-time
Currently this is highly ad-hoc
        • Truly taking RAID concept to heart!!
          • Have 6 Pen Drives (ranging from 4GB to 16GB)
            • 8GBs are INR ~170 ( $3 )
            • All connected via 3 USB Hubs (both USB 2.0 and USB 3.0)
              • One Powered, two non-powered hubs
            • Most on Btrfs
Pretty stable; surprising that it still gets 1+ MBps with the Pi's CPU
Some VFAT filesystems, since I sometimes need to move stuff off the Pi to a Windows laptop
            • All mounted under /disk
              • Very temporary arrangement
Ideally I am looking for an LVM-like solution (one that lets me remove / add pen-drives on the fly and still address them as a single volume) that is actually suited to this purpose.
            • But miniDLNA picks up Photo / Audio / Video pretty well from all different mounted folders
    • Download speed limitations
      • Internet Speed = 2 MBps
      • Daemon Download speed capping = 1Mbps
        • But PI never reaches it :D !
          • Just imagine the poor configuration
            • Pi + Btrfs + USB 2.0 + 3 USB Chain + Unpowered Hub
              • Still consistent at 450kbps! Impressive!
The lack of a Transmission-Daemon configuration tool means all configuration has to be done by hand-editing configuration files
  • Careful configuration
    • Give all access to 'all' users only if you're sure that all users are going to be careful

21 Dec 2014

Why's my database suddenly so big?


In a product-based company, DB developers at times don't get direct access to production boxes. Instead, an IT resource ends up managing a large swathe of web / DB boxes. Since such a resource generally has a large field of operation, they often need quick steps to identify sudden high disk usage.

In such a scenario (where Production is silo-ed out of DB Developers), correct triaging of a Database disk-usage spike is especially helpful, because a good bug-report is at times the only help that'd ensure a 1-iteration resolution.

Recently, one of our Production Database boxes hit a Nagios disk-alert and an IT personnel wanted to identify who (and specifically what) was causing this sudden spike.

From the looks of it, it clearly was a time-consuming + space-consuming job running on the box, and not much could have been done (short of terminating it), but the following steps could have helped prepare a bug report of sorts for the DB developer to identify / correct it before it happens again:
  1. Isolate whether the disk increase is because of postgres. i.e. Compare the following two disk-usage outputs:
    1. df -h
    2. du -h --max-depth=1 /var/lib/postgresql/
  2. On the psql prompt
    1. SELECT
        datname AS db_name,
        pg_size_pretty(pg_database_size(oid)) AS db_size
      FROM pg_database
      ORDER BY pg_database_size(oid) DESC
      LIMIT 10; 
    2. Connect to the Database. (Assuming that there is one large database on the system, and its name is X. In the above output, it'd be the first row, under db_name):
    3.  \c  X
    4. From Wiki: SELECT nspname || '.' || relname AS "relation",
          pg_size_pretty(pg_total_relation_size(C.oid)) AS "total_size"
        FROM pg_class C
        LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
        WHERE nspname NOT IN ('pg_catalog', 'information_schema')
          AND C.relkind <> 'i'
          AND nspname !~ '^pg_toast'
        ORDER BY pg_total_relation_size(C.oid) DESC
        LIMIT 10; SELECT NOW();
3. A minor addition, but sometimes a simple thing such as appending the current server timestamp is 'very' beneficial for accurate triaging. For example, if a given table is known to grow very big N minutes into processing (assuming we have at least basic logging in place), a developer can easily identify which method to focus on.
  4. In most cases, the query above (2.4) would give the list of tables that one is looking for:
    1. In most cases, it turns out that the Job is expected to consume processing disk-space, and if so, the DB developer probably needs to request IT for mandatory empty-disk space for each night's periodic processing. That's a simple solution and we're done.
2. Alternatively, it's possible that the Job is (unexpectedly) creating some very large tables (and probably removing them post-processing). Such cases may need optimization, a need which probably got aggravated purely by DB growth. Possible, but again, we're clear about how to resolve it.
5. However, at times the tables returned don't cover 'all' that could be consuming disk-space:
    1. The list of tables generated in Step 2.4 above covers:
      1. (Regular) Tables
      2. Temporary Tables
      3. Unlogged Tables
2. However, it does not cover tables created within another session's open transaction. More detail on that below:

Since temporary tables (created within a still-open transaction) are NOT visible from another session, Query 2.4, when run by an administrator in another psql session, would not be able to show the temporary table. 
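
As a minimal illustration (the session labels and table name are mine), consider a temporary table created inside a transaction that is still open:

-- Session 1 (the job): transaction still open, temp table already consuming disk
BEGIN;
CREATE TEMP TABLE big_tmp AS SELECT g AS id FROM generate_series(1, 1000000) g;
-- ... long-running processing, no COMMIT yet ...

-- Session 2 (the admin): the uncommitted catalog row for big_tmp is not visible here,
-- so neither this lookup nor Query 2.4 above will report it
SELECT relname FROM pg_class WHERE relname = 'big_tmp';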

To find disk-consuming tables, an alternate approach may yield better results:

$ cd /var/lib/postgres/9.2/data/base/
$ ls -lR | awk '$5 > 1000000' | awk '{print "File: " $9 ",  Size: " $5}' | sed 's/\./ part-/g'

File: t22_3586845404,  Size: 819200000
File: 3586846552,  Size: 630161408
File: t24_3586846545,  Size: 1073741824
File: t24_3586846545 part-1,  Size: 1073741824
File: t24_3586846545 part-2,  Size: 1073741824

File: t24_3586846545 part-3,  Size: 559702016

Let's analyse this shell command:
  • When run from PostgreSQL's data/base folder, it shows which files use the most disk-space across *all* databases. If you already know which database to focus on, you may want to go inside the data/base/xxxx folder and run this shell command there instead.
  • In the output we see three things:
    • File 3586846552 is a large file pointing to a (non-temporary) table
    • File t22_3586845404 is a large file, points to a *temporary* table but is less than 1Gb size
    • File t24_3586846545 is a large file, also points to a *temporary* table and is between 3Gb and 4Gb in size, (basically because each file part is a 1Gb volume) and therefore is a good contender to be researched further.

So let's investigate file t24_3586846545 further.

From a psql prompt:
postgres=# \x
Expanded display is on.
postgres=#

WITH x AS (
  SELECT trim('t24_3586846545')::TEXT AS folder_name
),
y AS (
  SELECT
    CASE
      WHEN position('_' in folder_name) = 0
        THEN folder_name::BIGINT
      ELSE substring(folder_name
        FROM (position('_' in folder_name) + 1))::BIGINT
    END AS oid
  FROM x
),
z AS (
  SELECT
    row_to_json(psa.*)::TEXT AS pg_stat_activity_Dump,
    query AS latest_successful_query,
    array_agg(mode) AS modes,
    psa.pid
  FROM y, pg_locks join pg_stat_activity psa
    USING (pid)
  WHERE relation = oid
    AND granted
  GROUP BY psa.pid,row_to_json(psa.*)::TEXT, query
)
  SELECT *
  FROM z

UNION ALL

  SELECT
    'Doesnt look like this folder (' ||
    (SELECT folder_name FROM x) ||
    ') stores data for another session''s transaction'::TEXT,
    NULL, NULL, NULL
  FROM z
  HAVING COUNT(*) = 0;

-[ RECORD 1 ]-----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
pg_stat_activity_dump   | {"datid":"2946058308","datname":"rt_ra","pid":16828,"usesysid":"16384","usename":"rms","application_name":"psql","client_addr":null,"client_hostname":null,"client_port":-1,"backend_start":"2014-12-20 18:52:47.702365+05:30","xact_start":"2014-12-20 20:01:12.630708+05:30","query_start":"2014-12-20 20:53:41.733738+05:30","state_change":"2014-12-20 20:53:41.734325+05:30","waiting":false,"state":"idle in transaction","query":"select n.nspname from pg_class c join pg_namespace n on n.oid=c.relnamespace\nwhere c.relname ='a' and n.nspname like 'pg_temp%';"}
latest_successful_query | select n.nspname from pg_class c join pg_namespace n on n.oid=c.relnamespace
                        | where c.relname ='a' and n.nspname like 'pg_temp%';
modes                   | {AccessExclusiveLock,RowExclusiveLock}
pid                     | 16828

postgres=#


The output of this SQL is probably going to be of at least some help if the rogue table is a table within another transaction. Let me dissect what this query is doing:
  • The first column is basically the entire pg_stat_activity row mangled into a JSON object.
  • The pid is the PID of the Postgres process serving the connection that currently holds a lock on the table behind t24_3586846545. This way, we can learn more about the connection that created this (large) table. (Please be very clear that unless you know what you're doing, you shouldn't *kill* any Postgres process from bash.)
  • The columns 'latest_successful_query' and 'modes' are probably uninteresting to the admin, but may be of big help to the developer.

(Obviously, a hands-on IT admin needs to replace the string (t24_3586846545) in the above SQL with the file-name that appears most often in the output of the previous shell command.)

Happy triaging :) !

15 Dec 2006

Software RAID and other storage options

When it comes to disks, I have used Samsung, Seagate as well as Maxtor, and (without any bias ...) nowadays almost all of them conk out pretty soon. Sometimes I wonder if they all make their hard-disks in Taiwan… just like they say in Armageddon …
American components, Russian components … they are all made in Taiwan’.
I still have those 4 Gb disks that were a prized possession a long time back (and that still work flawlessly !!). And now when all the 160Gbs and the 320Gbs are falling like dead rats, I am seriously contemplating installing my Windows on that old 4 Gb partition … !! 

Well … I almost was going to… when I saw a software RAID (on a windows partition) work pretty well! Slurp... :) 

So, my next experiment was with two 20 Gb Hard disks paired as a (software) RAID disk, with Microsoft Windows installed on them. What I also contemplated was to make use of the motherboard’s hardware RAID feature and get two large hard disks to pair up and provide (relatively reliable and) large storage. This had the added advantage of more reliable RAID support, albeit at the cost of the fact that those extra RAID drivers ‘may’ be flaky …

The tests worked out barely fine, but the nagging concern about data-safety still lurked around. Having seen one of the 'proprietary' cheap motherboard RAID cards drop dead suddenly at the office, the mere thought of losing sleep over an untested system seemed unreasonable. A similar idea using software RAID on an Ubuntu setup was tried, but its performance was not up to the mark. Further, it still required some administrative work before getting down to 'business', and its utility then seemed questionable.

What I require is a data-storage solution that lets me be and gets me to do 'my job', rather than worrying about data-safety and the costs associated with it.


Update (2012/01/29): Over the years, at least now I know what solution would do the job. It's detailed a little more in another post on this blog.
