That Guy From Delhi: experience

This is an old gotcha, that has been documented (although probably not much) and is something that we (in my company) almost got caught with recently.

This post, is about documenting the issue that looks like something that isn't going away anytime soon.

Consider this:

CREATE TABLE k (id INTEGER, b text);
CREATE TABLE l (id INTEGER);

CREATE VIEW vw_k AS SELECT k.b FROM k JOIN l USING (id);
TABLE vw_k;
ALTER TABLE l RENAME COLUMN id TO id2;
INSERT INTO k(id, b) VALUES (1, 'abc');
INSERT INTO l(id2) VALUES (1);
TABLE vw_k;
DROP VIEW vw_k;
CREATE VIEW vw_k AS SELECT k.b FROM k JOIN l USING (id);

This is the output:

rt_testing=# CREATE TABLE k (id INTEGER, b text);
CREATE TABLE
rt_testing=# CREATE TABLE l (id INTEGER);
CREATE TABLE
rt_testing=# CREATE VIEW vw_k AS SELECT k.b FROM k JOIN l USING (id);
CREATE VIEW
rt_testing=# TABLE vw_k;
b
---
(0 rows)

rt_testing=# ALTER TABLE l RENAME COLUMN id TO id2;
ALTER TABLE
rt_testing=# INSERT INTO k(id, b) VALUES (1, 'abc');
INSERT 0 1
rt_testing=# INSERT INTO l(id2) VALUES (1);
INSERT 0 1
rt_testing=# TABLE vw_k;
b
-----
abc
(1 row)

rt_testing=# DROP VIEW vw_k;
DROP VIEW
rt_testing=# CREATE VIEW vw_k AS SELECT k.b FROM k JOIN l USING (id);
ERROR: column "id" specified in USING clause does not exist in right table

If you noticed, despite changing the base table's relevant column (i.e. l.id => l.id2), the VIEW kept working without fail, until you actually try to DROP / CREATE the VIEW sometime down the line.

In fact, this gets difficult to trace because the VIEW continues to work normally even after the column name change. For e.g. New rows inserted to the updated table, continue to show up in the VIEW as if its definition was up to date!

To summarize, whenever changing a column name (used in a USING clause on a dependent VIEW), be careful that PostgreSQL may not necessarily complain about the dependency. You'd need to be careful to make that check yourself. Or else its possible that sometime down the line, when you're trying to restore a production backup, you'd start seeing errors related to changes you made a year back!

In a Product based company, at times DB Developers don't get direct access to production boxes. Instead, an IT resource ends up managing a large swathe of Web / DB boxes. Since such a resource generally has a large field of operation, at times they need quick steps to identify the sudden high-disk usage.

In such a scenario (where Production is silo-ed out of DB Developers), correct triaging of a Database disk-usage spike is especially helpful, because a good bug-report is at times the only help that'd ensure a 1-iteration resolution.

Recently, one of our Production Database boxes hit a Nagios disk-alert and an IT personnel wanted to identify who (and specifically what) was causing this sudden spike.

From the looks of it, it clearly was a time-consuming + space-consuming Job running on the box and not much that could have been done (sans terminating it), but the following steps could have helped to prepare a bug-report of-sorts for the DB developer to identify / correct it before this happens again:

Isolate whether the disk increase is because of postgres. i.e. Compare the following two disk-usage outputs:

df -h
du -h --max-depth=1 /var/lib/postgresql/

On the psql prompt

SELECT
datname AS db_name,
pg_size_pretty(pg_database_size(oid)) AS db_size
FROM pg_database
ORDER BY pg_database_size(oid) DESC
LIMIT 10;
Connect to the Database. (Assuming that there is one large database on the system, and its name is X. In the above output, it'd be the first row, under db_name):
\c X
From Wiki: SELECT nspname || '.' || relname AS "relation",
    pg_size_pretty(pg_total_relation_size(C.oid)) AS "total_size"
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE nspname NOT IN ('pg_catalog', 'information_schema')
    AND C.relkind <> 'i'
    AND nspname !~ '^pg_toast'
ORDER BY pg_total_relation_size(C.oid) DESC
LIMIT 10; SELECT NOW();

A minor addition, but sometimes, a simple thing such as appending the current Server-Timestamp is 'very' beneficial for accurate triaging. For e.g. If a given table is known to grow very big, N minutes into processing, (assuming we have at least basic logging in place) a Developer can easily identify which method to focus on.
In most cases, the query above (2.4) would give the list of tables that one is looking for:

In most cases, it turns out that the Job is expected to consume processing disk-space, and if so, the DB developer probably needs to request IT for mandatory empty-disk space for each night's periodic processing. That's a simple solution and we're done.
Alternatively, its possible that the Job is (unexpectedly) creating some very large tables (and probably removing it post processing). Such cases, may need optimization, which probably got aggravated purely out of DB growth. Possible, but again, we're clear about how to resolve it.

However, at times the tables returned doesn't cover 'all' that could consume disk-space:

The list of tables generated in Step 2.4 above covers:

(Regular) Tables
Temporary Tables
Unlogged Tables

However, it does not cover tables created within another session's transaction. More detailing on that given below:

Since Temporary Tables (within a Transaction) are NOT visible from another session, Query 2.4 when run by an administrator in another psql session, would not be able to show the temporary table.

To find disk-consuming tables, an alternate approach may yield better results:

$ cd /var/lib/postgres/9.2/data/base/
$ ls -lR | awk '$5 > 1000000' | awk '{print "File: " $9 ", Size: " $5}' | sed 's/\./ part-/g'
File: t22_3586845404, Size: 819200000

File: 3586846552, Size: 630161408
File: t24_3586846545, Size: 1073741824
File: t24_3586846545 part-1, Size: 1073741824
File: t24_3586846545 part-2, Size: 1073741824
File: t24_3586846545 part-3, Size: 559702016

Lets analyse this Shell command:

When run from the data/base folder of PostgreSQL it shows which file uses most disk-space across *all* databases. If you already know which database to focus on, you may want to go inside data/base/xxxx folder and run this Shell command there instead.
In the output we see three things:

File 3586846552 is a large file pointing to a (non-temporary) table
File t22_3586845404 is a large file, points to a *temporary* table but is less than 1Gb size
File t24_3586846545 is a large file, also points to a *temporary* table and is between 3Gb and 4Gb in size, (basically because each file part is a 1Gb volume) and therefore is a good contender to be researched further.

So lets investigate file t24_3586846545 further.

From a psql prompt:
postgres=# \x
Expanded display is on.
postgres=#
WITH x AS (
SELECT trim('t24_3586846545')::TEXT AS folder_name
),
y AS (
SELECT
    CASE
      WHEN position('_' in folder_name) = 0
        THEN folder_name::BIGINT
      ELSE substring(folder_name
        FROM (position('_' in folder_name) + 1))::BIGINT
    END AS oid
FROM x
),
z AS (
SELECT
    row_to_json(psa.*)::TEXT AS pg_stat_activity_Dump,
    query AS latest_successful_query,
    array_agg(mode) AS modes,
    psa.pid
FROM y, pg_locks join pg_stat_activity psa
    USING (pid)
WHERE relation = oid
    AND granted
GROUP BY psa.pid,row_to_json(psa.*)::TEXT, query
)
SELECT *
FROM z

UNION ALL

SELECT
    'Doesnt look like this folder (' ||
    (SELECT folder_name FROM x) ||
    ') stores data for another session''s transaction'::TEXT,
    NULL, NULL, NULL
FROM z
HAVING COUNT(*) = 0;
-[ RECORD 1 ]-----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
pg_stat_activity_dump   | {"datid":"2946058308","datname":"rt_ra","pid":16828,"usesysid":"16384","usename":"rms","application_name":"psql","client_addr":null,"client_hostname":null,"client_port":-1,"backend_start":"2014-12-20 18:52:47.702365+05:30","xact_start":"2014-12-20 20:01:12.630708+05:30","query_start":"2014-12-20 20:53:41.733738+05:30","state_change":"2014-12-20 20:53:41.734325+05:30","waiting":false,"state":"idle in transaction","query":"select n.nspname from pg_class c join pg_namespace n on n.oid=c.relnamespace\nwhere c.relname ='a' and n.nspname like 'pg_temp%';"}
latest_successful_query | select n.nspname from pg_class c join pg_namespace n on n.oid=c.relnamespace
                        | where c.relname ='a' and n.nspname like 'pg_temp%';
modes                   | {AccessExclusiveLock,RowExclusiveLock}
pid                     | 16828

postgres=#

The output of this SQL is probably going to be at least of some help, if the rogue table, is a table within another Transaction. I'll try to disect what this query is doing:

The first column is basically the entire pg_stat_activity row mangled into a JSON object.
The pid is the PID of the Postgres process that is serving the Connection that currently has a Lock on the table that identifies t24_3586846545. This way, we can know more about the connection that created this (large) table. (Please be very clear, that unless you know what you're doing, you shouldn't *kill* any postgres process from bash).
The columns 'last_successful_query' and 'modes', are probably uninteresting to the Admin, but may be of big help to the Developer.

(Obviously, a hands-on IT admin needs to replace the string (t24_3586846545) in the above SQL, with the file-name that (s)he gets the most number of times, when executing the previous shell-command).

Happy triaging :) !

That Guy From Delhi

14 Apr 2016

ALTER TABLE Gotcha related to USING

21 Dec 2014

Why's my database suddenly so big?

Taming ReorderBufferWrite - Boost Logical Decoding in Postgres