20 Nov 2015

Reverse Port Forwarding and how !

Recently I had to work on a PoC that required spinning off Docker instances with custom configurations. The problem wasn't so much the Docker aspect (probably a story for another time) as the fact that, as luck would have it, we couldn't commission a separate machine for this PoC, and I eventually had to do the entire development inside a 2G VirtualBox VM on my 8G laptop.

The tricky part was that we had to submit the URL to the PoC Landing Page even before it could be ported to a full-fledged VM/server.

So effectively, we:
  • Submitted a URL pointing to ServerA, which ran Apache but had no space to host a 40G VM.
  • Had a Laptop where the VM was developed, but which couldn't stay online since it wasn't always connected
  • Had another server (ServerB) where the VM needed to be ported, so that it would always be up for PoC evaluation
  • ServerA was in the US, whereas ServerB (& the Laptop) were half-way across the world in India.
Separately, the VM was an Ubuntu installation that had the following:
  • A running NodeJS (Bootstrap / Express combo) serving the PoC GUI
  • Separately, it hosted the Docker runtime that spun off Docker instances, each running its own proprietary application + Tomcat on port 8080, all of which was eventually supposed to be visible company-wide
  • Thirdly, it had a PostgreSQL 9.3 database that served as a backend to all the running docker-based app instances

To connect ServerA with the PoC hosted on ServerB we tried the following:
  • Creating VirtualHost entries on the ServerA that pointed to corresponding ports on the VM
    • This failed because the VM was running inside a NATted configuration and thus was invisible to ServerA
  • Then we tried an alternate solution, wherein Apache->VirtualHost entries on ServerA were pointed to port 80 on ServerB. This required port-forwarding port 80 on ServerB to port 3000 (NodeJS) on ServerB. We used NetSH for this, but it failed for various reasons, some of which we could identify and some we couldn't. For example, we installed the ipv4 module for NetSH, also tried installing ipv6, as well as enabling Windows Firewall for NetSH to work (contrary to the popular belief that Windows Firewall might be the reason for things getting blocked)
  • We finally settled on running persistent SSH connections directly from the VM to ServerA (thereby bypassing all Windows Firewall issues) and setting up Remote Port-Forwarding such that remote ports could connect to the Node server inside the VM (see the sketch after this list)
    • This too initially didn't work as expected. To identify the cause we:
      • Disabled SELinux on ServerB, then ServerA (although that didn't make sense), as well as on the VM, to no avail
      • Then we disabled IPTables on all the relevant machines and that still didn't help
      • Finally we realised that the Remote Port-Forwarding, though working well, was getting attached to the localhost interface on ServerA. That meant that although 127.0.0.1:80 on ServerA was working as expected, 192.168.1.1:80 on the same server immediately gave a "connection failed". It took a while to realise that, even though I was logging in trans-atlantic, the immediate 'Connection Failed' was an issue with ServerA and not with this side of the line.
      • What complicated things after that was that the regular -g option was insufficient. It required setting the GatewayPorts option in ServerA's sshd_config (with an sshd restart) to work as expected.
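
A minimal sketch of what the final setup looked like (hostnames, user and port numbers here are illustrative, not the real ones):

# On ServerA, in /etc/ssh/sshd_config (followed by an sshd restart):
#   GatewayPorts yes

# From inside the VM: a persistent reverse tunnel, so that connections to
# ServerA:80 get forwarded to the Node server listening on port 3000 in the VM.
# (Remote-forwarding a privileged port like 80 needs a root login on ServerA;
#  otherwise forward a high port and let Apache proxy to it.)
ssh -N -R 0.0.0.0:80:localhost:3000 user@ServerA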

Eventually, after a few hours of transatlantic jugglery, we finally got a working URL that looked something like this... What a day !



12 Nov 2015

Pigz - Parallel GZip

For a while, I've been frustrated by the fact that GZip is unable to use the idle CPUs on my laptop. I recently read about PIGZ (a drop-in GPL replacement) that implicitly consumes all CPUs.

Pronounced Pig-Zee, PIGZ is a drop-in replacement for GZIP and automatically consumes all CPUs of the host machine in order to get the compression completed much faster.

OLD: tar -cf - sourcefolder | gzip > sourcefolder.tar.gz

NEW: tar -cf - sourcefolder | pigz > sourcefolder.tar.gz
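
pigz also accepts a thread count (-p) and the usual gzip compression levels, so on a shared box something like this keeps a few cores free (flags as I remember them; check the man page):

LIMITED: tar -cf - sourcefolder | pigz -p 4 -9 > sourcefolder.tar.gz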

Enjoy!

11 Jun 2015

Using a Pi as a home-based Media Server

This is among a series of articles on my experience with the Pi.

This article is about using a Pi as a primitive low-end File-Server for your home network:

Expectations:

  • Torrent-Server
    • Download Torrents
    • Store on a File-Share
  • Windows Share
    • Serve the File-Share as a Windows Share Drive
    • Allow Read / Write to this Windows Share Drive
  • Use File-Share as Media-Server
    • Using any Smartphone / Laptop
      • Play Movies using VLC
      • View all Photos / Home-Videos


Effectively:
  1. Always on
    1. Always accepting new Torrent requests
    2. Instantly start downloading 
  2. In Real-time
    1. Allow users to view torrent download status
  3. Use any UPnP Phone App to play Video content over WiFi on a SmartPhone
  4. Use VLC (Network Streaming) to play any Video over WiFi on Laptop / Desktop
  5. Use Windows Share Drive and view all Photos / Home-Videos as needed

How-To:
  • On Server
    • Install Torrent-Daemon
    • Configure Torrent-Daemon to listen on RPC requests
      • Here's the howto for that
    • Configure Samba
      • Set the Download folder to be shared
  • On Windows
    • Install Transmission-GUI-Remote
    • Configure GUI to use the RPC based Torrent-Daemon server
    • Make this application the default for .torrent & magnet files
  • On Linux
    • Install Transmission-Remote
      • sudo apt-get install transmission-remote
    • Configure it to use the Daemon (instead of downloading directly with transmission-cli); see the sketch below
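
A rough sketch of the relevant configuration pieces (paths, addresses and share names here are illustrative, not the exact ones I used):

# /etc/transmission-daemon/settings.json (edit while the daemon is stopped)
#   "rpc-enabled": true,
#   "rpc-whitelist": "127.0.0.1,192.168.1.*",
#   "download-dir": "/disk/downloads",

# /etc/samba/smb.conf -- expose the same folder as a read/write share
#   [media]
#   path = /disk/downloads
#   read only = no

# From any Linux box on the LAN, hand a torrent to the Pi over RPC
transmission-remote 192.168.1.10:9091 --add "magnet:?xt=..."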

Pros/Cons:
  • Pros
    • Once torrent download has begun, client can disconnect / shutdown client computer
    • Server continues to get torrents, after a restart
    • miniDLNA serving speed pretty decent
      • Watching a movie (over WiFi) on VLC
        • CPU ratio barely 0.01 which is pretty decent
          • Should be able to easily serve a small army :)
            • If no other IO is happening
            • If WiFi isn't the bottleneck 
  • Cons
    • Storage on Pi needs to be managed from time-to-time
      • Currently this is managed in a highly ad-hoc fashion
        • Truly taking RAID concept to heart!!
          • Have 6 Pen Drives (ranging from 4GB to 16GB)
            • 8GBs are INR ~170 ( $3 )
            • All connected via 3 USB Hubs (both USB 2.0 and USB 3.0)
              • One Powered, two non-powered hubs
            • Most on Btrfs
              • Pretty stable; surprising that I still get 1+ MBps with the Pi's CPU
            • Some are VFAT, since I occasionally need to move stuff off the Pi to a Windows Laptop
            • All mounted under /disk
              • Very temporary arrangement
                • Ideally I'm looking for an LVM-like solution (one that would let me add / remove pen-drives on the fly and still address them as a single volume) that is actually suited for this purpose.
            • But miniDLNA picks up Photo / Audio / Video pretty well from all different mounted folders
    • Download speed limitations
      • Internet Speed = 2 MBps
      • Daemon Download speed capping = 1Mbps
        • But PI never reaches it :D !
          • Just imagine the poor configuration
            • Pi + Btrfs + USB 2.0 + 3 USB Chain + Unpowered Hub
              • Still consistent at 450kbps! Impressive!
    • Lack of a Transmission-Daemon configuration tool means all configuration has to be done by hand-editing configuration files at the console
  • Careful configuration
    • Give all access to 'all' users only if you're sure that all users are going to be careful

25 May 2015

SAVE Tax with new NPS provisions!!!

This article is about NPS (National Pension Scheme), or the New Pension System as some may know it, which is a loose-ended replacement of the erstwhile pension system prevalent in India. Although not a 100% replacement, this is the next best thing, and thus understanding it better is a good idea.

For all those trying hard to save Taxes while working their way around the Indian Income Tax system, the following may help in clarifying doubts / providing more avenues.

As per Income Tax provisions effective in 2015-16, investments in NPS can be done in three ways:
  • Employee contribution, under 80C
  • Direct investment in NPS (outside of your employer), under 80CCD(1B)
  • Employer contribution, under 80CCD(2)

 Let's look at these in more detail:

Employee contribution, under 80C

Any Indian (that has a PRAN) can invest in NPS by contributing a minimum of Rs. 500/- per month, or a minimum of Rs. 6,000/- annually, to their NPS account. This investment into NPS can be deducted from the person's taxable income (subject to the maximum cap of Rs. 1,50,000/- under 80C).

Although a good idea, this is generally futile, since most people (who have been working for a while) have already reached their Rs. 1,50,000/- 80C limit via other means (e.g. Life Insurance / ULIP / PF etc.). In that case, investing in NPS under 80C, although good in the long term, does not contribute to saving tax for the current financial year.


Direct investment in NPS (outside of your employer), under 80CCD(1B)

Any Indian (that has a PRAN) can invest directly into NPS, without the support of his / her employer. This facility has been available for a while, and is the oldest form of investing in NPS. Till recently, the caveat to this form of investment was that this did not have any Tax Exemption.

Although modified earlier, as of 2015-16, the Income Tax Provisions are such that investments in NPS (made directly) up to Rs. 50,000/-  can be deducted from the person's Taxable Income. To clarify, this doesn't mean that one can't invest more than Rs. 50k, but that only the first Rs. 50k of that amount can be deducted from his / her taxable income.

For example, let's assume that Ms. Lata has consumed her Rs. 1,50,000/- of 80C investment options (via Life Insurance / PF investments) and her net taxable income is Rs. 3,75,000/-. Now let's assume that she invested Rs. 1,50,000/- directly into NPS (outside of her employer's assistance); her net taxable income would then become Rs. 3,25,000/- (i.e. 3,75,000 - 50,000). So although the entire sum of Rs. 1,50,000 was invested into NPS, only the first Rs. 50,000 was deducted from taxable income.


Employer contribution, under 80CCD(2)

The third and the most unclear & interesting section is the 80CCD(2) that allows an employee to save much more tax than was possible earlier.

Under this section, (apart from the above two clauses), an employee can request his / her employer to deduct a given sum from the monthly salary, and invest in NPS. This contribution (upto a maximum of 10% of Basic Pay) can be additionally deducted from the employee's taxable income, which in some cases can be a big boon to the net tax outflow in the financial year.

Example for all above sections

Let's take an example that elaborates all the sections given above:

Let's assume that Ms. Lata's Basic pay is Rs. 11,00,000 (11 lakh) and she has invested Rs. 1,00,000 in Life Insurance and Rs. 40,000 in ELSS Funds, as well as Rs. 10,000 in NPS (under section 80C). Further, she directly invested Rs. 50,000 in her NPS account, outside of her employer's assistance (under section 80CCD(1B)). Lastly, she requested her employer to invest Rs. 10,000/- per month in her NPS account under Section 80CCD(2).

Then her net taxable income would be as follows:

Taxable income = 11,00,000 
                   - Rs. 1,50,000 under 80C      - max (1.5 lakh)
                   - Rs.   50,000 under 80CCD(1B)- max (50k)
                   - Rs. 1,10,000 under 80CCD(2) - max (10% of Basic)
               = 11 lakh - 1.5 lakh - 0.5 lakh - 1.1 lakh
               = 7.9 lakh

This should clarify all doubts pertaining to investment in NPS for the financial year 2015-16.

17 May 2015

Basic OLAP Support in PostgreSQL

While reviewing an existing application, I thought it'd be worthwhile to review how good / bad PostgreSQL is in terms of OLAP. This (growing) post is going to be my (un)learning of how ready PostgreSQL is.

  1. Row Numbering
    1. Support: Yes. 
    2. Use: Row_Number() function numbers rows generated in a result-set.
    3. Example:

      SELECT
        row_number() OVER (ORDER BY marks DESC) AS rn,
        name
      FROM x;
    4. Review: Some databases have different variants that accomplish this (e.g. Oracle has a pseudo column called ROWNUM), but PostgreSQL fully supports the SQL-compliant syntax.
  2. Rank()
    1. Support: Yes. 
    2. Use: Rank() and Dense_Rank() functions number the rank of the compared item. 
    3. Example:

      SELECT
        rank() OVER (ORDER BY marks DESC) AS rn,
        dense_rank() OVER (ORDER BY marks DESC) AS drn,
        name
      FROM x;
    4. Review: It's useful and fully supported.
  3. Window Clause
    1. Support 
      1. OVER (PARTITION BY): Yes
      2. OVER (ORDER BY): Yes
      3. OVER (RANGE): Yes
    2. Use:  Read more here.
    3. Example:  
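
      A minimal sketch (assuming the hypothetical table x from the examples above also has a class column):

      SELECT
        name,
        marks,
        AVG(marks) OVER (PARTITION BY class) AS class_avg,
        SUM(marks) OVER (ORDER BY marks DESC
                         RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
      FROM x;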
    4. Review: These are extremely helpful for people serious about data-extraction / reporting and fully supported.
  4. NTile
    1. Support: Yes
    2. Use: Ntile(n) distributes the ordered rows into n roughly equal-sized buckets (quartiles, deciles, percentiles, and so on).
    3. Example:

      SELECT
        ntile(4) OVER (ORDER BY marks DESC) AS quartile,
        ntile(10) OVER (ORDER BY marks DESC) AS decile,
        ntile(100) OVER (ORDER BY marks DESC) AS percentile,
        name
      FROM x;
    4. Review: Versatile and fully supported.
  5. Nested OLAP Aggregations
    1. Support: No
      1. But doable with alternative SQL? : Yes
        1. Is that as Performant? : Mostly No
    2. Description: Allow something like

      SELECT
        subject,
        AVG(SUM(marks) GROUP BY class)
      FROM marks
      GROUP BY subject;
    3. Alternative:  This could be done with Sub-Selects like this:

      SELECT
        subject,
        AVG(sum_marks) AS avg
      FROM (
        SELECT
          subject,
          class,
          SUM(marks) AS sum_marks
        FROM marks
        GROUP BY subject, class
      ) mrk
      GROUP BY subject;
    4. Review: In the two examples we are trying to calculate the per-subject average of (total marks obtained in different classes). Although PostgreSQL doesn't support this form of nested aggregates, it clearly is a neat way of doing things. The alternative admittedly looks like a kludge, so native support would be a nice-to-have feature.
  6. GROUPING SETS
    1. Support: Yes (in 9.5)
    2. Alternative:  This could be alternatively done with UNION ALL like this:
      SELECT
        SubjectID,
        NULL AS StudentID,
        AVG(marks)
      FROM marks
      GROUP BY SubjectID

      UNION ALL

      SELECT
        NULL AS SubjectID,
        StudentID,
        AVG(marks)
      FROM marks
      GROUP BY StudentID;
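
      For comparison, a sketch of what the native 9.5 syntax for the same result could look like (same table and columns as in the alternative above):

      SELECT
        SubjectID,
        StudentID,
        AVG(marks)
      FROM marks
      GROUP BY GROUPING SETS ((SubjectID), (StudentID));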
    3. Review: Popular databases (Oracle / MSSQL) support this well. PostgreSQL has had this on its ToDo list for at least a decade ! Looking at the alternative, one can see that it is not just lengthy (and repetitive .. thus error-prone), but also non-performant (simply because it requires multiple runs over the same data-set).
    4. History:
      1. Already in PostgreSQL TODO list
      2. Discussions started (at least) way back in 2003.
      3. Patch:
        1. 2008 patch that didn't make it.
        2. The 2014 patch has been heavily discussed since, and finally just got through to PostgreSQL 9.5.
  7. ROLLUP
    1. Description: An obvious extension to GROUPING SETS (explained above), ROLLUP can be explained with a simple example:

       GROUP BY ROLLUP (Year, SubjectID, StudentID)

      is equivalent to

       GROUP BY GROUPING SETS (
         (Year, SubjectID, StudentID),
         (Year, SubjectID),
         (Year),
         ()
       );
    2. Support: Yes (in 9.5)
    3. Alternative:  This could be alternatively done with CTEs.

      WITH x AS (
        SELECT Year, SubjectID, StudentID, marks
        FROM marks
        WHERE passed
          AND NOT inactive
      )
      SELECT *
        FROM x

      UNION ALL

      SELECT
          Year, SubjectID, StudentID, AVG(marks)
        FROM x
        GROUP BY Year, SubjectID, StudentID

      UNION ALL

      SELECT
          Year, SubjectID, NULL AS StudentID, AVG(marks)
        FROM x
        GROUP BY Year, SubjectID

      UNION ALL

      SELECT
          Year, NULL AS SubjectID, NULL AS StudentID, AVG(marks)
        FROM x
        GROUP BY Year;
    4. Review: ROLLUPs are ideal for generating things like sub-totals, which at times are key performance factors when generating large Reports. The alternative essentially uses a CTE, which is subsequently used to calculate subtotals and totals. For multiple reasons this is sub-optimal and could be sped up with in-built support. Besides, the alternative is lengthy & repetitive (thus error-prone).
    5. History:
      1. Discussions started (at least) way back in 2003.
      2. Patches submitted
        1. The 2010 patch seemingly didn't make it.
        2. The 2014 attempt finally got through.
  8. CUBE
    1. Support: Yes (in 9.5)
    2. Description: Just like ROLLUP (was an extension of GROUPING SETS), CUBEs are an extension of ROLLUP (and thereby GROUPING SETS) and could be explained with the following example:

       GROUP BY CUBE (Year, SubjectID, StudentID)
      is equivalent to

       GROUP BY GROUPING SETS (
         (Year, SubjectID, StudentID),
         (Year, SubjectID),
         (Year, StudentID),
         (Year),
         (SubjectID, StudentID),
         (SubjectID),
         (StudentID),
         ()
       );
    3. Review: The alternative (not provided, for obvious reasons) is not just lengthy & repetitive (thus error-prone), but, more importantly, not as performant as is otherwise possible.
  9. MERGE INTO / UPSERT
    1. Support: Yes (in 9.5)
    2. Doable with alternative SQL? : Yes (for 9.4 and below)
      1. Is the alternative as Performant?
        1. No: This is because the alternative (URL given below) is a BEGIN/EXCEPTION based solution which is (time-wise) costly and an in-built support would certainly be faster.
    3. Description: For those new to the complexity of MERGE (or UPSERT) please read this first.

      TLDR: In the face of concurrent use, MERGE is difficult when trying to balance Performance vs Integrity.

      Unlike some other database engines (that are sometimes okay with trading off Integrity when it conflicts with Performance), PostgreSQL consistently prioritizes Data Integrity. The 'best' solution seems to have taken longer than expected, but when a complicated open-source development model needs to coherently agree upon a core feature addition, it really takes a few stars aligning to get such a piece of code in with most people in support of it.
    4. Example (SQL to create scenario + below SQL taken from here)
      MERGE INTO bonuses B
      USING (
       SELECT employee_id, salary
       FROM employee
       WHERE dept_no =20) E
      ON (B.employee_id = E.employee_id)
      WHEN MATCHED THEN
       UPDATE SET B.bonus = E.salary * 0.1
      WHEN NOT MATCHED THEN
       INSERT (B.employee_id, B.bonus)
       VALUES (E.employee_id, E.salary * 0.05);
    5. Alternative: The PostgreSQL documentation mentions one recommended way of doing UPSERT / MERGE here. But again, this is non-performant; 9.5 adds native support via INSERT .. ON CONFLICT (a.k.a. UPSERT).
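
      As a rough sketch, the 9.5 syntax for the MERGE example above could look something like the following (it assumes a unique constraint on bonuses.employee_id; the * 2 turns the 5% insert value into the 10% update for existing rows):

      INSERT INTO bonuses (employee_id, bonus)
      SELECT employee_id, salary * 0.05
      FROM employee
      WHERE dept_no = 20
      ON CONFLICT (employee_id)
        DO UPDATE SET bonus = EXCLUDED.bonus * 2;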
    6. History
      1. MySQL / Oracle / MSSQL support this very well.
      2. A long-pending requirement as per the Wiki, and it has now finally made it through!

16 May 2015

Postgres finally has CUBE / ROLLUP / GROUPING SETS !

Finally !

A *much* awaited feature, this attempt at adding the GROUPING SETS / ROLLUP / CUBE feature to PostgreSQL has been in the works for about a year (besides the many attempts in the past decade and a half that didn't get through). Thankfully it has finally got the approval of the powers that be, so the upcoming Postgres 9.5 will finally have this long-pending SQL feature.

MSSQL and Oracle have had this for a while, so it's about time that PostgreSQL sported it as well. A big boon for report-generating SQL, this feature makes what was earlier possible only with lots of unmanageable SQL hackery now possible with much cleaner code, and with much better (at times single-pass) performance.

Read here to know more about OLAP support in PostgreSQL.

Thanks a ton Andrew Gierth and Atri Sharma and so many others who directly or indirectly assisted in getting this patch out of the door!

Andrew / Atri... take that long pending break... one look at that mail thread and it seems you deserve it :D !

21 Dec 2014

Why's my database suddenly so big?


In a Product-based company, DB Developers at times don't get direct access to production boxes. Instead, an IT resource ends up managing a large swathe of Web / DB boxes. Since such a resource generally has a large field of operation, at times they need quick steps to identify the cause of sudden high disk usage.

In such a scenario (where Production is silo-ed out of DB Developers), correct triaging of a Database disk-usage spike is especially helpful, because a good bug-report is at times the only help that'd ensure a 1-iteration resolution.

Recently, one of our Production Database boxes hit a Nagios disk-alert and an IT personnel wanted to identify who (and specifically what) was causing this sudden spike.

From the looks of it, it clearly was a time-consuming + space-consuming Job running on the box and not much that could have been done (sans terminating it), but the following steps could have helped to prepare a bug-report of-sorts for the DB developer to identify / correct it before this happens again:
  1. Isolate whether the disk increase is because of postgres. i.e. Compare the following two disk-usage outputs:
    1. df -h
    2. du -h --max-depth=1 /var/lib/postgresql/
  2. On the psql prompt
    1. SELECT
        datname AS db_name,
        pg_size_pretty(pg_database_size(oid)) AS db_size
      FROM pg_database
      ORDER BY pg_database_size(oid) DESC
      LIMIT 10; 
    2. Connect to the Database. (Assuming that there is one large database on the system, and its name is X. In the above output, it'd be the first row, under db_name):
    3.  \c  X
    4. From Wiki: SELECT nspname || '.' || relname AS "relation",
          pg_size_pretty(pg_total_relation_size(C.oid)) AS "total_size"
        FROM pg_class C
        LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
        WHERE nspname NOT IN ('pg_catalog', 'information_schema')
          AND C.relkind <> 'i'
          AND nspname !~ '^pg_toast'
        ORDER BY pg_total_relation_size(C.oid) DESC
        LIMIT 10; SELECT NOW();
  3. A minor addition, but sometimes, a simple thing such as appending the current Server-Timestamp is 'very' beneficial for accurate triaging. For e.g. If a given table is known to grow very big, N minutes into processing, (assuming we have at least basic logging in place) a Developer can easily identify which method to focus on.
  4. In most cases, the query above (2.4) would give the list of tables that one is looking for:
    1. In most cases, it turns out that the Job is expected to consume processing disk-space, and if so, the DB developer probably needs to request IT for mandatory empty-disk space for each night's periodic processing. That's a simple solution and we're done.
    2. Alternatively, it's possible that the Job is (unexpectedly) creating some very large tables (and probably removing them post-processing). Such cases may need optimization, which probably got aggravated purely by DB growth. Possible, but again, we're clear about how to resolve it.
  5. However, at times the tables returned don't cover 'all' that could consume disk-space:
    1. The list of tables generated in Step 2.4 above covers:
      1. (Regular) Tables
      2. Temporary Tables
      3. Unlogged Tables
    2. However, it does not cover tables created within another session's transaction. More detail on that is given below:

Since Temporary Tables (within a Transaction) are NOT visible from another session, Query 2.4, when run by an administrator in another psql session, would not be able to show the temporary table.
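
A quick way to see this for yourself (table name and row count here are made up):

-- Session 1: create a large temporary table inside an open transaction
BEGIN;
CREATE TEMP TABLE big_tmp AS
  SELECT generate_series(1, 10000000) AS i;
-- ... and leave the transaction open

-- Session 2: Query 2.4 does not list big_tmp (its pg_class entry isn't visible
-- to other sessions yet), even though a tNN_XXXX file under data/base keeps growing.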

To find disk-consuming tables, an alternate approach may yield better results:

$ cd /var/lib/postgres/9.2/data/base/
$ ls -lR | awk '$5 > 1000000' | awk '{print "File: " $9 ",  Size: " $5}' | sed 's/\./ part-/g'

File: t22_3586845404,  Size: 819200000
File: 3586846552,  Size: 630161408
File: t24_3586846545,  Size: 1073741824
File: t24_3586846545 part-1,  Size: 1073741824
File: t24_3586846545 part-2,  Size: 1073741824
File: t24_3586846545 part-3,  Size: 559702016

Lets analyse this Shell command:
  • When run from the data/base folder of PostgreSQL it shows which file uses most disk-space across *all* databases. If you already know which database to focus on, you may want to go inside data/base/xxxx folder and run this Shell command there instead.
  • In the output we see three things:
    • File 3586846552 is a large file pointing to a (non-temporary) table
    • File t22_3586845404 is a large file; it points to a *temporary* table but is less than 1 GB in size
    • File t24_3586846545 is a large file; it also points to a *temporary* table and is between 3 GB and 4 GB in size (each file part is capped at 1 GB), and therefore is a good contender to be researched further.

So lets investigate file t24_3586846545 further.

From a psql prompt:
postgres=# \x
Expanded display is on.
postgres=#

WITH x AS (
  SELECT trim('t24_3586846545')::TEXT AS folder_name
),
y AS (
  SELECT
    CASE
      WHEN position('_' in folder_name) = 0
        THEN folder_name::BIGINT
      ELSE substring(folder_name
        FROM (position('_' in folder_name) + 1))::BIGINT
    END AS oid
  FROM x
),
z AS (
  SELECT
    row_to_json(psa.*)::TEXT AS pg_stat_activity_Dump,
    query AS latest_successful_query,
    array_agg(mode) AS modes,
    psa.pid
  FROM y, pg_locks join pg_stat_activity psa
    USING (pid)
  WHERE relation = oid
    AND granted
  GROUP BY psa.pid,row_to_json(psa.*)::TEXT, query
)
  SELECT *
  FROM z

UNION ALL

  SELECT
    'Doesnt look like this folder (' ||
    (SELECT folder_name FROM x) ||
    ') stores data for another session''s transaction'::TEXT,
    NULL, NULL, NULL
  FROM z
  HAVING COUNT(*) = 0;

-[ RECORD 1 ]-----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
pg_stat_activity_dump   | {"datid":"2946058308","datname":"rt_ra","pid":16828,"usesysid":"16384","usename":"rms","application_name":"psql","client_addr":null,"client_hostname":null,"client_port":-1,"backend_start":"2014-12-20 18:52:47.702365+05:30","xact_start":"2014-12-20 20:01:12.630708+05:30","query_start":"2014-12-20 20:53:41.733738+05:30","state_change":"2014-12-20 20:53:41.734325+05:30","waiting":false,"state":"idle in transaction","query":"select n.nspname from pg_class c join pg_namespace n on n.oid=c.relnamespace\nwhere c.relname ='a' and n.nspname like 'pg_temp%';"}
latest_successful_query | select n.nspname from pg_class c join pg_namespace n on n.oid=c.relnamespace
                        | where c.relname ='a' and n.nspname like 'pg_temp%';
modes                   | {AccessExclusiveLock,RowExclusiveLock}
pid                     | 16828

postgres=#


The output of this SQL is probably going to be at least of some help, if the rogue table is a table within another Transaction. I'll try to dissect what this query is doing:
  • The first column is basically the entire pg_stat_activity row mangled into a JSON object.
  • The pid is the PID of the Postgres process that is serving the Connection that currently has a Lock on the table that identifies t24_3586846545. This way, we can know more about the connection that created this (large) table. (Please be very clear, that unless you know what you're doing, you shouldn't *kill* any postgres process from bash).
  • The columns 'latest_successful_query' and 'modes' are probably uninteresting to the Admin, but may be of big help to the Developer.

(Obviously, a hands-on IT admin needs to replace the string (t24_3586846545) in the above SQL, with the file-name that (s)he gets the most number of times, when executing the previous shell-command).

Happy triaging :) !

What's in an empty table?

How much storage does an empty table in Postgres take? This is a post about Postgres tables that store ... well basically ...  Nothing . The...