9 Mar 2016

Separate Index Creation from Data Population during pg_restore

Recently I had to restore a large DB to just extract data (I didn't care about Indexes etc) and I was short of Disk Space.

So for this corner-case, let's assume that you want to skip indexes when restoring a large databases, and coincidentally also have the following scenario to deal with:
  • pg_dump was taken with pg_dump -Fc db > db.pg_dump
    • Noticeably, you can't take another pg_dump without causing a lot of tickets / approvals
  • You need to skip indexes (and FKs / PKs etc.)
    • Basically just want the data restored as soon as possible
  • Additionally, optionally, you want to create indexes / FKs / PKs as a low-priority task, at a later stage (probably an hour later)

# Take pg_dump
pg_dump -Fc db > db.pg_dump

# Create DB Item list
pg_restore -l db.pg_dump > db.list

# Create DB Item List (without Index / Trigger / PKs / FKs etc.)
# Remove / add filters so that the skipped list is as per need
>db.noindex.list
grep -v "[0-9] CONSTRAINT " db.list >> db.noindex.list
grep -v " FK CONSTRAINT " db.list >> db.noindex.list
grep -v " TRIGGER " db.list >> db.noindex.list
grep -v " INDEX "  db.list >> db.noindex.list


# Restore Target DB with this filtered list of DB Items
pg_restore -L db.noindex.list db.pg_dump -d targetdb


# =====================

# Data RESTORE is now Done!
# =====================


# Find the list of constraints we skipped
comm -23 ysa.list.original db.noindex.list > db.onlyindex.list

# Restore that skipped list of constraints
pg_restore -L db.noindex.list db.pg_dump -d targetdb

This splitting of INDEX creation vs Data Population is ideal for scenarios, which need to go online for minor queries and can accommodate concurrent INDEX creation immediately after going online. It is also good for Non-Production process that take a cross-sectional dump of the large data-set for other purposes (which is what my use-case was).

3 comments:

Thomas REISS said...

pg_restore provides some useful options, like --section. Using lists can be handy some times, but --section will help you much more in your case.

pg_restore --section=pre-data ... restores the schemas first
pg_restore --section=data ... restores the data
pg_restore --section=post-data ... restores the constraints and indexes

Thomas REISS said...

The feature was introduced in PostgreSQL 9.2 ;)

Robins Tharakan said...

Thanks Thomas :) !!