pg_dump and pigz, easy rsyncable backups with PostgreSQL

Some time ago, I created an ad-hoc offsite backup solution for a MySQL database after recovering it. A client had contacted me when one of their legacy databases blew up, and the recovery process was quite painful because the only backups they had (from a monthly cronjob) were corrupted and incomplete. I ended up with a simple setup that used mysqldump, gzip and rsnapshot to great effect. This article is about using a similar backup method effectively with PostgreSQL.

The solution in MySQL-space

When preparing to deploy a solution of my own into production, I knew I needed backups. Its main data store was a PostgreSQL 9.3 database, so I remembered my MySQL adventure and went looking for tools similar to the ones I had used before. The core of the earlier MySQL solution looked like this:

# omitting all the rotation logic
mysqldump $OPTIONS --user=$USER --password=$PWD --routines --no-data --add-drop-database --databases $DB | gzip --rsyncable > "$DIR/schema.sql.gz"

TABLES=$(mysql --user=$USER --password=$PWD -Bse 'show tables' $DB)
for TABLE in $TABLES
do
    BACKUP_FILE="$DIR/${TABLE}.sql.gz"
    echo "dumping $TABLE into $BACKUP_FILE"
    mysqldump $OPTIONS --user=$USER --password=$PWD $DB $TABLE | gzip --rsyncable > "$BACKUP_FILE"
done

This dumps the database as separate files: schema.sql.gz to set up the schema and $TABLE.sql.gz for the row data of each table. This makes it easy to restore partial sets of data.
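
Restoring a partial set is then mostly a matter of piping the right files back in. A rough sketch, using a hypothetical orders table:

# recreate the schema (and routines) first if needed
gunzip -c "$DIR/schema.sql.gz" | mysql --user=$USER --password=$PWD
# then load only the tables you actually need
gunzip -c "$DIR/orders.sql.gz" | mysql --user=$USER --password=$PWD $DB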

mysqldump outputs SQL on stdout by default, which makes it easy to pipe to gzip to create compressed archives. Data stored in databases is usually quite compressible, so piping to gzip saves a lot of space.

There’s a catch though. Regular gzip with no flags has a serious disadvantage for offsite backups: a small change in the raw data provokes a large change in the compressed data. This means that every time we rsync the latest backup over, it transfers the entire lot. Put another way, the speedup factor reported with the --stats flag is more or less 1.00. Classic doubleplusungood.

That’s why we supply the --rsyncable flag when compressing the data with gzip. This resets the compression dictionary from time to time such that the blocks are compressed independently. A small change in the source will thus not change the entire compressed archive and the rsync delta encoding algorithm can work its magic even on gzipped files!
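
If you want to see the effect for yourself, a minimal local experiment could look like this (the file names are made up, and it assumes your gzip actually honours --rsyncable, more on that in a moment):

# two compressible files that differ in a single line somewhere in the middle
seq 1 2000000 > orig.txt
sed '1000000s/.*/this line changed/' orig.txt > new.txt

gzip --rsyncable -c orig.txt > backup.gz   # "yesterday's" backup
gzip --rsyncable -c new.txt > new.gz       # "today's" backup

# force the delta algorithm (rsync skips it for local copies) and check the speedup
rsync -v --no-whole-file --stats new.gz backup.gz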

Sadly, the --rsyncable flag is not mainline; it’s a custom patch carried by Debian (and Ubuntu). To top it off, it seems the patch was misapplied for Debian Wheezy: the flag is silently accepted but has no effect, so I hadn’t noticed anything was wrong until a few days ago. This effectively brought my rsync speedup factor down to 1.00, poor backup server.

I went looking for alternatives. One could go for backports, testing repositories or even custom packages, but I’m always apprehensive of such things on production servers. I looked at lz4 (not available in Debian Wheezy), bzip2 (slow, large blocks) and xz/lzma before finding what appeared to be the solution: pigz. It’s a parallel implementation of gzip, it has mainline support for --rsyncable, and it’s in the Wheezy repository.
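
Since pigz is meant as a drop-in replacement for gzip, swapping it into the earlier loop is a one-word change (the thread count below is just an example):

mysqldump $OPTIONS --user=$USER --password=$PWD $DB $TABLE | pigz --rsyncable --processes 4 > "$BACKUP_FILE"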

Backing up with Postgres

mysqldump’s equivalent in the Postgres world is pg_dump. I like pg_dump better because it can dump in several formats, each with its own upsides and downsides. It also has a sister command, pg_restore, which allows much more flexibility when restoring a backup. The 4 output formats available at the time of writing are (if you already know them you can skip the list, it’s mostly from the Postgres docs):

  • plain: like mysqldump, mostly SQL statements, but with faster data loading (data is loaded through COPY ... FROM stdin blocks instead of INSERT statements). This is the default. The big advantage of this format is that it is human-readable.
  • custom: Output a custom-format archive suitable for input into pg_restore. Together with the directory output format, this is the most flexible output format in that it allows manual selection and reordering of archived items during restore. This format is also compressed by default. It’s not human-readable though, unless you turn off compression with -Z0, in which case opening it in a text editor will allow one to make sense of a lot of things.
  • directory: Output a directory-format archive suitable for input into pg_restore. This will create a directory with one file for each table and blob being dumped, plus a so-called Table of Contents file describing the dumped objects in a machine-readable format that pg_restore can read. A directory format archive can be manipulated with standard Unix tools; for example, files in an uncompressed archive can be compressed with the gzip tool. This format is compressed by default and also supports parallel dumps. This is most similar to what I had been forcing mysqldump to do, except that with pg_dump you only need one command.
  • tar: Output a tar-format archive suitable for input into pg_restore. The tar format is compatible with the directory format; extracting a tar-format archive produces a valid directory-format archive. However, the tar format does not support compression and has a limit of 8 GB on the size of individual tables. Also, the relative order of table data items cannot be changed during restore. Try not to use this one, as it can use a lot of temporary space on your hard drive, which not only wears the underlying disk out and consumes needless space but also thrashes the disk page cache.

NOTE: If you’re using the directory output format the nifty flag --jobs becomes available, allowing parallel dumping.
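
For example (the database name and output directory are placeholders):

pg_dump -Fd --jobs=4 -f m2d.dir m2d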

It seems that we can get the flexibility of partially restoring a database, without having to ask the server for a list of tables first, by choosing the custom or directory formats.
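
With a custom-format dump, for instance, pg_restore can pull out a single table, or restore a hand-edited selection of the archive's table of contents. A sketch with made-up table and file names:

# restore only one table from a custom-format dump
pg_restore -d m2d -t orders m2d.compr.dump

# or: list the archive contents, comment out unwanted entries with ';', restore the rest
pg_restore -l m2d.compr.dump > toc.list
pg_restore -d m2d -L toc.list m2d.compr.dump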

The directory method creates a folder with some files in it: 2298.dat.gz, 2300.dat.gz, … and finally toc.dat. They are compressed by default, but it’s possible to turn this off by passing the -Z0 flag to pg_dump. This allows compressing those files with an rsync-friendly method afterwards. The downside is that all those uncompressed bytes will get written to disk (and into the page cache). The larger your DB, the worse this will become for your system.
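
If you do want to go the directory route anyway, it could look something like this (paths are made up; the numeric glob leaves toc.dat alone):

pg_dump -Z0 -Fd -f m2d.dir m2d
pigz --rsyncable m2d.dir/[0-9]*.dat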

For this reason, the custom format seems like the best of both worlds: the output can be piped straight into a compression filter. Curious about the performance and rsyncability of a few variations on this theme, I started benchmarking. The most up-to-date scripts I used are available in a gist; possibly outdated versions are shown below:

The first script generates a dump of the database in several formats (my database is called m2d, adjust accordingly):

# gen.sh
DIR="$1"

[ -d "$DIR" ] || mkdir -p "$DIR"

# -Z0 is to force no compression, we supply this flag when we pipe to
# our own compressor
pg_dump -Fc m2d > "$DIR/m2d.compr.dump"
pg_dump -Z0 -Fc m2d > "$DIR/m2d.raw.dump"
pg_dump -Z0 -Fc m2d | pigz > "$DIR/m2d.pigz.dump.gz"
pg_dump -Z0 -Fc m2d | gzip > "$DIR/m2d.gzip.dump.gz"
pg_dump -Z0 -Fc m2d | pigz --rsyncable > "$DIR/m2d.arsync.dump.gz"

The script will generate a few files corresponding to some different ways of compressing (or not) the output. Run it once to supply a baseline, make some edits to the database and export again:

$ ./gen.sh orig
$ psql m2d
# make some changes to the database, try to make a small change in the largest tables
$ ./gen.sh changed

Time to test the rsyncability. I switched to another host and used a script to rsync every file separately so that I could clearly see the speedup factor.

# look out for the speedup factors reported by rsync
#
# NOTE
#  you'll have to change the HOST variable below to
#  point to the host + folder to fetch the files from

HOST="vagrant:/home/vagrant/pgdumptests"

ORIG="$1"
NEW="$2"

rsync -avh --stats --progress $HOST/$ORIG/ data/
rsync -avh --stats --progress $HOST/$NEW/m2d.arsync.dump.gz data/
rsync -avh --stats --progress $HOST/$NEW/m2d.compr.dump data/
rsync -avh --stats --progress $HOST/$NEW/m2d.gzip.dump.gz data/
rsync -avh --stats --progress $HOST/$NEW/m2d.pigz.dump.gz data/
rsync -avh --stats --progress $HOST/$NEW/m2d.raw.dump data/

I ran it on the other host like this:

$ ./bench.sh orig changed

The result:

Method            File size   Speedup   Bytes sent
uncompressed      6.40MB      132.77    33.01KB
pigz --rsyncable  723.35KB    15.22     42.37KB
pg_dump compress  716.49KB    1.17      608.14KB
pigz              686.26KB    1.05      646.27KB
gzip              689.76KB    0.99      688.32KB

Conclusion

It seems that if you’re only concerned about bandwidth, the uncompressed variant in combination with rsync is actually best. Yet the uncompressed dump of my tiny database is 9x the size of the compressed variant. Since I also like to keep backups on the same server as the DB, that becomes a bit hard to stomach on the relatively small servers I’m working with.

The sweet spot between simplicity, low-bandwidth off-site backups and low disk space usage seems to fall straight into the camp of piping the output of pg_dump into pigz --rsyncable. Very respectable delta encoding speedups (15x) and file size reductions (9x smaller) are within reach. Take these numbers with a grain of salt, because it obviously depends on what’s in the database. If you’re just storing large binary blobs, this won’t work nearly as well.
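
In practice that boils down to a one-liner along these lines (database name, target path and rotation are left to you):

pg_dump -Z0 -Fc m2d | pigz --rsyncable > "/var/backups/m2d/m2d-$(date +%F).dump.gz"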

Since this article is already getting a bit long, the implementation of the off-site backup server with rsnapshot is left as an exercise for the reader.

Further benefits & ideas

On the backup server side, it becomes advantageous to use something like rdiff-backup, which can keep an incredible number of backups in a minimal amount of space using the same delta encoding as rsync.

The astute reader will notice that it might be a good idea to decompress the backups before handing them over to rdiff-backup. From the table above we can see that the uncompressed format has the highest delta encoding efficiency. This added efficiency might let an rdiff-backup based solution use even less space than the compressed variant, because the deltas could be smaller (look at the bytes sent column). It depends on the churn rate and the type of data in the database, though.
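
A rough sketch of what that could look like on the backup server (all paths here are invented):

# unpack the freshly rsynced dump into a staging directory...
pigz -dc /srv/incoming/m2d.arsync.dump.gz > /srv/staging/m2d.dump

# ...then let rdiff-backup store it as a new increment and prune old ones
rdiff-backup /srv/staging /srv/archive
rdiff-backup --remove-older-than 6M /srv/archive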
