Slow rsnapshot/rdiff-backup/rbackup/rsync backups?

Our server started to crawl under a perfectly normal workload, grinding away with load averages as high as 40 (yes, forty).  In the end I restarted daemons, killed processes, and even rebooted the server.  I’d imagine you’ve been in situations like this before!  Maybe you’re in one right now.

In situations like this, there are two main components to resolving the issue: 1. Determining what the problem is, and 2. Doing something about it.

The Problem

For me, the culprit this time was the rsnapshot backup tool.  It’s essentially a script that works with cron and rsync/cp/rm to create a rotating set of backups, using hard links to minimize disk usage.  Unfortunately, as it turns out, rsnapshot was completely hammering our server, sending the system spiraling into a cyclical high-load-of-doom state that it never really recovered from.  Even the 60-minute window between hourly backups wasn’t enough time for the system to recover.

Specifically, the problem with rsnapshot isn’t that it hogs the CPU; rather, it’s disk-bound.  There’s so much IO taking place that everything screeches to a halt because the system can’t even read from or write to the disk.
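
One rough way to confirm that a box is disk-bound rather than CPU-bound is to look at how much CPU time is spent in iowait. Here is a minimal sketch, assuming a Linux /proc/stat with the usual field order (user, nice, system, idle, iowait, ...). The function name iowait_pct is my own invention, not part of any tool mentioned here.

```shell
#!/bin/sh
# iowait_pct LINE1 LINE2
# Given two aggregate "cpu ..." lines sampled from /proc/stat at different
# times, print the integer percentage of CPU time spent in iowait between them.
iowait_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            # Sum user..steal (fields 2-9); field 6 is iowait.
            total = 0
            for (i = 2; i <= NF && i <= 9; i++) total += $i
            t[NR] = total
            w[NR] = $6
        }
        END {
            dt = t[2] - t[1]
            if (dt <= 0) {
                print 0
            } else {
                printf "%d\n", 100 * (w[2] - w[1]) / dt
            }
        }'
}

# Example (left commented out so the snippet has no side effects):
#   a=$(grep '^cpu ' /proc/stat); sleep 5; b=$(grep '^cpu ' /proc/stat)
#   iowait_pct "$a" "$b"
```

A consistently high number here while a backup runs, together with a high load average, points at IO rather than processor cycles.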

The Solution

You might try using nice on your cron job, only to find that it doesn’t help. After that, you might try updating your rsnapshot.conf to include nice before the rsync/rm/cp commands. You might even try hacking the Perl source of rsnapshot to insert your nice directly.

Let me save you some effort. Using nice alone won’t help, because we’re talking about IO here, not processor cycles. Also, updating rsnapshot.conf won’t help, because rsnapshot won’t allow you to use nice -n19 /usr/bin/rsync as your cmd_rsync.

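What you really need to do is to create intermediate scripts that enforce the nice (CPU) and ionice (IO) restrictions on the commands. Here, nice -n19 gives the command the lowest CPU priority, and ionice -c3 puts it in the idle IO class, so it only gets disk time when nothing else wants it. You need three new files. Create them as root at the paths shown (neither running as root nor these exact paths is strictly required) and make them world-executable with chmod 755: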

/usr/local/bin/rsync-nice

#!/bin/sh
# "$@" (rather than $*) keeps arguments containing spaces intact.
exec /usr/bin/ionice -c3 /bin/nice -n19 /usr/bin/rsync "$@"

/usr/local/bin/rm-nice

#!/bin/sh
# "$@" (rather than $*) keeps arguments containing spaces intact.
exec /usr/bin/ionice -c3 /bin/nice -n19 /bin/rm "$@"

/usr/local/bin/cp-nice

#!/bin/sh
# "$@" (rather than $*) keeps arguments containing spaces intact.
exec /usr/bin/ionice -c3 /bin/nice -n19 /bin/cp "$@"
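
The three wrappers above differ only in the command they wrap, so you could generate them instead of typing each one. This is just a sketch: make_wrapper is a hypothetical helper name of mine, and the ionice and nice paths are the ones used above, which may differ on your distribution.

```shell
#!/bin/sh
# make_wrapper DEST_DIR NAME TARGET
# Write DEST_DIR/NAME, a tiny script that runs TARGET at idle IO priority
# and lowest CPU priority, passing all arguments through unchanged.
make_wrapper() {
    dest_dir=$1
    name=$2
    target=$3
    # The single-quoted format keeps "$@" literal in the generated file.
    printf '#!/bin/sh\nexec /usr/bin/ionice -c3 /bin/nice -n19 %s "$@"\n' \
        "$target" > "$dest_dir/$name" || return 1
    chmod 755 "$dest_dir/$name"
}

# Example (run as root; left commented out so nothing is written by accident):
#   make_wrapper /usr/local/bin rsync-nice /usr/bin/rsync
#   make_wrapper /usr/local/bin rm-nice    /bin/rm
#   make_wrapper /usr/local/bin cp-nice    /bin/cp
```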

Then, simply update your /etc/rsnapshot.conf file to point to these commands instead of the standard ones (remember that rsnapshot wants tabs, not spaces, between a directive and its value):


#################################
# EXTERNAL PROGRAM DEPENDENCIES #
#################################

# LINUX USERS: Be sure to uncomment "cmd_cp". This gives you extra features.
# EVERYONE ELSE: Leave "cmd_cp" commented out for compatibility.
#
# See the README file or the man page for more details.
#
#cmd_cp /bin/cp
cmd_cp /usr/local/bin/cp-nice

# uncomment this to use the rm program instead of the built-in perl routine.
#
#cmd_rm /bin/rm
cmd_rm /usr/local/bin/rm-nice

# rsync must be enabled for anything to work. This is the only command that
# must be enabled.
#
#cmd_rsync /usr/bin/rsync
cmd_rsync /usr/local/bin/rsync-nice
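
Before the next cron run, it's worth checking that every cmd_ line points at something executable. rsnapshot has its own syntax check (rsnapshot configtest); the sketch below is a separate, simpler sanity check, and check_cmds is a hypothetical helper name of mine.

```shell
#!/bin/sh
# check_cmds CONF
# For every uncommented cmd_* line in CONF, report whether the path in the
# second field is an executable file. Returns non-zero if any is missing.
check_cmds() {
    conf=$1
    status=0
    while read -r key path _rest; do
        case $key in
            cmd_*)
                if [ -x "$path" ]; then
                    echo "ok      $key $path"
                else
                    echo "MISSING $key $path"
                    status=1
                fi
                ;;
        esac
    done < "$conf"
    return $status
}

# Example:
#   check_cmds /etc/rsnapshot.conf && rsnapshot configtest
```

Commented-out lines such as #cmd_cp are skipped because their first field starts with # rather than cmd_.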

If your experience is anything like mine (remember, we had loads spiking to 40), your backups will complete faster, your loads will remain low, and your system will be responsive again. Yay, ionice!
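
If the wrappers seem to make no difference, one thing to check is the disk's I/O scheduler: as far as I know, ionice classes are honored by CFQ (and BFQ on newer kernels) but largely ignored by schedulers like noop or deadline. The kernel lists the available schedulers in /sys/block/DEVICE/queue/scheduler with the active one in square brackets. Here's a small sketch for pulling that out; active_scheduler is my own name, and sda in the example is only an assumed device.

```shell
#!/bin/sh
# active_scheduler "SCHEDULER_LINE"
# Given the contents of /sys/block/<dev>/queue/scheduler, print the name
# inside the square brackets (the scheduler currently in use).
# Prints nothing if the line has no bracketed entry.
active_scheduler() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# Example (sda is an assumed device name):
#   active_scheduler "$(cat /sys/block/sda/queue/scheduler)"
```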

(Hat tip to David Cantrell)