The Christmas break was pretty good, with one exception: the hard drive on my laptop gave me some serious grief. Long story short, I wiped it, made new partitions, and reinstalled the OS. Luckily, I was smart enough to keep partial daily backups on my server for just these occasions, so nothing too important (read: my thesis) was lost. However, the key words here are "partial backups". Most of my settings and non-crucial data were not backed up, and that has turned out to be a bit of a drag.
Anyway, the hard drive appears to be functioning correctly now, I purchased an el cheapo (some of us never learn) external drive, and I'm looking to do full backups of my $HOME directory from now on. A quick search took me to the concepts outlined here. The scheme is pretty clever, surprisingly fast, and very efficient with drive space. The result is a simple Python script that seems to do the trick. Well, it's mostly Python with a bunch of Bash calls via os.system(), but close enough.
Basically, it uses rsync to synchronize data over an ssh connection to my remote server, which has the external drive attached and mounted. Given a remote destination directory, it creates a series of backup directories named after the local machine's hostname. For example, my laptop's hostname is sigma, so the script creates the directories sigma/sigma, sigma/sigma.1, sigma/sigma.2, and so on, all the way up to sigma/sigma.7. These numbered directories contain snapshots of the data going back seven runs, so if you run the script once per day, you have a snapshot for each of the past seven days. Instead of storing all of the data seven times, however, it uses hard links to save space: a file that has not changed between runs is stored once on disk and shared by every snapshot that contains it. I'll let you read up on this concept here if you are interested.
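The rotation described above can be sketched roughly as follows. This is not the original script: it is a minimal sketch assuming the classic rm / mv / `cp -al` rotation from the well-known rsync snapshot technique, and the names `rotation_commands` and `rotate_remote` are hypothetical. The commands are meant to run on the backup server (here via ssh) just before rsync refreshes sigma/sigma with something like `rsync -a --delete`. Because rsync by default writes a changed file to a temporary file and renames it into place, the hard link held by sigma/sigma.1 is broken rather than overwritten, so the older snapshot keeps the old contents (which is also why `--inplace` must not be used here).

```python
import shlex
import socket
import subprocess


def rotation_commands(dest_path, hostname=None, keep=7):
    """Build shell commands that rotate <host>/<host>, <host>.1 ... <host>.<keep>.

    Hypothetical helper (not the original script): the commands are meant to
    run on the backup server before rsync refreshes the newest snapshot.
    """
    hostname = hostname or socket.gethostname()
    parent = f"{dest_path.rstrip('/')}/{hostname}"
    base = f"{parent}/{hostname}"
    q = shlex.quote
    # Make sure the per-host directory exists (no-op if it already does).
    cmds = [f"mkdir -p {q(parent)}"]
    # Drop the oldest snapshot; rm -rf succeeds even if it does not exist yet.
    oldest = q(f"{base}.{keep}")
    cmds.append(f"rm -rf {oldest}")
    # Shift each older snapshot up by one: .6 -> .7, ..., .1 -> .2.
    for i in range(keep - 1, 0, -1):
        src, dst = q(f"{base}.{i}"), q(f"{base}.{i + 1}")
        cmds.append(f"if [ -d {src} ]; then mv {src} {dst}; fi")
    # Hard-link copy of the newest snapshot becomes .1: no file data is duplicated.
    newest, first = q(base), q(f"{base}.1")
    cmds.append(f"if [ -d {newest} ]; then cp -al {newest} {first}; fi")
    return cmds


def rotate_remote(dest_host, dest_path, keep=7):
    """Run the rotation on dest_host over ssh (assumes passwordless ssh logins)."""
    script = " && ".join(rotation_commands(dest_path, keep=keep))
    subprocess.run(["ssh", dest_host, script], check=True)
```

Building the commands as a list keeps the rotation easy to inspect (print them before running anything), and `shlex.quote` protects paths containing spaces when the joined string is handed to the remote shell.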
Several assumptions are made in this script. One is that you have passwordless ssh logins set up correctly. The script also requires a configuration file called ~/.backuprc, from which several custom settings are read. Here is an example of such a file.
EXCLUDES = *.iso,*.avi,*.mpg,*.mp3,*.ogg,*.wma,*.wmv,*.mov,*.LNK,*.LCK
SOURCES = ~/
DEST_HOST = arker.homelinux.org
LOG_DIR = ~/.backup_logs/
DEST_PATH = /media/backup_drive/home_backup/
The variables in this file break down as follows:
- EXCLUDES: A comma-separated list of patterns matching files that are not to be backed up.
- SOURCES: A comma-separated list of directories to be backed up. The example instructs the script to back up the user's entire home directory.
- DEST_HOST: The hostname of the remote server where the backups will be sent.
- LOG_DIR: The location where log files should be saved.
- DEST_PATH: The destination directory on the remote machine where the backups will be stored.
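To make the format concrete, here is a rough sketch of reading such a file and turning it into an rsync call. It is not the original script's parser: `parse_backuprc` and `rsync_command` are hypothetical names, blank lines and `#` comments are assumed to be allowed (the example file has neither), and `~` is expanded only for local SOURCES since DEST_PATH lives on the remote machine. The rsync options used (`-a`, `--delete`, `-e ssh`, `--exclude=PATTERN`) are standard ones. Because every setting is currently required, the parser raises `KeyError` if one is missing.

```python
import os

REQUIRED_KEYS = ("EXCLUDES", "SOURCES", "DEST_HOST", "LOG_DIR", "DEST_PATH")
LIST_KEYS = ("EXCLUDES", "SOURCES")  # comma-separated values


def parse_backuprc(text):
    """Parse 'KEY = value' lines (as in ~/.backuprc) into a dict."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and (assumed) comment lines
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"malformed line: {line!r}")
        key, value = key.strip(), value.strip()
        if key in LIST_KEYS:
            config[key] = [item.strip() for item in value.split(",") if item.strip()]
        else:
            config[key] = value
    missing = [k for k in REQUIRED_KEYS if k not in config]
    if missing:
        raise KeyError(f"missing settings: {', '.join(missing)}")
    return config


def rsync_command(config, hostname):
    """Build an rsync argv that refreshes DEST_PATH/<host>/<host>/ over ssh."""
    dest = f"{config['DEST_PATH'].rstrip('/')}/{hostname}/{hostname}/"
    argv = ["rsync", "-a", "--delete", "-e", "ssh"]
    argv += [f"--exclude={pattern}" for pattern in config["EXCLUDES"]]
    # Sources are local, so expand ~ here; a trailing slash copies contents.
    argv += [os.path.expanduser(src) for src in config["SOURCES"]]
    argv.append(f"{config['DEST_HOST']}:{dest}")
    return argv
```

With the example file above, EXCLUDES becomes a list such as `['*.iso', '*.avi', ...]`, and for the host sigma the last argument is `arker.homelinux.org:/media/backup_drive/home_backup/sigma/sigma/`. Passing an argument list to `subprocess.run` avoids the shell-quoting pitfalls of `os.system()`.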
I put an entry in my user's crontab to run the script at 12:05pm every day. The entry looks like this:
5 12 * * * /home/dcraven/bin/backup
The script is located in my ~/bin directory. If you need help with cron, have a look at this tutorial. Keep in mind that when it is run this way, cron will only execute the command if the machine is on at the scheduled time; if the machine is off then, the missed run is not made up when it next starts. For that behaviour, look into anacron, which on many distributions is what runs the scripts in /etc/cron.daily and friends.
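If you would rather not depend on anacron, one alternative (not something the original script does) is to have cron run the script more often, say hourly, and let the script skip itself when a backup has already happened today. The sketch below assumes a hypothetical stamp file, for instance a file named `last_run` inside LOG_DIR, whose modification time records the last successful run; `run_is_due` and `mark_run` are likewise made-up names.

```python
import datetime
import os


def run_is_due(stamp_path, now=None):
    """Return True unless a run has already been recorded today.

    Hypothetical anacron-style guard: the stamp file's modification time
    stands in for "last successful backup".
    """
    now = now or datetime.datetime.now()
    try:
        last = datetime.datetime.fromtimestamp(os.path.getmtime(stamp_path))
    except FileNotFoundError:
        return True  # no stamp yet: never run before
    return last.date() < now.date()


def mark_run(stamp_path):
    """Create or touch the stamp file after a successful backup."""
    os.makedirs(os.path.dirname(stamp_path) or ".", exist_ok=True)
    with open(stamp_path, "a"):
        pass
    os.utime(stamp_path, None)  # set mtime to the current time
```

The script would call `run_is_due()` first, exit early if it returns False, and call `mark_run()` only after rsync succeeds, so a failed run is retried on the next cron tick.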
At the moment, all of these settings are required in the file. At some point I might write something a little more robust and clean, but for now this works and I have other things that should be given a higher priority.