File Storage

Introduction

None of the machines are currently backed up automatically, so it is up to each user to back up their own files.

As of August 2015, we have a dedicated file-storage server, parkingspace. If it is not already in your ~/.ssh/config, add an entry for it with the IP address 192.168.0.102.
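A minimal sketch of such an entry, written as a shell snippet that appends it to ~/.ssh/config only if no "Host parkingspace" entry exists yet. It assumes your user name on parkingspace matches your local one; if not, add a "User <name>" line to the entry.

```shell
#!/usr/bin/env bash
# Add a "parkingspace" host alias to ~/.ssh/config unless one already exists.
# Assumption: your user name on parkingspace matches your local one;
# if it does not, add a "User <name>" line to the entry.
set -euo pipefail

config="$HOME/.ssh/config"

# Create ~/.ssh and the config file with the permissions ssh expects.
[ -d "$HOME/.ssh" ] || mkdir -m 700 "$HOME/.ssh"
if [ ! -e "$config" ]; then
    touch "$config"
    chmod 600 "$config"
fi

# Append the entry only if no "Host parkingspace" line is present.
if ! grep -Eq '^[[:space:]]*Host[[:space:]]+parkingspace[[:space:]]*$' "$config"; then
    cat >> "$config" <<'EOF'

Host parkingspace
    HostName 192.168.0.102
EOF
fi
```

After this, commands such as ssh parkingspace and rsync ... parkingspace:backups/ can use the short name instead of the IP address.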

Backups

There is a sample backup script in the software repository, under utils/backup_sample.sh; it is based on the one here. Make sure rsync is installed (it is needed on both ends of the transfer).

Some notes on the script:

  • The remote backup server (e.g. timessquare; for the dedicated server, parkingspace), the destination (here the directory ‘backups’), and the source directory (here my entire home directory) are hardcoded. The date is computed each time the script runs.

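  • rsync -azP copies in archive mode (-a: recursively, preserving permissions, timestamps, symlinks, and so on), compresses the data in transit (-z), and keeps partially transferred files so an interrupted transfer can be resumed (-P, which also shows progress).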

  • --delete and --delete-excluded remove files from the destination that no longer exist in the source or that match an exclude pattern, so the backup holds only what you want backed up. The excludes come next.

  • Excluded files are listed as --exclude=PATTERN options. You can replace all of them with a single file: add the option --exclude-from=$HOME/.rsync/exclude and put the patterns, one per line, in ~/.rsync/exclude. For example:

    $ cat ~/.rsync/exclude
    tmp_runs/
    WAVECAR
    
  • --link-dest is the clever part: if a file is unchanged since the last backup, rsync creates a “hard link” to the copy already in that backup instead of sending it over the network again (which can be slow). As a result, a file that appears in 100 backups takes up the space of one copy, not 100. On the other hand, that space is not reclaimed until the file is deleted from all 100 backups. See the --link-dest entry in the rsync man page for details.

  • --max-size skips any file larger than 50 MB. You can leave it out, but it keeps the backups from growing too large.

  • The next line gives the source and destination.

  • When the backup finishes, the script renames incomplete_back-date to back-date and updates backups/current/ to the latest backup, so an interrupted run is never mistaken for a complete backup.

Instructions:

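  1. Copy the script to ~/bin/backup.sh (the path used in the crontab entry below), make it executable with chmod +x ~/bin/backup.sh, and change $HOME in it to your own home directory if needed.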

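  2. Set up public-key SSH login to parkingspace (cron cannot type a password), then create the backup directory:

    ssh parkingspace "mkdir backups"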

  3. Customize the excludes and max-size in the script.

  4. Set up a crontab entry so the backup runs automatically. Type:

    crontab -e
    

     You may be prompted to choose an editor. Then add these lines:

    # back up every Thursday at 4:23 pm (day of week: 0 = Sunday, 4 = Thursday)
    23 16 * * 4 ~/bin/backup.sh 2>> $HOME/crontab.log
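If you prefer not to edit the crontab interactively, a small helper can append the entry for you. This is a hypothetical sketch (add_backup_cron and BACKUP_CRON_LINE are illustrative names): it reads an existing crontab on stdin, adds the Thursday 4:23 pm entry only if it is not already there, and prints the result. In cron's day-of-week field 0 is Sunday, so Thursday is 4.

```shell
#!/usr/bin/env bash
# Hypothetical helper: add the weekly backup entry to a crontab read on stdin.
# Fields: minute hour day-of-month month day-of-week (0 = Sunday, 4 = Thursday).
BACKUP_CRON_LINE='23 16 * * 4 ~/bin/backup.sh 2>> $HOME/crontab.log'

add_backup_cron() {
    local current
    current=$(cat)                      # existing crontab, possibly empty
    if [ -n "$current" ]; then
        printf '%s\n' "$current"
    fi
    # Append the entry (with a comment) only if it is not already present.
    if ! grep -qxF -- "$BACKUP_CRON_LINE" <<<"$current"; then
        printf '%s\n' '# back up every Thursday at 4:23 pm' "$BACKUP_CRON_LINE"
    fi
}
```

Usage: crontab -l 2>/dev/null | add_backup_cron | crontab - installs the result; run crontab -l afterwards to check it.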

Checkups

A RAID protects data only if degraded drives are replaced before enough of them fail to lose data (for parkingspace's RAID-6, a third simultaneous failure). The script parkingspace_checkup.sh in the software repository (under utilities) emails you a report on the RAID's health through your cunix account. Run it from a crontab (for example, once a month) to get regular reports. Change the $mailsend and $mailto values to your own account, and make sure you have public-key SSH login to your cunix account.

As of now, Mordechai has it set up to email him.
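As an illustration of the kind of test such a checkup can make (not the contents of parkingspace_checkup.sh), here is a hypothetical sketch with made-up function names raid_degraded and raid_status. It reads /proc/mdstat, where each array's status field shows one U per working member and an underscore for each missing one, e.g. [UUUUUU] when healthy and [UUU_UU] with one failed drive.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a RAID health check based on /proc/mdstat.

# raid_degraded [FILE]: succeed if any array in FILE (default /proc/mdstat)
# has a missing member, i.e. an "_" inside a status field such as [UUU_UU].
raid_degraded() {
    grep -Eq '\[[U_]*_[U_]*\]' "${1:-/proc/mdstat}"
}

# raid_status [FILE]: print a one-word summary suitable for an email subject.
raid_status() {
    if raid_degraded "$@"; then echo DEGRADED; else echo OK; fi
}
```

On parkingspace this could feed a mail, for example ssh cunix "mail -s 'parkingspace RAID: $(raid_status)' you@example.edu" < /proc/mdstat, assuming a host alias cunix and that the standard mail command is available there; the address is a placeholder.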

Technical information

parkingspace has six 4-TB drives in a software RAID-6 configuration (so if two disks fail, no data is lost). This gives us approximately 15 TB of usable space (subtracting out swap and boot).
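As a rough check of that figure: RAID-6 spends two disks' worth of space on parity, so n equal disks of size s hold (n − 2)s of data. Drive sizes are quoted in decimal terabytes, while many tools report binary tebibytes (TiB), which is likely why the usable space shows up as roughly 15 rather than 16.

```latex
C_{\text{data}} = (n - 2)\,s = (6 - 2) \times 4\,\text{TB} = 16\,\text{TB}
  = 16 \times 10^{12}\,\text{B}
  \approx \frac{16 \times 10^{12}}{2^{40}}\,\text{TiB} \approx 14.6\,\text{TiB}
```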

There are actually two RAIDs: one for swap (/dev/md1, made up of /dev/sd?3) and one for data (/dev/md0, made up of /dev/sd?2, formatted as ext4 and mounted on /). Each disk also has its own GRUB section, so that every disk is bootable.

The machine is running Ubuntu 14.04, and uses mdadm to manage the RAID.
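To see this layout on the machine itself, /proc/mdstat lists each array with its member partitions (sudo mdadm --detail /dev/md0 gives a fuller report). The hypothetical helper md_members below prints the members of one array; on parkingspace it should show sd?2 partitions for md0 and sd?3 partitions for md1.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the member partitions of one md array.

# md_members ARRAY [FILE]: print ARRAY's members (e.g. sda2), one per line,
# as listed in /proc/mdstat (or FILE). Member fields look like "sda2[0]",
# with "(F)" appended for a failed device; everything from "[" on is stripped.
md_members() {
    awk -v md="$1" '
        $1 == md && $2 == ":" {
            for (i = 3; i <= NF; i++)
                if ($i ~ /\[[0-9]+\]/) { sub(/\[.*/, "", $i); print $i }
        }' "${2:-/proc/mdstat}"
}
```

Usage: md_members md0 lists the data array's partitions; a device marked (F) in /proc/mdstat is still listed, so pair this with a health check rather than using it as one.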