Saturday, May 3, 2008

Methods for Backing Up Your Data Proficiently

Summary

The amount of data that you personally generate in a workday determines how much time you can afford to lose.
This article covers a methodology for personal backups, in the workplace and at home. How to accomplish this in a Windows XP environment is covered in more detail.

Goals for Backing Up Your Personal Work Data

The main goal of a backup is to keep any data loss within a threshold that still lets you stay productive in an emergency. The backup method you use should not compromise the security you have in place for the live data. The selection of data to back up should be revisited according to the importance, expendability, and freshness of that data.

Decide: Your Time of Loss Threshold

The amount of data that you personally generate in a workday determines how much time you can afford to lose if you lose your work. If you generate lots of data that cannot be recreated in less than a day's time, you may need to back up more than once a day. If you do not create data, or your data is moved to an alternate location that is not your own, your threshold may be as large as a week or two.

Decide: Detecting the Data Loss

Having a backup plan is only half of what is required. You must also know what to do in case of a data loss, and whether your backup will be reliable. The first step in planning the recovery is ensuring that you detect the data loss in a reasonable amount of time. If all the data you are backing up is used on a day-to-day basis, detection will be easier. If you do not touch some data for weeks or months, you may not detect the loss until it is too late and your backup system has overwritten the file you need.
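
The risk of a backup being overwritten before you notice the loss can be put into rough numbers. The following is a minimal sketch in Python, assuming a simple rotation in which a fixed number of backup copies is kept and the oldest copy is overwritten at each run. The function names and the rotation model are illustrative assumptions, not a feature of any backup tool.

```python
from datetime import timedelta


def guaranteed_window(copies_kept, interval):
    """Conservative reach of the rotation.

    Worst case: the loss happens just before a backup run, so the last
    good copy is already one interval old and survives only
    (copies_kept - 1) further intervals before it is overwritten.
    """
    return (copies_kept - 1) * interval


def still_recoverable(time_to_detect, copies_kept, interval):
    """True if a good copy is guaranteed to exist when the loss is noticed."""
    return time_to_detect < guaranteed_window(copies_kept, interval)
```

With seven daily copies, a loss noticed after three days is still covered, while a loss noticed after three weeks is not. Files you rarely open need either more copies or a longer interval.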

Determine: Frequency That Your Data Changes

Just because you deal with your data every day does not mean it changes, and vice versa. You need to identify the number of files, the size of those files, and how much space they will take up in the backup. Do you create new files each time you add new data? Do your existing files get modified, appended to, or overwritten when you add new data?
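
One way to answer these questions is to survey a folder and count what is there and what changed recently. This is a small Python sketch using only the standard library. The function name and the one-day default are assumptions chosen for illustration.

```python
import os
import time


def survey(root, recent_days=1.0):
    """Count files under root, total their sizes, and count recent changes.

    A file counts as recent if its modification time falls within the
    last recent_days days.
    """
    cutoff = time.time() - recent_days * 86400
    stats = {"files": 0, "bytes": 0, "recent_files": 0, "recent_bytes": 0}
    for folder, _subdirs, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            try:
                info = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            stats["files"] += 1
            stats["bytes"] += info.st_size
            if info.st_mtime >= cutoff:
                stats["recent_files"] += 1
                stats["recent_bytes"] += info.st_size
    return stats
```

Running it against a folder such as "My Documents" at the end of a workday gives a rough idea of how much an incremental copy would have to move.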

Determine: A Location For Your Data

The reliability and security of your backup location need to be chosen carefully. Does the device itself have the same physical security as your computer? What about file permissions at the network and local level? Is the data encrypted? You should take hard drive failure into account, so your backups must be on a separate machine. You might also want to consider whether you require an off-site location.

Encryption

Any sensitive data needs to be encrypted, and the encryption you choose determines how it gets backed up. Filesystem-level encryption does not work well for physical backups on Windows, because the data is encrypted and decrypted transparently on write and read. On Linux, rsync can back up encrypted disks without decrypting them, as well as individual files. An encrypted disk cannot be queried for which files have changed, so changes cannot be tracked. When copying a large, modified encrypted file, only a versioning file system like ZFS can detect the changes in the encrypted file and keep past versions, and then only in a best-case scenario. Windows has no way of doing anything like this, either in the file system or in a file container. The best way to start getting a handle on backing up encrypted files is to make sure you encrypt only what needs to be encrypted, and that quickly changing files are kept separate from static ones: a passwords file versus an informational document.

I personally use TrueCrypt, with a large encrypted file mounted as a volume on Windows. The changes inside the file cannot be detected, so the entire file must be overwritten or duplicated each time it is backed up. On Linux, rsync could potentially detect the changes in my large file, but I would still lose versions.
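
Even when the changes inside a container cannot be seen, a script can at least tell whether the container changed at all and skip the copy when it did not. The sketch below is one possible approach in Python, not something TrueCrypt provides: it compares a SHA-256 digest of the file against the digest recorded at the last backup. The function names are hypothetical, and where the previous digest is stored is left to you.

```python
import hashlib
from typing import Optional


def file_digest(path, chunk_size=1024 * 1024):
    """SHA-256 of a file, read in chunks so a large container is never
    loaded into memory all at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_copy(path, last_digest: Optional[str]):
    """True if the file has no recorded digest or differs from it."""
    return last_digest is None or file_digest(path) != last_digest
```

This still reads the whole file and still copies the whole file when anything changed. It only saves the transfer and the destination space on days when nothing changed.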

Windows XP: Backup Locked Files

On Windows, files are locked while they are open, so only Windows' proprietary Shadow Copy system can access locked files. The Backup utility on Windows XP can take advantage of Shadow Copy, but you cannot use Shadow Copy via the command line. Third-party utilities can use it as well, though not always reliably and usually at a monetary cost.

Windows XP: Backup Non-locked Files

Files that are not locked can be copied normally, so quick retrieval is available. A script that runs once or twice a day and copies only changed or new files works well for hard disk backup. A good utility for this is Robocopy, from the Windows Resource Kit. An example command that skips video files and is incremental by default:
"C:\Program Files\Windows Resource Kits\Tools\robocopy.exe" "C:\Documents and Settings\myprofile\My Documents" "\\MyBackupServer\Backup Share\My Documents" /e /XF *.mkv /XF *.avi /XF *.mpg /XF *.mpeg /XF *.mp4
When you run Robocopy as a scheduled task, it leaves its exit code as the task's result. You can look the codes up in the readme or search for them online.
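
Robocopy's exit code is a bit field, so a result such as 3 is a combination of 1 and 2 and not an error number of its own. The following Python sketch decodes a code into its parts. The function names are illustrative, and the wording of each message is paraphrased.

```python
# Each bit of the exit code reports one kind of outcome.
ROBOCOPY_FLAGS = [
    (1, "one or more files were copied"),
    (2, "extra files or directories exist in the destination"),
    (4, "mismatched files or directories were found"),
    (8, "some files or directories could not be copied"),
    (16, "serious error; no files were copied"),
]


def describe_exit_code(code):
    """Return the list of outcomes encoded in a Robocopy exit code."""
    if code == 0:
        return ["no files were copied and no errors occurred"]
    return [text for bit, text in ROBOCOPY_FLAGS if code & bit]


def is_failure(code):
    """Treat 8 and above as a failed run (at least one copy error)."""
    return code >= 8
```

A result of 0 or 1 is what a healthy daily run normally produces. Anything from 8 upward deserves a look at the log.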

Windows XP: Quirks of NTBackup

Using NTBackup can be problematic. Here are some notes you need to know to configure a backup correctly.
  1. Always disable "Wizard Mode". You will need to uncheck this option on first run, and then run the program again.
  2. On the "Backup" tab, your file selections are saved into a ".bks" file, which only lists the files to include. You can save as many selection files as you want, anywhere you want. These files do not contain the backup "Options".
  3. You can modify your ".bks" files without updating the task in Task Scheduler.
  4. On the "Backup" tab, your file selections force a recursive scan of any folder in which you check individual files. If you check the file "C:\log.txt", all directories and files in "C:\" will be scanned, but not backed up. This takes too long. You may want to restructure your files so they have their own folders.
  5. The choices you make in "Tools...Options...Backup Type" only apply to the currently running instance, and to any scheduled task you create from it. You can see the results in the command line options for the task.
  6. The choices you make in "Tools...Options...Exclude File Types" are global, and are in effect any time the program is run. You should choose them carefully, for example: C:\*.mp3
  7. [Updated] When completing the job or creating the scheduled task, be sure to choose "Append this backup to media" for incremental and "Replace the data on the media with this backup" for normal mode. If you forget to check, the "/a" switch activates append mode in the command listed in the scheduled task properties. Remove it to activate replace mode.
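
To make the role of the "/a" switch in note 7 concrete, here is a small Python sketch that assembles the kind of command you will find in the task properties. The switch names (/j, /f, /m, /a) follow NTBackup's command-line syntax as I understand it, so compare the result against the command in your own scheduled task. The paths, job name, and function name are hypothetical.

```python
BACKUP_TYPES = ("normal", "copy", "differential", "incremental", "daily")


def ntbackup_command(selection_file, job_name, backup_file,
                     backup_type="normal", append=False):
    """Build an NTBackup command line as a single string.

    append=True adds /a (append to the existing backup file);
    leaving it out means the backup file is replaced.
    """
    if backup_type not in BACKUP_TYPES:
        raise ValueError("unknown backup type: " + backup_type)
    parts = [
        "ntbackup", "backup",
        '"@' + selection_file + '"',   # the saved .bks selection file
        "/j", '"' + job_name + '"',
        "/f", '"' + backup_file + '"',
        "/m", backup_type,
    ]
    if append:
        parts.append("/a")
    return " ".join(parts)
```

Calling it with backup_type="incremental" and append=True gives the append form. The default call gives the replace form for a normal backup.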

Windows XP: NTFS Archive Flag

Do you remember the "Archive" property, found on files in old FAT filesystems? That property still exists in NTFS; it is just not easily changed manually. A set Archive attribute means the file has been created or modified since it was last backed up, and marking a file as backed up clears it. The first time you run a backup that has the "mark files as backed up" option, all the affected files will be marked.

Backup Types: Incremental vs Normal

The incremental backup type only picks up files that are not marked as backed up, that is, files whose "Archive" attribute is still set. It skips every file already marked as backed up, regardless of whether you are creating a new backup project or a test. If you are running the same backup again, the incremental type always adds the entire file to the backup archive if the file has been modified since the previous copy was added. With incremental backups, modified files therefore result in multiple copies of the same file inside the archive file.

A normal backup completely ignores the "backed up" marking when copying files, and then marks each file as backed up once complete.
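
The interplay between the two backup types is easier to follow in a small simulation. This Python sketch assumes the usual Windows convention, in which a set Archive attribute means the file has changed since it was last marked as backed up. The dictionary of file names and the function names are made up for illustration. It models only the flag, not NTBackup itself.

```python
def modify(files, name):
    """Editing or creating a file sets its Archive attribute."""
    files[name] = True


def normal_backup(files):
    """Copy every file regardless of the flag, then mark all as backed up."""
    copied = sorted(files)
    for name in files:
        files[name] = False  # attribute cleared = marked as backed up
    return copied


def incremental_backup(files):
    """Copy only files whose Archive attribute is set, then clear it."""
    copied = sorted(name for name, flagged in files.items() if flagged)
    for name in copied:
        files[name] = False
    return copied
```

Starting from two new files, a normal backup copies both. An incremental run right afterwards copies nothing, and after one file is edited the next incremental run copies only that file.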

You should do an incremental backup on directories whose files do not change frequently, but for which you need versions in case you want to restore a particular version of a file. Example: "My Documents"

You should do a normal backup on directories whose files change constantly. If you need versions, you should determine the versioning plan you need, which could be a new version each day. Ideally, you delete old versions to save space. Example: your "Firefox Profile", backed up each day, with each day's file overwritten one week later by the backup for the same day of the week: "MondayFirefoxProfile.bkf"
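
The weekly rotation in that example comes down to naming the backup file after the day of the week, so that each file is overwritten seven days later. Here is a minimal Python sketch of the naming rule. The function name and the default label are assumptions for illustration.

```python
from datetime import date

# Fixed English names, so the result does not depend on the system locale.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def weekday_backup_name(day, label="FirefoxProfile"):
    """Backup file name for the given date, reused every seven days."""
    return DAY_NAMES[day.weekday()] + label + ".bkf"
```

Because only seven names ever exist, the scheme keeps at most a week of versions and never grows beyond seven backup files.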

Windows XP: Using the Task Scheduler

When you click "Start Backup" in NTBackup, you can add items to the Task Scheduler. Here are some notes you need to know to configure a backup task correctly.
  1. Windows treats each listed task like a file that you can copy outside the scheduler. The only way to duplicate a task is to drag it outside the Scheduler window and rename it. The files themselves are encoded and cannot be edited outside the Task Scheduler applet.
  2. In the properties of a task, you can see the command that is run with all the switches.
  3. You can remove the saved password that NTBackup asked you for by checking "Run only if logged on" in the task properties. Unchecking this option enables the task to run while you are logged out. While it is checked, the task still runs when the workstation is locked.
  4. Scheduling a separate backup for each day of the week is accomplished by using "Schedule...Weekly" and making a separate task for each day of the week.
  5. Scheduling twice a day is accomplished by using "Schedule...Daily" and accessing "Advanced". Then set it to repeat. Example: "Start: 12:01pm Repeat: 6 hours Duration 6 hours 5 min"
  6. If you want a task to run while you are at lunch, go to "Settings...Idle Time", check "Only Start if the computer has been idle for at least", and set "If the computer has not been idle for that long, retry for up to" to the length of your lunch hour or more. Schedule the task to start around your average lunch time.
  7. Forcibly stopping the task through any of these settings may corrupt your backup file.
  8. Note the "Don't start the task if the computer is running on batteries" setting. It will interrupt your "Repeat task" schedule if you have one.
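
The repeat settings in note 5 can be checked with a few lines of Python. This sketch lists the run times produced by a start time, a repeat interval, and a duration. It treats the end of the duration as exclusive, which is an assumption on my part and not documented Task Scheduler behavior. The function name is hypothetical.

```python
from datetime import datetime, timedelta


def run_times(start, repeat_every, duration):
    """List the times a repeating task would start.

    The task runs at start, then again every repeat_every for as long
    as the next run still falls inside the duration window.
    """
    times = []
    current = start
    end = start + duration
    while current < end:
        times.append(current)
        current += repeat_every
    return times
```

With a 12:01pm start, a 6 hour repeat, and a duration of 6 hours 5 minutes, this yields 12:01pm and 6:01pm, which is the twice-a-day schedule from the example. The extra five minutes keeps the second run clearly inside the window.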