Tuesday, February 14, 2012

How to Test Your Backup Strategy: Five Simple Questions

Backup strategies are like rashes: people tend to ignore them for as long as they can, then they ask a specialist if it’s going to be OK.

There’s a lot of good ways to back up databases. The challenge is finding best strategy for your data. To find out if your SQL Server backup strategy measures up, ask yourself the following questions:

1. How Do You Know When Your Backups Aren’t Successful?

People deal with lots of databases, and it’s hard to keep track of every last one. Different strategies are used to manage backups for all these databases — sometimes a service external to the database runs the backup. Do you know when that service fails, or doesn’t start? At other times, individual SQL Server Agent jobs handle backups for each database. If a new backup is added and a backup job isn’t created, will you know it wasn’t run?
  • Make sure you have an alert system for failures.
  • Supplement your backups with regular checks of last backup date. Make sure you’re checking the last date of log backups if you’re not using the simple recovery model.

2. How Much Do You Lose if Even Just One Backup File Goes Bad?

Backups take up a lot of space. There are a lot of backups, and people rarely use them. It’s human nature to go a little cheap, and keep them only in one place. That place may have been provisioned using RAID 0. Wherever it is, a backup file might get deleted or corrupted.
You need to look at all the different types of backups you run and consider things like this:
  • If you’re using just full and transaction log backups, the chain ends at the last good log backup.
  • If you’re doing full backups once a week and differentials on weeknights, those differentials are no good without the full.
For critical databases, I prefer to keep backups in more than one place— onsite and offsite. Think you can’t afford an offsite location? Here are the rates for Amazon S3.

3. How Fast Do You Need to Restore Your Data?

If you’re a manager, this is your ‘Recovery Time Objective’. If you’re a project manager, you just call this ‘RTO’. If you’re a database administrator, this is the number of minutes until you’re pumped full of adrenaline and sweating heavily when something goes wrong. Whenever I’m considering a backup technology, I want to know how long it will take me to restore my data if everything is gone except the backup files.
This means you need to know:
  • How long will it take to get to the backup files? If they’re on tape or in a magical Data Domain device, can you access them yourself?
  • How long will it take to copy those files to a good place to restore them from?
  • If you don’t have a server to restore to, how long will it take to bring one up and configure it?
  • How long will the restore take?
If my backup technology can’t meet my Recovery Time Objective, I need to start thinking about high availability options that can mitigate my risk. Once you’ve got high availability in place, you still want to keep a plan to restore from backups and test it periodically, but you’re less likely to need to use it.

4. How Much Data Can You Afford to Lose if Your Transaction Log File is Vaporized?

When you stop and think about it, isn’t ‘Recovery Point Objective’ a strange way of saying how much data it’s OK to lose? When you think about backup strategies, you need to know:
  • Can you do incremental restores? (And have you tested it?)
  • Can you restore to a single point in time? (Example time: right before the data was truncated. And have you tested it?)
  • If your database and transaction log files disappeared and you were left with only the backup files, how much data would be lost?

5. How Much Can Your Backup Impact Performance?

Whenever your backup is running, you’re burning precious resources: CPU, disk IO, memory, and (depending on your configuration), network. In many cases, it’s just fine to use this up during non peak times. However, for VLDBs which are backing up multiple terabytes and for critical OLTP databases which serve customers around the globe, backups need to be run as fast as possible. After all, backups are only one part of maintenance.
When your performance is critical, these are the questions you need to ask:
  • Are you backing up at the right frequency? If we’re talking about a slow transaction log backup, check how often it runs.
  • Are you using all the available magic? If you’re using SAN storage, have you explored all the options for backing up your databases from the SAN?
  • Are you using backup compression? Compression can burn more CPU, but reduce the amount of writes and overall duration of your backup.
  • Does your backup have enough throughput to the storage device? Whether you’re using iSCSI or Fiber Channel, throughput is often a big part of the solution.
  • Have you read the SQL CAT Team’s Case Study on backing up VLDB’s? It’s chock full of configuration tips to make your backups blaze through faster.

That was More than Five Questions

It’s true, I cheated. Got your own backup test? Tell me in the comments!

Labels: , , , , , , , ,