Keeping Data Safe

§1. Introduction. Like most photographers I use a digital process for all my colour work. So although my originals are transparencies, I scan them and prepare files for printing digitally. This gives me the best possible control over the final print. Just as importantly, I have a high quality "copy" of the transparency if the worst happens; for this reason I scan at the highest possible resolution (giving me 16 bits per channel scans of around 460MB). Keeping these backup copies as safe as possible is obviously very important. (If you work digitally throughout then the matter is even more pressing.) This short article is about the solution I have adopted and why.
Given the large amount of data I need to keep, I decided that my priority was to make sure that the raw scans are as safe as possible. By "raw" I mean exactly what I get from the scanner with just the profile embedded: no adjustments made at all and no conversion to any colour space. Given a well profiled scanner this represents the maximum information you could possibly have from it. Without a well profiled scanner the whole exercise is a bit pointless. Thus I decided that at any given time I will have one master copy on my system and one copy off site. This means bringing the off site disk in at regular intervals to synchronize it with the master copy. Doing this very frequently would be very inconvenient (and there is an increased risk of damaging the disk in transit). Once every two weeks seems reasonable, but if the master disk crashes quite a few scans could be lost. In reality this is not a big problem, since the likelihood of the disk crashing and the original transparencies being lost at the same time is extremely low. Even so, scanning at very high resolution is time consuming and not a lot of fun. So I decided to keep a mirror copy of the master disk on the system and just swap that with the off site one each time. Keeping a mirror copy is easy with the right software.
§2. The hardware. There is no shortage of hard disks for sale, some at quite low prices. The actual disks are made by a very small number of manufacturers; most well known brands do not make their own disks. What they are selling is the housing with interface, cables and in some cases bundled software. In my case I needed at least four disks and in fact five. This is because I use one to back up general data from my computer, one to hold adjusted scans and the other three for the raw scans as described above (master, mirror and off site copy). Separate disks in their own enclosures would have been impractical: all those external power supplies and wires. An enclosed drive with several disks was also not attractive: too inflexible, and what happens when one fails? So the disks had to be removable individually and without fuss (for some systems you have to remove them from the back of the enclosure). I discussed things in detail with Neil Jones at rentaraid. His suggestion was to use the ProAvio StudioRACK S4. This takes four disks at a time, is front loading, and mercifully has an integrated power supply. The unit comes with the necessary cables (all high grade), though if you intend to use it with USB you will need to buy one more cable. In use you see four disks come up on the desktop (on a Mac). I bought an extra disk for off site purposes.
An extra advantage of this arrangement is that I am not tied down to any particular make of disk; if one fails I can simply swap it (all this requires is to unscrew the old one from the loading cradle, which comes out with the disk, and secure the replacement). I could even decide to put one of the disks in my Mac if higher data transfer speed was needed. The version I bought is the one with Firewire 800, so the speed is quite reasonable since at any time I am just loading one scan for editing. There is an eSATA version but I didn't want to get into buying an extra card for my Mac, and in any case the rate at which data can be got off one 7200RPM disk is not much higher than what Firewire 800 can handle (of course if more than one disk is accessed at the same time then there is a difference).
§2.1. Points to note about the StudioRACK S4. This unit consists of two pairs of disks. If connecting with Firewire you can use the bridging cable (supplied) and then need only one Firewire connection to the computer. Of course it could be used as two separate drives connected to two different machines (so each machine just sees its pair of disks). For a USB connection you need to use two cables (but only one is supplied, a strange decision but in reality most users will use Firewire 800).
The disks are not hot swappable, i.e., you cannot just eject a disk and then pull it out. Doing this will result in an error message. While the message can be dismissed and there seems to be no obvious problem, pulling one disk out while the other one of a pair is still mounted is not recommended by ProAvio (my guess is that if there is data being transferred to the remaining partner disk it might be corrupted). Ejecting both disks means you can pull out one of them but the controller chip comes back on line after a brief pause and remounts the other disk. When a replacement disk is inserted this causes an error message again. All this is because each pair is controlled by one board and the two disks are essentially on the same Firewire bus line. It is possible to pull out both disks and reinsert without an error being signaled (by doing this before the chip comes back on line) but the simplest thing is to replace a disk when the unit is powered down.
I must stress that the preceding discussion is for information. In practice it is not a problem. Hot swapping is an issue if several users are being served by the unit and a disk needs to be replaced without interruption (e.g., as part of a RAID system). So I don't find it a sticking point. What is disappointing though is that the unit has no documentation (at least as shipped to me) and the online quick set up guide has two unnecessary and potentially misleading steps (though it is fairly obvious that they do not apply to this unit). There is very little to document but it would have been useful to have had points such as those discussed here stated in documentation rather than by explanatory emails (with help from Neil Jones).
The unit is flat with a fairly low profile, which means that it can be put under the monitor, very useful. It is cooled by a couple of fans, resulting in some noise, but it is not particularly loud (if you have a Mac G5 the level is about the same as when that is running normally). I much prefer this to a design without fans and the consequent build up of heat (a steady temperature state is surely better for reliability). There are indicators if the temperature is too high or a fan fails. There is some vibration but again this is not excessive (and transmittance is reduced by the four supplied rubber feet). Beware, however, that if your desk surface is not totally solid you should not place your scanner on the same one. In my case the surface is 18mm MDF and, because there is no solid support below a fair part of it, my scanner started to produce faulty results when used at the highest resolution. Moving the scanner to the next surface solved the problem. Alternatively the unit can be mounted on a rack (the appropriate attachments are supplied).
Note that power stays on regardless of whether the computer is on or off. A simple way round this is to use a product such as the intellipanel. This is useful to have in any case for use with other peripherals such as scanners or external speakers. It works well, e.g., it waits for long enough before switching power off to peripherals so you don't get any error messages. The only word of caution here is that if you tend to put your computer to sleep very frequently then it is probably best not to couple the rack with the computer (the switch for the rack is at the back but easy to use). This is because powering hard disks up and down too frequently makes a crash more likely. In practice I only put my computer to sleep if I will not use it for more than one hour (after all I want the internal hard disks to keep working for as long as possible).
The final and most important point is the solidity of this product. It is not just a tin box and inspires confidence. It is aimed at professional recording studios and this seems to me to be no mere marketing hype.
§3. The software. The hardware as described is self contained and doesn't need any extra software (all that is necessary after connection is to format the disks using the appropriate system utility). On the other hand synchronizing disks manually is both tedious and error prone. Neil suggested I try out ChronoSync. This turns out to be very well designed and flexible software which is reasonably priced. It can be used for many tasks from very simple to quite sophisticated. For me it proved even more attractive because I can use it to synchronize work between my laptop and desktop machine. Just keeping two disks synchronized is not so hard to do (e.g., with the underlying UNIX utilities on a Mac) but the features of ChronoSync bring many advantages and make it very unlikely that I will overlook the task. Of course the software can be used to make backups (of various kinds) so it comes in handy especially if you don't have Time Machine on your system.
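To give an idea of what "keeping two disks synchronized" involves at its simplest, here is a minimal sketch in Python of a one-way mirror. It is not how ChronoSync works internally, and the function name `mirror_new_files` is my own invention for illustration. It copies anything that is missing from, or different on, the mirror and never deletes anything from it.

```python
import filecmp
import shutil
from pathlib import Path


def mirror_new_files(master: Path, mirror: Path) -> list[Path]:
    """One-way mirror: copy files that are missing or changed on the mirror.

    Returns the relative paths that were copied. Nothing is ever deleted
    from the mirror (an assumption of this sketch, chosen for safety).
    """
    copied = []
    for src in sorted(master.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(master)
        dst = mirror / rel
        # With shallow=True, files whose type, size and modification time
        # match are taken to be equal; otherwise the contents are compared.
        if dst.exists() and filecmp.cmp(src, dst, shallow=True):
            continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also preserves the modification time
        copied.append(rel)
    return copied
```

With two disks mounted on a Mac this might be called as `mirror_new_files(Path("/Volumes/RawMaster"), Path("/Volumes/RawMirror"))`, where both volume names are hypothetical. A second run straight afterwards should find nothing to copy.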
§4. Rejected solution. At one stage my plan was to mirror the raw scans using a RAID 1 setup. This uses two disks and makes a copy of everything saved on one to the other at the same time. So at any given time they are identical. This is useful in situations where service should not be interrupted: if one disk fails the other is used, and the data is also copied from it to the new replacement disk in the background. This is fine, but one problem is that all of the data is copied each time a disk is replaced, which can take many hours. By contrast my need was for updated copies, i.e., the off site disk should only have the new scans written to it when installed. With regularly scheduled runs of ChronoSync my second copy is never very much out of date, so at worst I'd have to rescan a small number of transparencies. This is a small (possible) price to pay for not having to rebuild a large disk every two weeks.
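A rough calculation shows the size of the difference. The sketch below is in Python, and apart from the 460MB scan size every figure in it is an assumption made purely for illustration: a full 1TB disk, a sustained transfer rate of 60MB per second and 20 new scans in a fortnight.

```python
def copy_hours(data_gb: float, rate_mb_per_s: float) -> float:
    """Hours needed to copy data_gb gigabytes at a sustained rate (1GB = 1000MB)."""
    return data_gb * 1000 / rate_mb_per_s / 3600


SCAN_GB = 0.46   # from the article: about 460MB per raw scan
DISK_GB = 1000   # assumption: a full 1TB disk
RATE_MB_S = 60   # assumption: sustained rate over Firewire 800
NEW_SCANS = 20   # assumption: scans added in two weeks

full_rebuild = copy_hours(DISK_GB, RATE_MB_S)
incremental = copy_hours(NEW_SCANS * SCAN_GB, RATE_MB_S)
print(f"full rebuild: {full_rebuild:.1f} h, new scans only: {incremental * 60:.1f} min")
```

With these figures the full rebuild comes to several hours while copying only the new scans takes a few minutes. Different assumptions change the numbers but not the conclusion.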
§5. Conclusion. I don't make my living from photography but I do spend a lot of time, money and effort on it. I therefore could not contemplate losing my hard won transparencies. The solution I have adopted is one that guarantees their safety, at least as high quality scans. I have not taken quite as many precautions with adjusted scans. Here I make local backups especially of any critical ones, e.g., for an exhibition or magazine article. If I were a professional photographer then I'd certainly take as many precautions with adjusted scans as with raw ones; a client isn't going to wait around for long when a disk crashes and time consuming adjustments have to be made.
I have been lucky in that I have never had a hard disk crash. However every disk will fail at some point and this can happen even when it is fairly new. Perhaps you have so far been lucky too, but you can be pretty sure that one day your luck will run out. Backups are only an optional extra if you do not mind losing all your data, otherwise they are a must. If at all possible you should keep an off site copy of essential data.
§6. Addendum. Since writing this article I have set up a similar system for a friend of mine who is a professional photographer. In his case we used the eSATA version with a basic card for his Mac Pro. Installation was very straightforward. There are some interesting differences between the two racks. Physically they look exactly the same (the same case is used) but the eSATA version does not have fans. It is also possible to eject the disks individually (it is mainly for this reason that my friend chose this version). In terms of speed of data transfer my impression (and it is no more than that) is that there is not a great deal of difference in practice, with the eSATA being perhaps a little faster when a single file is being transferred.
For this set up we used two master disks, one holds the Raw data and the other the adjusted versions. There are then two backup disks for each of the masters, an A and a B version. I wrote some scripts that automate much of the process. For example one script can be used to mirror the structure of the Raw disk onto the Processed one. Once a file is worked on and perhaps saved in a different format the copied version can be deleted and will not be recopied by the script. At any given time the rack has either the A pair of backup disks or the B pair and the other is off site. I have now bought an extra disk for my own system and set it up in exactly the same way.
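As an illustration of the structure-mirroring script, here is a minimal sketch in Python. It is not the script I actually wrote; the function name `seed_processed` and the idea of a JSON manifest file are assumptions made for this example. The manifest records which raw files have already been copied, so that a copy deleted from the Processed disk is not brought back on the next run.

```python
import json
import shutil
from pathlib import Path


def seed_processed(raw: Path, processed: Path, manifest: Path) -> list[str]:
    """Copy each raw scan to the processed disk once only.

    The manifest (a hypothetical JSON file) lists every relative path that
    has been copied before. Returns the paths copied on this run.
    """
    seen = set(json.loads(manifest.read_text())) if manifest.exists() else set()
    new = []
    for src in sorted(raw.rglob("*")):
        rel = src.relative_to(raw)
        dst = processed / rel
        if src.is_dir():
            dst.mkdir(parents=True, exist_ok=True)  # recreate the folder structure
            continue
        key = rel.as_posix()
        if key in seen:
            continue  # copied on an earlier run; not recopied even if since deleted
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
        seen.add(key)
        new.append(key)
    manifest.write_text(json.dumps(sorted(seen), indent=2))
    return new
```

Keeping the manifest on the Processed disk itself would seem the natural choice, since it then travels with the data it describes.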

Note: This article was written in 2010 so some things have changed but the principles still apply.