Who Protects Your Cloud Data?
January 13th, 2008 (7:13am) Mike Gunderloy 21 Comments
Back in April, we speculated about one of the hidden dangers of depending on web services to store your data: the possibility that no one was doing backups. Now that possibility may have turned to reality for users of Omnidrive (once touted as the “clear leader” in the online storage field by TechCrunch). The service has been offline for some days, with its servers currently not responding at all. A December article at ReadWriteWeb contains serious allegations of fraud from the company’s ex-CTO (as well as a defense from the CEO).
My sympathies at this point are with Omnidrive’s users, particularly those who have their only copies of documents on an unreachable server. I can think of plenty of times when a days-long outage (let alone a permanent loss) of my own document storage would be devastating. The larger question, though, is what you as a user can (or should) do about this. Online document storage is certainly attractive to the web worker; being able to access and share your work easily in any browser is definitely a killer feature. But how do you balance that against the fact that your documents could simply vanish overnight?
One possible approach is simply to choose your storage vendor very carefully. Backup vendor Mozy, for example, is owned by giant EMC; Jungle Disk uses your Amazon S3 account for storage (so your data will be available even if Jungle Disk itself goes under); and Google Documents is, well, Google. Some smaller vendors have their own serious backup policies to guard against hardware failures.
Yet in a world of imperfect hardware and software, as well as regulatory and legal issues, choosing one company for storage is still ultimately a gamble. It may be unthinkable that an EMC or Amazon or Google could fail, but it’s not impossible. No matter how carefully you choose, entrusting your data to a single online storage vendor is the equivalent of storing it on a single hard drive: it introduces a single point of failure into the system.
For hard drives, of course, we’ve long had several answers to this problem: backups or RAID. If disks are unreliable, make a copy of the data elsewhere. If one disk is unreliable, store your data on three or five or seven disks, with a scheme that allows perfect data recovery even if one or two disks should suddenly be reduced to iron filings by hardware failures. What the disappearance of Omnidrive suggests to me is that it’s time for the next step in the evolution of online file storage, now that there is more than enough competition in the market for simple storage. We need the online equivalent of backups and RAID.
This doesn’t mean that the online storage services need to use backups and RAID on their servers; that’s irrelevant to me as a consumer in providing protection against vendor failure. Rather, I’d like to see products that automatically back up, say, a Box.Net account to Amazon S3 storage. Or an API that writes copies of my data simultaneously to Amazon and the fabled GDrive, and allows retrieval from either service if the other is missing. Or even a way to mirror my online storage, overnight, down to a desktop drive for safekeeping.
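To make the second of those ideas concrete, here is a minimal Python sketch of a store that writes every file to several services and reads from whichever one still answers. Everything in it is an assumption for illustration: `InMemoryBackend`, `MirroredStore`, and `BackendUnavailable` are hypothetical names, and the in-memory backends merely stand in for real services, each of which would need its own wrapper around the vendor's actual API.

```python
class BackendUnavailable(Exception):
    """Raised by a backend that cannot be reached."""


class InMemoryBackend:
    """Stand-in for one storage service (hypothetical, for illustration only)."""

    def __init__(self, name):
        self.name = name
        self.online = True   # flip to False to simulate a vendor outage
        self._files = {}

    def put(self, key, data):
        if not self.online:
            raise BackendUnavailable(self.name)
        self._files[key] = data

    def get(self, key):
        if not self.online:
            raise BackendUnavailable(self.name)
        return self._files[key]  # raises KeyError if the key was never stored


class MirroredStore:
    """Writes each file to every backend; reads from the first one that has it."""

    def __init__(self, backends):
        self.backends = list(backends)

    def put(self, key, data):
        """Try every backend and return the names of those that accepted the write."""
        stored = []
        for backend in self.backends:
            try:
                backend.put(key, data)
                stored.append(backend.name)
            except BackendUnavailable:
                continue  # one vendor being down should not block the others
        if not stored:
            raise BackendUnavailable("no backend accepted the write")
        return stored

    def get(self, key):
        """Return the data from the first backend that is up and holds the key."""
        for backend in self.backends:
            try:
                return backend.get(key)
            except (BackendUnavailable, KeyError):
                continue
        raise KeyError(key)


# Usage: store a file on two services, lose one of them, and still read it back.
primary = InMemoryBackend("service-a")
mirror = InMemoryBackend("service-b")
store = MirroredStore([primary, mirror])
store.put("report.doc", b"quarterly numbers")
primary.online = False
print(store.get("report.doc"))  # b'quarterly numbers'
```

This is plain mirroring, the online analogue of keeping a second copy, rather than a parity scheme. It also only protects you if the two services really are independent of each other.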
Until products like these are available (and if I’ve just missed them, please let me know in the comments), storing your documents online will remain a gamble. Perhaps a safe gamble, but it could be made far safer with more vendor independence.



21 Comments
Ryan Goldman says: January 13th, 2008 7:55am
This is the Achilles’ Heel of online data storage. Once it’s off my computer (or my network) I no longer have control over my data.
Irate ex-worker decides to take a few servers down on the way out the door? Local internet provider suffers a service outage? Even planned maintenance downtime can interfere with our ability to retrieve our files and get our work done. And if we depend on web-based apps we can lose the very tools needed to work with our documents.
As convenient as online data storage and web apps can be, they introduce too many (uncontrollable) points of failure to be relied upon as a primary solution.
Andrew Biss says: January 13th, 2008 8:35am
“Cloud RAID” is a good idea and I would certainly look with interest at someone offering such a solution.
One thing to be very careful about, however, is which vendor is supplying the physical backend storage service for the multiple RAID providers.
Why? The company you are paying for your web hosting is probably outsourcing the actual hardware infrastructure to another (larger, more efficient) back-end supplier. This will increase in future as economies of scale lead to fewer, but larger, utility computing centers.
In the “Cloud RAID” model there must be careful control that you are not contracting with what appear to be a number of independent RAID storage providers, but when we look at the physical implementations it turns out that one or more of these providers is actually being hosted by the same back-end storage utility.
We would then be running the same risk as before! One hardware failure (or network outage) could lead to data loss. What is worse, though, is that by using the RAID approach we have a false sense of security about our data.
I have been using Jungle Disk for offsite backups since November 2006 without any problems. I am confident in the reliability of Amazon S3 for this. Even so, it would be nice to have at least some idea of:
(1) How many separate copies Amazon is storing of my data
(2) Where my data is (roughly) located.
At the moment my offsite backups are being stored somewhere in the “Amazon cloud”. Where, exactly, I have no idea …
s3box says: January 13th, 2008 9:46am
So far, Amazon S3 has been most reliable.
Reuven Cohen, CTO Enomaly says: January 13th, 2008 10:06am
ElasticDrive allows you to configure a “cloud raid”, where data can be written to several remote storage systems at the same time (S3, Nirvanix, etc.) as well as your local disk. Check out http://www.elasticdrive.com
Geva Perry says: January 13th, 2008 10:59am
You don’t need to look at a relatively small vendor like OmniDrive. Remember the repeated outages Salesforce.com had a few months back? This is mission-critical information for a lot of people.
That said I would still argue that keeping your data on the cloud is many times safer than on your local hard disk.
Perhaps we need services such as Pingdom that measure and rank the various cloud storage providers in terms of reliability and up-time.
YDRIVE says: January 13th, 2008 11:11am
YDRIVE will let you select your own and/or preferred backend storage provider. Available soon.
Rick says: January 13th, 2008 11:37am
There’s a difference between a backup storage provider and one where you’re creating the master copies of your documents in the cloud. If the online storage service is merely your backup provider, you’ve lost nothing if they go away – you just have to find a new online backup service. A hassle, maybe, but the risk is minimal for a week or so if you don’t back up, and there are several other options out there.
If you’re creating master copies of your documents in the cloud… well you STILL need to have a backup strategy. Not so much because your files might be stored on one drive and lost, but because they might become inaccessible. It’s the exact same issue as backing up local data – your data is all in one place, what if that place suddenly is not accessible?
thorgersen says: January 13th, 2008 12:48pm
I use Mozy as well as daily (or more often) backups to a removable disk drive, stored onsite. Doubtful (though not impossible!!!) that the computer, onsite backups, and offsite backups are all inaccessible at the same time.
K. T. Stevenson says: January 13th, 2008 2:02pm
Using online storage providers for primary storage is asking for trouble. Even if it is backed by a large company, priorities change and that large company may decide to close down the service. I use online storage providers to store encrypted off-site backups and nothing else.
This issue applies more broadly to Web 2.0 applications in general. If my business relies on a web-based service, what happens if that service goes out of business? Always create a contingency plan and always create your own (local!) backups of any data stored on the web.
JungleDave says: January 13th, 2008 3:52pm
Amazon doesn’t disclose a lot of internals about S3 for security and competitive reasons, but they have stated before that all data is stored in at least 3 different datacenters in at least 2 geographic areas (e.g. east/west/central). They are pretty serious about data security as well as availability.
angry omnidrive user says: January 13th, 2008 8:57pm
stupid jerk, where’s that liar?
Mike Stankavich says: January 14th, 2008 6:29am
Maybe I’m paranoid or a pre-web Luddite, but I can’t stand the thought of not having an offline backup of any data that has personal or business significance to me. Vendors going out of business and extended connectivity outages are risks that I am not willing to accept.
External drives are so cheap now that there’s really no excuse left to not have everything backed up at least two or three times locally. My local Costco has 250gb Western Digital passport drives for $139.99.
At some point in the future I may consider using cloud storage as an additional form of offsite redundancy. But for now, I just rotate two external drives between home and a locked file cabinet at my office.
JPack says: January 14th, 2008 7:46am
Great article… I was one of those angry mis-treated omnidrive users for a long time, but I was lucky and copied/removed all my data from Omnidrive and was able to get a refund after constant persistence about 2 days before they went down!
Now I have moved to using JungleDisk with Amazon, and so far I am very pleased. I have much more confidence in Amazon, but I plan on using a secondary source for backup as well just in case (probably Mozy). Not to mention daily back-ups to a NAS just in case. I also keep a box.net account for my Word/Excel documents. I have been with Box.net for over 2 years and I have had zero problems, but I don’t use them for my bigger backups.
I learned the hard way, others should not have to.
bsdguru says: January 14th, 2008 10:36am
One thing that is a bit annoying for me is that both bingodisk and strongspace are down at the moment. Apparently Joyent are currently having issues with zfs. Basically means users cannot access their backups or static content at the moment.
Omnidrive Back Up Comment says: January 14th, 2008 8:48pm
No Responses to Website Server Back Up
Your User Says: Your comment is awaiting moderation.
January 14th, 2008 at 8:34 pm
No RAID? So unreliable!
In response to:
http://www.omnidrive.com/blog/2008/01/13/website-server-back-up/
craigbbaker says: January 15th, 2008 2:36am
Why is it necessary to change an IP address to fix a failed hard drive? One wonders!
angry omnidrive user says: January 15th, 2008 3:19am
So that someone can pull the reason of some “high TTL” excuse.
Anyway, did they change the IP address? It’s the same 75.126.5.64 before and after.
Rajeev says: January 15th, 2008 4:13am
I didn’t know there was no inbuilt backup option.
http://tekno-world.blogspot.com
Marc Reidy says: January 15th, 2008 5:57am
Perhaps this is more for medium/large businesses, but they really need to take responsibility for their own data. There are plenty of web-based file systems out there (I’ll not mention my own…just click the link :) and really that’s what businesses should be looking at if they have no confidence in outsourcing their data storage.
Brian Carnell says: January 16th, 2008 8:33am
Mike Stankavich wrote:
“Maybe I’m paranoid or a pre-web Luddite, but I can’t stand the thought of not having an offline backup of any data that has personal or business significance to me. Vendors going out of business and extended connectivity outages are risks that I am not willing to accept.
External drives are so cheap now that there’s really no excuse left to not have everything backed up at least two or three times locally. My local Costco has 250gb Western Digital passport drives for $139.99.”
Something like Amazon S3 with multiple backup sites is *far* less risky than your system of backing up to external HDs.
Don’t get me wrong, I also back up to external HDs and then swap between home and office. I also regularly back up my data files to optical media.
But Amazon is likely much more reliable than both of those, especially in the case of natural disasters such as Hurricane Katrina where having multiple backups in different locations isn’t all that helpful if the entire area turns into a disaster zone.
albuquerque security guard company says: May 5th, 2008 10:34am
Data storage is really important but often overlooked.