Rubrik in the clear: clusters, clouds and automation for data protection


When I posted about Rubrik back at the end of March I didn’t expect it to be Part 1 of a 2-part blog.  But that’s blogging for you – and today Rubrik announced its product and a big B-round of funding ($41M).  I also know a lot more about Rubrik, the people behind the company and their technology, so I can put together a better picture of them now.

For starters, I think they are doing something very big and executing very well.  The bigness is a matter of eliminating data protection complexity with automation and connectivity, and the execution is a matter of sophisticated integration that does not over-reach. Unlike StorSimple, where I worked prior to Quaddra, Rubrik is not selling primary storage and doesn’t depend on customers migrating data from where it is to Rubrik’s platform. Managing data in place is a big deal because it allows customers to try it without first disrupting their environment. The combination of doing something big and doing it well gives Rubrik a chance to change the technology landscape. If it catches on, everybody else in the data protection business will be forced to do something similar to keep their customers, except it’s not as simple as adding a feature, like dedupe – it will require significant architectural changes.

The Rubrik solution comprises some number of nodes residing on a local network, where they discover each other automatically and gain access to a vSphere environment through a vCenter login. It uses VMware APIs both to get the configuration information it needs to operate and to make backup copies of VM data. Their use of VMware APIs is discussed briefly in a blog post by Cormac Hogan. Rubrik is also developing their own API so it can someday be operated under the control of whatever management system a customer wants to use. This is exactly the technology approach business managers want for controlling their organization’s storage costs. It is designed to be a good interoperator. (I know, interoperator is not a real word, but shouldn’t it be?) It’s an interesting thing to ponder: Rubrik’s system software replacing backup software, but Rubrik’s software becoming transparent under the operation of some other orchestrator or software-controlled something-or-other.
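To make the vCenter piece a little more concrete, here is a minimal sketch of the kind of inventory call an appliance would make right after that vCenter login, written with the open-source pyVmomi SDK for the vSphere API. The hostname and service account are placeholders, and the idea that Rubrik does it exactly this way is my assumption, not anything they have documented.

```python
# A minimal sketch (my own, not Rubrik's) of the vCenter inventory step a
# backup appliance performs after logging in: connect, then walk the VM list.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

context = ssl._create_unverified_context()   # lab-only: skip certificate checks
si = SmartConnect(host="vcenter.example.com",        # placeholder vCenter
                  user="backup-svc@vsphere.local",   # placeholder account
                  pwd="********",
                  sslContext=context)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.VirtualMachine], recursive=True)
    for vm in view.view:
        # Enough information to decide what to protect and how big it is.
        print(vm.name, vm.summary.config.guestFullName,
              vm.summary.storage.committed)
    view.Destroy()
finally:
    Disconnect(si)
```

The backup copies themselves would presumably come through VMware’s data protection APIs (VM snapshots plus changed block tracking), but that is a separate code path from this inventory step.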

Rubrik’s cluster nodes have local capacity for protecting data as well as software that manages all operations, including using cloud storage for tertiary storage. Data that is copied to the Rubrik cluster is deduped upon arrival. Considering that they are copying data from VMs, which usually share many of the same operating system files, they will likely get decent dedupe ratios. Data placement and retention policies for secondary and tertiary storage are managed through a web interface. There are several editable backup retention templates that can be used to approximate a customer’s own data retention policies. Rubrik will likely learn a lot more about data retention than they expect in the years to come from customers who want all possible options. Nonetheless, they have done a decent job coming up with a base set.
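Since the retention templates are what customers will actually touch, here is a hypothetical example of the shape such a template might take; the field names are my invention, purely to illustrate how a tiered retention policy reads.

```python
# A hypothetical retention template (field names are my own) showing how a
# tiered policy might express local, cloud and expiration rules.
gold_policy = {
    "name": "gold",
    "snapshot_frequency_hours": 4,      # how often a VM is protected
    "keep_on_cluster_days": 30,         # secondary copy retention on the nodes
    "archive_to_cloud_after_days": 30,  # when copies may move to object storage
    "keep_in_cloud_years": 7,           # tertiary retention in the cloud
}

def is_expired(age_days: int, policy: dict) -> bool:
    """A copy expires only after it has aged out of the longest tier."""
    return age_days > policy["keep_in_cloud_years"] * 365
```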

Rubrik’s policies include settings for when data can be moved to the cloud and when it should expire. Initially they support Amazon’s S3, but they indicate other options are in the works. I believe Rubrik’s solution mandates cloud storage and a cloud storage account, but there can be exceptions to every rule. The important thing to recognize here is that policies for data snapshots can be extended in ways that on-array snapshots can’t, simply due to the enormous capacity available in the cloud. There are still going to be bandwidth issues to deal with in getting data to the cloud, but Rubrik clusters are scale-out systems with their own “expanding bag” of storage hitched to a powerful dedupe engine. As newcomers to storage, they seem to have figured out a few things pretty well.
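For what it’s worth, the cloud side of such a policy does not require anything exotic: standard S3 calls plus a lifecycle rule get you tiering and expiration. The sketch below uses boto3 with a placeholder bucket; it is only meant to show how little cloud machinery the tertiary tier needs, not how Rubrik actually does it.

```python
# Illustrative only: push an already-deduped chunk to S3 and let a bucket
# lifecycle rule expire it when the retention policy says so.
import boto3

s3 = boto3.client("s3")
BUCKET = "example-backup-archive"   # placeholder bucket name

def archive_chunk(chunk_id: str, data: bytes) -> None:
    """Tertiary tier: one object per deduped chunk."""
    s3.put_object(Bucket=BUCKET, Key=f"chunks/{chunk_id}", Body=data)

# Mirror a 7-year cloud retention policy with a lifecycle expiration rule.
s3.put_bucket_lifecycle_configuration(
    Bucket=BUCKET,
    LifecycleConfiguration={
        "Rules": [{
            "ID": "expire-old-chunks",
            "Filter": {"Prefix": "chunks/"},
            "Status": "Enabled",
            "Expiration": {"Days": 7 * 365},
        }]
    },
)
```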

Rubrik has come up with interesting recovery/restore capabilities too. Rubrik contends that a primary array shouldn’t have to provide snapshot volumes when they could be mounted instead from the Rubrik backup cluster. They point out that there is no reason to restore secondary volumes to primary storage when they could be mounted more easily and quickly from the Rubrik system. In addition, volume clones can be created in the Rubrik cluster and mounted by whatever VMs need to access the data. There is a bit of a throwback here to EMC’s TimeFinder, and I expect that Rubrik will find customers that justify purchasing the solution in order to simplify making and managing clones for DevTest. FWIW, I don’t see Rubrik’s volume restore capabilities working for large-scale DR scenarios because of the massive resources needed by large-scale recovery efforts. This is an area Rubrik will be educated about by their customers in the next couple of years.
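To show why mounting beats restoring, here is a purely hypothetical client for the kind of API Rubrik says it is building. None of these endpoints are real; the point is that recovery and DevTest clones become a couple of API calls that never copy data back to primary storage.

```python
# Hypothetical endpoints (my invention) illustrating mount-instead-of-restore.
import requests

APPLIANCE = "https://backup-cluster.example.com/api"   # placeholder address

def live_mount(session: requests.Session, vm_id: str, point_in_time: str) -> str:
    """Expose a historical copy of a VM's disks without copying it anywhere."""
    resp = session.post(f"{APPLIANCE}/vm/{vm_id}/mount",
                        json={"point_in_time": point_in_time})
    resp.raise_for_status()
    return resp.json()["datastore"]   # e.g. an NFS export vSphere can mount

def clone_for_devtest(session: requests.Session, vm_id: str) -> str:
    """Writable clone backed by the backup cluster, TimeFinder-style."""
    resp = session.post(f"{APPLIANCE}/vm/{vm_id}/clone",
                        json={"writable": True})
    resp.raise_for_status()
    return resp.json()["clone_id"]
```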

Rubrik talks about their technology as consolidated data protection, which seems apt to me because I believe they have a chance to permanently alter the worlds of backup, snapshots and recovery. But that’s a tall order and there are some red flags to consider as they make their way in the market: 1) Rubrik’s solution appears to be a vSphere-only solution, which will exclude them from some number of opportunities. 2) Rubrik has now raised $51M, which has its advantages, but it also means they need to start having sizable successes sometime too. 3) The weather forecast calls for uncertain storms when things don’t work quite as advertised or when – worse – a customer loses data. It’s not impossible or unthinkable – it’s the storage business. 4) Combined with #3, they will have to learn a whole lot of things they don’t know yet about storage customers and the number of practices they have that can complicate otherwise “perfect” technology.

 

Will Rubrik’s time machine fix the mess of data protection?

A couple weeks ago on the Speaking in Tech podcast I had a deja vu experience when the topic became a statement attributed to Microsoft that “backup software deserves to die.”   This came from an article by Simon Sharwood in The Register that quoted Microsoft as saying “If cloud storage had existed decades ago, it’s unlikely that the industry would have developed the backup processes that are commonly used today.”  Shazaam! The exact words I wrote in Chapter 2 of “Rethinking Enterprise Storage”  when I was on the StorSimple team at Microsoft.  It turns out that my former colleagues are publishing excerpts from my book in a Microsoft blog, which caught Simon’s attention, and he published his reaction. All understandable, but the context it was written in was left far behind.  As the conversation played out on the podcast, I found myself trying to cut Microsoft slack for what I had written, which morphed into a discussion about how I killed backup. A small bit of tech comedy, and all good.

Nonetheless, it’s inevitable that backup will be overhauled by cloud technology for the reasons I pointed out in the book: it’s far too complex, has too many points of failure and doesn’t scale. Cloud technology can fix most of those.

Lo and behold, just yesterday, another new startup, Rubrik, made their funding known to the world along with their intention of building a “Time Machine for the Enterprise.”  Because it’s early and not much is known about them, I’m trying to read between the lines and guess what they are doing.

The most useful information was posted on Storage Review by Adam Armstrong, who describes Rubrik’s technology as a platform and uses the graphic below to summarize the problem Rubrik intends to solve.

[Rubrik graphic]

The most interesting thing about their graphic is its similarity to one that we used at StorSimple, as shown below on the left.  There are certainly differences, but the basic concept is the same: get rid of unnecessary, overlapping legacy systems.

[StorSimple consolidation graphic]

Rubrik apparently hasn’t provided the graphic for how their stuff replaces the legacy stuff, but I bet it will look similar to the right side of the StorSimple graphic, with cloud (or object) storage having a prominent role for off-site data protection.

The following phrases in Adam’s article hint at what else Rubrik may be doing:

“Eliminates the need for backup software or globally deduplicated storage”

Eliminating the need for backup software implies knowing about data that is being written, which means Rubrik will be in the data path – either as a storage device or some sort of cache. I suspect some sort of scale-out cache because they didn’t say they were developing a new storage system. There are a number of problems involved with fronting storage systems and nobody has been very successful doing it. It will be interesting to see what Rubrik is actually doing, but I’m sure they will be keeping a lot of metadata.

Rubrik founder Arvind Jain was at Riverbed and knows dedupe technology as well as anybody.  I’m not sure what “eliminating globally deduplicated storage” means, but I suspect it just means they dedupe data.  If it means they are trying to solve large-scale dedupe across many systems or even sites, they have to deal with very large hashes and hash tables, which can have a big impact on performance.  But the performance problems of big hashes aren’t all that big if you are mostly working with secondary copies of data. That said, if they are making a caching thingy, their method of deduping data will be interesting.
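To see where the big-hash problem comes from, here is a toy content-addressed dedupe index: fixed-size chunks, SHA-256 fingerprints, one dictionary. Real products use variable-size chunking and far more careful index structures, so treat this strictly as an illustration of the bookkeeping involved.

```python
# Toy dedupe: every unique chunk costs one fingerprint in the index, which is
# why global dedupe across many nodes or sites strains memory and lookups.
import hashlib

CHUNK_SIZE = 64 * 1024            # 64 KiB fixed chunks, for illustration only
index: dict[str, bytes] = {}      # fingerprint -> stored chunk

def store(stream: bytes) -> list[str]:
    """Return the recipe (ordered fingerprints) needed to rebuild the stream."""
    recipe = []
    for off in range(0, len(stream), CHUNK_SIZE):
        chunk = stream[off:off + CHUNK_SIZE]
        fp = hashlib.sha256(chunk).hexdigest()
        if fp not in index:       # new data: store one copy only
            index[fp] = chunk
        recipe.append(fp)
    return recipe

def restore(recipe: list[str]) -> bytes:
    return b"".join(index[fp] for fp in recipe)
```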

“Instant recovery down to the file level, where Rubrik claims is much faster than legacy backup solutions”

Yes, recovering files from disk or cloud storage is usually faster than recovering from tape, if that’s what they mean by “legacy backup solutions”.  Data has to be located on tapes and then tapes have to be located wherever they are stored. It’s not the fastest way to do it. It sounds like Rubrik is giving users an interface to historical versions of their files so they can recover their files themselves. That should be a good thing as long as they don’t recover shedloads of data without telling anybody, creating capacity problems. (There’s always something).

“Rely less on snapshots and more on their backup appliance”

The word appliance implies hardware, although it could certainly be a virtual appliance too. There would be value in having the equivalent of snapshots that don’t consume space on primary storage. There are a lot of ways this could be done. StorSimple does them with something called cloud snapshots – which also serve as daily offsite protection.

“Shows the IOPS throughput and speed as well as remaining storage capacity”

Presenting storage metrics is always a good thing – especially if you are operating as a cache.  Showing remaining storage capacity could apply to their platform or to downstream storage systems.  In either case, it’s good to have a handle on how full storage is so you can take action.

“Leveraging the latest hardware technologies including flash”

What storage platform these days does not leverage flash?  If you are doing a lot of dedupe, you’d better be. This is not really an advantage as much as it is a requirement.

My take

Storage plays are always a lot harder than they look, but Rubrik’s team appears to have sufficient knowledge about how storage works to have a shot at success.  Moving secondary data to the cloud or object storage makes perfect sense. The trick in eliminating backup software is circumventing all the best practices that are in place. Human habits are hard to change, and Rubrik’s biggest challenge may be getting everybody on board with new ways of managing data that run counter to the old ways of doing it. For example, things like virus scanning and defrag can turn a storage cache inside out.

 

Not dead yet, but when will you get rid of tape?

Do you have any more tapes you want to get rid of?

People have predicted the end of tape as a storage medium since the first rotating storage drums were made by wrapping recording tape around modified washing machine drums. Too cumbersome and too error-prone, tape has survived because people use it for archiving and off-site DR storage. It has always been the storage backstop for all the other things that can go wrong – from human error to combinations of calamities that are stranger than fiction.

But tape itself has been a big problem. It is a byzantine technology with impressive data transfer rates, but it is saddled with cumbersome management that requires many touch points where things can go wrong. Restoring from multiple tapes is time consuming and unnerving, but considered normal. Contrast that with using dedupe technology that can access and restore data much more quickly.  The main problem with dedupe is its cost. The most popular disk-based dedupe systems are not necessarily cheap. The other problem is that many customers still use tape with dedupe for DR purposes. Used this way, tape is less intrusive, but it is still a pain.

Disk-based dedupe has taken a big bite out of tape’s business, yet tape has continued limping along like an unkillable zombie. Now, with cloud backup looking like it could take even more out of tape’s market, is tape finally going to keel over?

Tape is tired

 

Putting tape backups on less expensive virtual tape cloud storage could look like an obvious solution, but like all things in storage, initial impressions are usually misleading. While cloud storage can be made to look like a big disk or tape drive in the sky, it is much slower than the old frenemy tape. The difference is most pronounced when you want it to be most transparent – during restores. Technologies for data reduction, such as deduplication and compression help, but the fastest restores from the cloud will use technologies like thin restores that were developed by StorSimple. Why restore data that you probably won’t need again? Just leave it in the cloud.
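For the curious, the thin-restore idea is simple to sketch: bring back only the file map right away and fetch contents from the cloud the first time somebody opens a file. The class below is my own illustration against S3 via boto3, not StorSimple’s implementation.

```python
# A sketch of fetch-on-demand ("thin") restore: metadata now, data on first read.
import boto3

class ThinRestore:
    def __init__(self, bucket: str, manifest: dict[str, str]):
        self.s3 = boto3.client("s3")
        self.bucket = bucket
        self.manifest = manifest        # path -> object key, restored up front
        self.cache: dict[str, bytes] = {}

    def read(self, path: str) -> bytes:
        """Pull a file from cloud storage only when it is actually accessed."""
        if path not in self.cache:
            obj = self.s3.get_object(Bucket=self.bucket, Key=self.manifest[path])
            self.cache[path] = obj["Body"].read()
        return self.cache[path]
```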

But getting back to tape, the cloud industry is making enormous investments in service offerings, including storage services, which will continue to be improved and expanded.  The cloud service providers are not stupid. They want your data so they can get your computing business when you are ready to move that to the cloud too.

Tape technology vendors do not have the marketing muscle to protect their installed base, regardless of how entrenched those customers may appear. The fact is, only the largest IT shops have the resources to “do tape” well. Everybody else struggles with the stuff and will be happy to jettison it as soon as they can.

So will tape disappear completely if most of the market goes away? Probably not. For starters, cloud storage service providers will probably use a lot of tape, and large customers that know how to make it work will continue to want it.

My guess is that tape will follow the path of mainframe technologies into the mostly invisible corners of the technology industry where vendors are few and margins are high. Tape won’t die, it will only seem like it did.