Archives for May 2015

Rubrik in the clear: clusters, clouds and automation for data protection


When I posted about Rubrik back at the end of March I didn’t expect it to be Part 1 of a 2-part blog. But that’s blogging for you, and today Rubrik announced its product and a big B-round of funding ($41M). I also know a lot more now about Rubrik, the people behind the company and their technology, so I can put together a better picture of them.

For starters, I think they are doing something very big and executing very well. The bigness is a matter of eliminating data protection complexity with automation and connectivity, and the execution is a matter of sophisticated integration that does not over-reach. Unlike StorSimple, where I worked prior to Quaddra, Rubrik’s solution is not primary storage and doesn’t depend on customers migrating data from where it is to Rubrik’s platform. Managing data in place is a big deal because it allows customers to try it without first disrupting their environment. The combination of doing something big and doing it well gives Rubrik a chance to change the technology landscape. If it catches on, everybody else in the data protection business will be forced to do something similar to keep their customers, except it’s not as simple as adding a feature like dedupe; it will require significant architectural changes.

The Rubrik solution consists of some number of nodes residing on a local network, where they discover each other automatically and gain access to a vSphere environment through a vCenter login. It uses VMware APIs both to get the configuration information it needs to operate and to make backup copies of VM data. Their use of VMware APIs is discussed briefly in a blog post by Cormac Hogan. In addition, Rubrik is developing their own API so it can someday be operated under the control of whatever management system a customer wants to use. This is exactly the technology approach business managers want for controlling their organization’s storage costs. It is designed to be a good interoperator. (I know, interoperator is not a real word, but shouldn’t it be?) It’s an interesting thing to ponder: Rubrik’s system software replacing backup software, but then becoming transparent under the operation of some other orchestrator or software-controlled something-or-other.

Rubrik’s cluster nodes have local capacity for protecting data as well as software that manages all operations, including using cloud storage for tertiary storage. Data that is copied to the Rubrik cluster is deduped upon arrival. Considering that they are copying data from VMs, which usually have many of the same operating system files, they will likely get decent dedupe ratios. Data placement and retention policies for secondary and tertiary storage are managed through a web interface. There are several editable backup retention templates that can be used to approximate a customer’s own data retention policies. Rubrik will likely learn a lot more about data retention than they expect in the years to come from customers who want all possible options. Nonetheless, they have done a decent job coming up with a base set.
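To put a rough number on why VM backups tend to dedupe well, here is a minimal sketch in Python. It is my own illustration, not Rubrik’s method: the fixed 4 KB block size, the SHA-256 fingerprints and the `dedupe_ratio` helper are all assumptions made for the example, and real dedupe engines are far more sophisticated.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical fixed block size, chosen only for illustration


def dedupe_ratio(images, block_size=BLOCK_SIZE):
    """Return logical bytes divided by unique stored bytes across all images."""
    store = {}   # fingerprint -> length of the block kept for it
    logical = 0  # bytes the backups would occupy without dedupe
    for image in images:
        for offset in range(0, len(image), block_size):
            block = image[offset:offset + block_size]
            logical += len(block)
            # Only the first block with a given fingerprint is "stored".
            store.setdefault(hashlib.sha256(block).digest(), len(block))
    stored = sum(store.values())
    return logical / stored if stored else 0.0


# Three toy "VM images": eight shared OS blocks plus one unique data block each.
os_files = b"".join(bytes([i]) * BLOCK_SIZE for i in range(8))
vm_a = os_files + b"a" * BLOCK_SIZE
vm_b = os_files + b"b" * BLOCK_SIZE
vm_c = os_files + b"c" * BLOCK_SIZE

ratio = dedupe_ratio([vm_a, vm_b, vm_c])
print(f"dedupe ratio: {ratio:.2f}")
```

In this toy case the three images hold 27 blocks between them but only 11 are unique, so the ratio is 27/11. The more VMs that share the same operating system blocks, the better that number gets.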

Rubrik’s policies include settings for when data can be moved to the cloud and when it should expire. Initially they support Amazon’s S3, but they indicate other options are in the works. I believe Rubrik’s solution mandates cloud storage and a cloud storage account, but there can be exceptions to every rule. The important thing to recognize here is that policies for data snapshots can be extended in ways that on-array snapshots can’t, simply due to the enormous capacity available in the cloud. There are still going to be bandwidth issues to deal with in getting data to the cloud, but Rubrik clusters are scale-out systems with their own “expanding bag” of storage hitched to a powerful dedupe engine. As newcomers to storage, they seem to have figured out a few things pretty well.
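A policy with a move-to-cloud setting and an expiration setting might be modeled something like the sketch below. This is purely hypothetical: the `RetentionPolicy` fields, the `placement` function and the day counts are names and values I made up for illustration, and they are not Rubrik’s templates or API.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RetentionPolicy:
    """Hypothetical policy: both thresholds are snapshot ages in days."""
    archive_after_days: int  # age at which a snapshot may move to cloud storage
    expire_after_days: int   # age at which a snapshot should be deleted


def placement(snapshot_date, today, policy):
    """Return where a snapshot belongs today: 'local', 'cloud' or 'expired'."""
    age = (today - snapshot_date).days
    if age >= policy.expire_after_days:
        return "expired"
    if age >= policy.archive_after_days:
        return "cloud"
    return "local"


# Made-up example values: keep 30 days locally, expire after a year.
policy = RetentionPolicy(archive_after_days=30, expire_after_days=365)
today = date(2015, 5, 26)

for taken in (date(2015, 5, 20), date(2015, 3, 1), date(2014, 5, 1)):
    print(taken, placement(taken, today, policy))
```

The point of the sketch is the one made above: the expiration threshold can be pushed out much further when the cloud is holding the older copies than it can when every snapshot has to live on the array.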

Rubrik has come up with interesting recovery/restore capabilities too. Rubrik contends that a primary array shouldn’t have to provide snapshot volumes when they could be mounted instead from the Rubrik backup cluster. They point out that there is no reason to restore secondary volumes to primary storage when they could be mounted more easily and quickly from the Rubrik system. In addition, volume clones can be created in the Rubrik cluster and mounted by whatever VMs need to access the data. There is a bit of a throwback here to EMC’s TimeFinder, and I expect that Rubrik will find customers that justify purchasing their solution in order to simplify making and managing clones for DevTest. FWIW, I don’t see Rubrik’s volume restore capabilities working for large-scale DR scenarios because of the massive resources needed by large-scale recovery efforts. This is an area Rubrik will be educated about by their customers in the next couple of years.

Rubrik talks about their technology as consolidated data protection, which seems apt to me because I believe they have a chance to permanently alter the worlds of backup, snapshots and recovery. But that’s a tall order and there are some red flags to consider as they make their way in the market: 1) Rubrik’s solution appears to be a vSphere-only solution, which will exclude them from some number of opportunities. 2) Rubrik has now raised $51M, which has its advantages, but it also means they need to start having sizable successes sometime soon. 3) The weather forecast calls for uncertain storms when things don’t work quite as advertised or, worse, when a customer loses data. It’s not impossible or unthinkable; it’s the storage business. 4) Combined with #3, they will have to learn a whole lot of things they don’t know yet about storage customers and the number of practices they have that can complicate otherwise “perfect” technology.