The cloud wants your junk data

What do you think about when you think about cloud?   A lot of people think of shiny, new technology made of all new APIs and hypervisors and mobile devices and cutting edge code and things that only the next generation will understand. And for a lot of cloud customers, that’s reality. New, new, new.

What you probably didn’t know, however, is that the storage part of the cloud service provider business isn’t hung up on new. In fact, it is ecstatic about old. Old junk data that you would rather forget about, get out of your life and out of your data center. Data that you know you shouldn’t just delete because an attorney somewhere will ask for it. Data that is the digital equivalent of engine sludge, yet takes up expensive tier 1 storage.

Cloud storage services want it – even if you end up deleting it later. It doesn’t matter to them. You might be thinking they just want to mine your data. Nope. They are perfectly fine storing encrypted data that they will never be able to read. To them, it’s all the same flavor of money at whatever the going rate is. They don’t care if the data was a lot bigger before it was deduped or compressed or whatever you have done to it to reduce the cost. Why should they care if you send them 100 GB of data that was originally 1 TB? They don’t.

It’s good business for them – they’ll even replicate it numerous times to prevent data loss. You might be thinking “but it’s garbage data, I’d never replicate it”. True, but if it’s garbage data, then why do you have so many backup copies of it on tape and possibly in other locations? Why are you managing garbage over and over again?

It’s a double win. They want it and you don’t. All you need is the equivalent of a pump to move it from your expensive tier 1 storage to their data storage services. There are a number of ways this can be done, including using products from StorSimple, the company I work for. A StorSimple system ranks data based on usage, compacts it, tags it (in metadata), encrypts it and migrates it to a storage tier in the cloud where it can be downloaded or deleted later if that’s what you decide to do with it. How much money do you think your company is wasting taking care of junk?
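To make the "pump" idea concrete, here is a minimal Python sketch of the first steps: rank data by how long it has sat idle, then compact and tag whatever is cold enough to move. It is an illustration only, not how a StorSimple system is implemented. The Block class, the select_cold and prepare_for_cloud functions, and the one-year idle cutoff are all hypothetical, and the encryption and upload steps are left out because they depend on the crypto library and cloud provider you choose.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Block:
    """A hypothetical unit of stored data with a usage timestamp."""
    name: str
    data: bytes
    last_access: float  # seconds since the epoch


def select_cold(blocks, now, max_idle_days):
    """Return blocks idle longer than the cutoff, coldest first."""
    cutoff_seconds = max_idle_days * 86400
    cold = [b for b in blocks if now - b.last_access > cutoff_seconds]
    return sorted(cold, key=lambda b: b.last_access)


def prepare_for_cloud(block):
    """Compact a block and build its metadata tags.

    Encryption and the actual upload are deliberately omitted from this sketch.
    """
    payload = zlib.compress(block.data)
    tags = {
        "name": block.name,
        "original_size": len(block.data),
        "stored_size": len(payload),
    }
    return tags, payload
```

Calling select_cold(blocks, now, 365) would hand back only the data nobody has touched in a year, oldest first, which is the junk you want off tier 1 storage.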

Dedupe is coming fast for primary storage – and the cloud will follow

With EMC’s acquisition of XtremIO today, the landscape for storage products appears destined to change again to include a new segment for all-flash arrays. One of the technologies that will go mainstream along with flash arrays is primary dedupe. When you have all the read performance that flash provides, there isn’t any reason not to do it. A number of smaller vendors, including Pure Storage and StorSimple, the company I work for, have already been using dedupe paired with flash SSDs.

Chris Evans, in his blog The Storage Architect, wrote a couple of weeks ago about the potential future for primary dedupe, pointing out that NetApp’s A-SIS product has produced good results for customers since it was introduced in 2007. He then goes on to discuss the symbiotic relationship between flash SSDs and dedupe before posing the question of when dedupe will become mainstream for primary storage, saying:

That brings us to the SSD-based array vendors.  These companies have a vested interest in implementing de-duplication as it is one of the features they need to help make the TCO for all SSD arrays to work.  Out of necessity dedupe is a required feature, forcing it to be part of the array design.

Solid state is also a perfect technology for deduplicated storage.  Whether using inline or post-processing, de-duplication causes subsequent read requests to be more random in nature as the pattern of deduplicated data is unpredictable.  With fixed latency, SSDs are great at delivering this type of read request that may be trickier for other array types.

Will de-duplication become a standard mainstream feature?  Probably not in current array platforms but definitely for the new ones where legacy history isn’t an issue.  There will come a time when those legacy platforms should be put out to pasture and by then de-duplication will be a standard feature.

In a post I wrote last week about using deduplication technology for data that is stored in the cloud, I described the benefits of dedupe for reducing cloud storage transaction and storage costs. As the wheels of the industry continue to converge, it’s inevitable that the systems that access cloud storage will also dedupe data. There isn’t any reason not to do it. The technology is available today and it’s working. Check it out.


A customer’s journey to real-world cloud storage

Dan Streufert, the IT Director for MedPlast, recently presented at the SNW Spring conference about his experience with a StorSimple enterprise cloud storage solution. In this video he talks a lot about their requirements and what they were looking for cloud storage to do for them. Dan’s approach to running IT is based on leveraging cloud services as much as possible and, as he says, “It lets us keep pretty lean and focused on what our business does best, which is making medical products, not IT.”

The importance of dedupe for cloud storage

There are times when smaller is definitely better – such as when you are squeezing through a slot canyon or trying to park your car in San Francisco or getting comfortable on a trans-pacific flight or moving data back and forth to the cloud.

Dedupe technology is one of the best ideas to hit the storage industry – ever. We habitually create an enormous amount of digital stuff that often involves copying stuff we already have.  That’s where a lot of the bloat in storage comes from – copies on copies on copies.  Dedupe finds redundant copies of data and reduces the number of times we store them. This means the amount of data we store is smaller.
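For readers who like to see the mechanics, here is a minimal Python sketch of one common way to dedupe: split data into fixed-size chunks, fingerprint each chunk with SHA-256, and store each unique chunk only once. The dedupe function, the 4 KB chunk size and the in-memory dictionary are assumptions made for illustration. Real products differ in how they chunk and index data, and this is not a description of any particular vendor’s implementation.

```python
import hashlib


def dedupe(data: bytes, chunk_size: int = 4096):
    """Split data into fixed-size chunks and keep one copy of each unique chunk.

    Returns (store, recipe): store maps a SHA-256 fingerprint to chunk bytes,
    and recipe is the ordered list of fingerprints needed to rebuild the data.
    """
    store = {}
    recipe = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        fingerprint = hashlib.sha256(chunk).hexdigest()
        store.setdefault(fingerprint, chunk)  # keep only the first copy seen
        recipe.append(fingerprint)
    return store, recipe


# Ten chunks of input, but only two of them are distinct.
data = b"A" * 4096 * 9 + b"B" * 4096
store, recipe = dedupe(data)
stored_bytes = sum(len(chunk) for chunk in store.values())
```

In this example the recipe lists ten chunks but the store holds only two, so 40,960 bytes of input shrink to 8,192 bytes of stored data.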

Cloud storage costs have two origins: the cost of the transfer and the cost of storage. It only makes sense to ensure that data sent to the cloud is as “thin” as possible to reduce both transfer and storage costs. Deduping data that is sent to the cloud can decrease the costs of transfers and storage by as much as 90%. It’s a no-brainer. But why would you wait to dedupe data until you are sending it to the cloud when you could dedupe it on primary storage on-site and take advantage of its efficiencies there? That question leads to a whole other discussion that I’ll have soon in another post.
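The arithmetic behind that 90% figure is easy to sketch. The Python below assumes a 10:1 dedupe ratio and made-up per-GB prices. Both prices, the first_month_cost function and its parameter names are hypothetical, so plug in your own provider’s rates.

```python
def first_month_cost(logical_gb, dedupe_ratio, transfer_per_gb, storage_per_gb):
    """Cost to upload data once and store it for one month.

    logical_gb is the size before dedupe; a dedupe_ratio of 10 means 10:1.
    """
    stored_gb = logical_gb / dedupe_ratio
    return stored_gb * (transfer_per_gb + storage_per_gb)


# Hypothetical prices, in dollars per GB.
TRANSFER_PRICE = 0.10
STORAGE_PRICE = 0.12

raw_cost = first_month_cost(1000, 1, TRANSFER_PRICE, STORAGE_PRICE)
deduped_cost = first_month_cost(1000, 10, TRANSFER_PRICE, STORAGE_PRICE)
saving = 1 - deduped_cost / raw_cost  # fraction of the bill that disappears
```

Because both the transfer and storage charges scale with the number of gigabytes that actually move, a 10:1 ratio takes roughly 90% off the bill whatever the per-GB prices are.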

Assuming you have deduped data stored in the cloud, there are a couple things you can do with it. The first is recovering, or downloading, the data back to its original location.  Because it’s deduped, the transfer charges for the download will be much less and so will the time it takes to download.  It’s all good and if you are using the cloud for enterprise backup, it should be part of your best practices.

The second scenario is using the data in the cloud with applications running on cloud compute services. The deduped data there needs to be mountable by servers running in the cloud, which means there needs to be a way to rehydrate the data. (Rehydrating is the inverse process of dedupe.) The storage equipment that deduped the data on-premises is not in the cloud, so a virtual storage appliance that is in the cloud can take its place. Data can be accessed through the virtual storage appliance or natively on cloud storage after it has been rehydrated.
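Rehydration is simple to sketch if you assume the deduped data is kept as a store of unique chunks plus an ordered "recipe" of chunk fingerprints. That layout, and the rehydrate and fingerprint helpers below, are assumptions for illustration rather than the format any specific appliance uses.

```python
import hashlib


def fingerprint(chunk: bytes) -> str:
    """Identify a chunk by its SHA-256 digest."""
    return hashlib.sha256(chunk).hexdigest()


def rehydrate(store, recipe):
    """Rebuild the original byte stream by looking up each fingerprint in order."""
    return b"".join(store[fp] for fp in recipe)


# A hypothetical deduped object: two unique chunks referenced four times.
chunks = [b"header", b"payload"]
store = {fingerprint(c): c for c in chunks}
recipe = [
    fingerprint(b"header"),
    fingerprint(b"payload"),
    fingerprint(b"payload"),
    fingerprint(b"header"),
]
original = rehydrate(store, recipe)
```

Whatever does the rehydrating in the cloud needs both the chunk store and the recipe, which is why something has to stand in for the on-premises equipment there.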

Not all of this functionality for processing deduped data in the cloud is here today, but it’s going to be a part of the cloud storage ecosystem in the future. People can start thinking about transferring data efficiently and securely between their data centers and the cloud storage vendor of their choice for disaster recovery or cloud computing purposes. Dedupe is not necessarily a key technology enabler, but it may very well be a key technology that makes moving applications to the cloud realistic.

Skimming content from Calvin’s blog – my homies!


Two of my buddies from HP, Calvin Zito and Michael Haag, recorded a podcast at SNW last week. The topic is Building Storage for Virtualization. In addition to having good content, this podcast is very well recorded – good job Calvin. Of course, the discussion moved to 3PAR systems and although I no longer work for HP/3PAR, I still like the 3PAR system and its architecture. I’m glad that StorSimple does not compete with 3PAR because I have so many good friends over there.

At one point Calvin suggests that Michael sounds like JR (Jim Richardson, legendary 3PAR employee).  No, Michael you don’t, thank goodness.  I can’t imagine two JRs breathing the same air – something would have to give.

At the end of the podcast, they talk about storage efficiency for virtual servers. The two technologies mentioned were thin provisioning and deduplication. While StorSimple is now becoming known for its cloud integration, I always like reminding people that our systems use both thin provisioning and deduplication technologies with primary storage.

A big day for us and a landmark day for cloud-integrated storage

I’ve been looking forward to this ever since I joined StorSimple at the end of January.  We are announcing new products, software and capabilities - but in addition we are helping to launch something much larger than that – a new category of storage products called Cloud-integrated Storage.  Below is a chalkboard animation describing how Cloud-integrated Storage can solve some of the chronic problems storage managers deal with including data growth, backup, archiving and DR.

I’m extremely excited about what we are doing and by the excitement our customers have for our products. There is an awful lot of cool technology in our products. There are SSDs and SAS drives with automated storage tiering, primary storage dedupe and compression, an enterprise-class dual-controller design, multi-pathing, non-disruptive software upgrades, application-aware snapshots for Exchange, SharePoint, SQL Server and Windows Server, and certifications from Microsoft and VMware. StorSimple is the only cloud storage product certified by VMware.

Then there is the cloud part – which is really data management for cloud storage and where most of our secret sauce is. StorSimple Cloud-integrated Storage systems provide AES-256 encryption, support for multiple clouds, thin restores, cloud restores, cloudsnaps, cloud clones, cloud as a tier (CaaT) and volume prioritization for moving data to the cloud. It’s an incredible set of features that were designed to make using cloud storage safe and as simple as possible for storage managers and end users.

Here’s where you can find more information:

StorSimple Cloud-integrated Storage systems

Analyst reports, datasheets, white papers and videos



Some gigabytes are worth more than others

Getting clarity on the cost and relative worth of enterprise technology has always been a challenge because of the complex environments and diverse requirements involved. For every good question about which product is better, there is the almost universal answer – “it depends”. One product might have more capacity than its competitors, while another might have a unique feature that supports a new application and another product might have a new operating or management approach that increases productivity. Beauty is in the eye of the beholder and enterprise customers dig a lot deeper than what appears in competitors’ spec sheets. In some respects, it’s like comparing real estate properties where location and design trump square footage.

One of the traps people fall into when comparing the value of cloud services to legacy infrastructure technologies is limiting their analysis to a direct cost-per-capacity comparison. This article in Information Week did that in a painstaking way. The author, Art Wittman, made a commendable effort to make a level cost comparison, but he left out the location and design elements. He concludes that IaaS services are not worthwhile because the costs per capacity are not following the same cost curve as legacy components and systems. There is certainly some validity to his approach – if the capacity cost of disk drives has dropped an order of magnitude in four years, why should the cost of Amazon’s S3 service be approximately 39% higher?

Conceding that productivity gains can be realized from cloud services, he limits their value to application services and summarily rejects that they could apply to IaaS. After all the work he had done to make a storage capacity cost comparison, he refused to factor in the benefits of using a service.  Given that omission, Mr. Wittman concludes there is no way for an IaaS business model to succeed.

I agree with Mr. Wittman in one respect: if a service can’t be differentiated from on-site hardware, then it will fail. But that is not the case with enterprise cloud storage and it is especially not true with cloud storage that is integrated with local enterprise storage. Here’s why:

Storage is an infrastructure element, but it has specialized applications, such as backup and archiving that require significant expense to manage media (tapes). Moving tapes on and off-site for disaster recovery purposes is time-consuming and error-prone. While the errors are usually not damaging, they can result in lost data or make it impossible to recover versions of files that the business might need. The cost of lost data is one of those things that is very difficult to measure, but it can be very expensive if it involves data needed for legal or compliance purposes.  Using cloud storage as virtual tape media for backup kills two birds with one stone by eliminating physical tapes and the need for off-site tape rotations. It still takes time to complete the backup job and move data to the cloud, but many hours a month in media management can be recaptured as well as tape-related costs.

There are even greater advantages available with backup if it can be integrated from primary storage all the way to the cloud, as it is with StorSimple’s cloud-integrated enterprise storage (CIES). Using snapshot techniques on CIES storage, the amount of backup data generated is kept to a minimum, which means the amount of storage consumed from the storage cloud service provider is far less than if a customer used the cloud for virtual tape backup storage. Cloud-resident data snapshots have a huge capacity advantage over backup storage where the storage of files for legal and compliance purposes is concerned, and this demonstrates how the design of a cloud appliance can deliver even more value from cloud storage.
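A rough back-of-the-envelope model shows why snapshots consume so much less cloud capacity than repeated full backups. The Python sketch below is a simplification built on stated assumptions: every retained backup is a full copy, every snapshot after the first stores only changed data, and dedupe and compression are ignored. The function names and the 1,000 GB, 12-copy and 5% change-rate figures are hypothetical.

```python
def full_backup_gb(dataset_gb, copies):
    """Capacity used when every retained backup is a complete copy."""
    return dataset_gb * copies


def snapshot_gb(dataset_gb, copies, change_rate):
    """Capacity used by one base copy plus only the changed data per later snapshot."""
    return dataset_gb + dataset_gb * change_rate * (copies - 1)


# Hypothetical example: 1,000 GB dataset, 12 retained copies, 5% change between copies.
full = full_backup_gb(1000, 12)
snaps = snapshot_gb(1000, 12, 0.05)
```

With these made-up numbers, twelve full backups consume 12,000 GB while twelve snapshots consume about 1,550 GB, and the gap widens the longer you have to retain data for legal or compliance reasons.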

The next increase in cloud storage value comes from integrating deduplication, or dedupe technology, with cloud storage. Dedupe minimizes the amount of storage capacity consumed by data by eliminating redundant information within the data itself. Sometimes the amount of redundant data removed can be quite large, as occurs with virtualized systems. StorSimple’s CIES systems automatically apply dedupe to the data stored in the cloud and squish capacity consumption to its minimum level – which also minimizes the amount of data that is transferred to and from the cloud. With the help of a cloud-integrated enterprise storage system, the capacity of cloud storage increases in value a lot because so much less of it is consumed.

But the worth of cloud storage is not all about consuming capacity, it’s about accessing data faster than you can from legacy data archives. Data stored in the cloud with a CIES system is online and can be accessed by workers and administrators without the need to find it in a separate archive pool of storage. If you don’t work in IT, you might not know how much time that can save the IT staff, but if you do work in IT, you know this is a huge advantage that returns a lot of administrator time for other projects.

The access to data in cloud storage is probably most valuable when it occurs following a disaster.  Cloud storage provides the ultimate flexibility in recovery by being location-independent.  Backup or snapshot data stored in the cloud can be accessed from almost any location with an Internet connection to the cloud storage service provider.  Again, cloud-integrated storage has some important advantages that further increase the value of cloud storage by requiring only a small subset of the data to be downloaded before application systems can resume production work. This is much faster than downloading multiple virtual tapes and then restoring data to application servers.

I could go on – and I will in future blog posts. This one is long enough already. There are numerous ways that cloud storage is worth more than its raw capacity. Some of this worth comes from its role in disaster recovery but a lot of it comes from how it is used as part of an integrated storage stack that incorporates primary, backup, archive and cloud storage.