The cloud wants your junk data

What do you think about when you think about the cloud? A lot of people think of shiny new technology: all-new APIs, hypervisors, mobile devices, cutting-edge code and things that only the next generation will understand. And for a lot of cloud customers, that’s reality. New, new, new.

What you probably didn’t know, however, is that the storage side of the cloud service provider business isn’t hung up on new. In fact, it is ecstatic about old. Old junk data that you would rather forget about and get out of your life and out of your data center. Data that you know you shouldn’t just delete because an attorney somewhere will ask for it. Data that’s taking up expensive tier 1 storage and is the digital equivalent of engine sludge.

Cloud storage services want it – even if you end up deleting it later. It doesn’t matter to them. You might be thinking they just want to mine your data. Nope. They are perfectly fine storing encrypted data that they will never be able to read. To them, it’s all the same flavor of money at whatever the going rate is. They don’t care if the data was a lot bigger before it was deduped or compressed or whatever you have done to it to reduce the cost. Why should they care if you send them 100 GB of data that was originally 1 TB? They don’t.
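To put rough numbers on that last point, here is a minimal sketch of the arithmetic. The price per GB is a made-up figure for illustration, not any provider's actual rate, and 1 TB is treated as 1,000 GB.

```python
# Hypothetical flat rate in dollars per GB per month (an assumption,
# not a real provider's price list).
PRICE_PER_GB_MONTH = 0.10


def monthly_cost(stored_gb: float, price_per_gb_month: float = PRICE_PER_GB_MONTH) -> float:
    """The provider bills for what is stored, not for what the data used to be."""
    return stored_gb * price_per_gb_month


original_gb = 1000  # 1 TB before dedupe and compression
reduced_gb = 100    # what actually gets uploaded

cost_if_sent_raw = monthly_cost(original_gb)
cost_after_reduction = monthly_cost(reduced_gb)
savings_ratio = 1 - cost_after_reduction / cost_if_sent_raw

print(f"raw: ${cost_if_sent_raw:.2f}/month, reduced: ${cost_after_reduction:.2f}/month")
print(f"saved: {savings_ratio:.0%}")
```

Whatever the real rate turns out to be, the ratio is the same: you pay for the 100 GB that lands in the cloud.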

It’s good business for them – they’ll even replicate it numerous times to prevent data loss. You might be thinking, “But it’s garbage data, I’d never replicate it.” True, but if it’s garbage data, then why do you have so many backup copies of it on tape and possibly in other locations? Why are you managing garbage over and over again?

It’s a double win. They want it and you don’t. All you need is the equivalent of a pump to move it from your expensive tier 1 storage to their data storage services. There are a number of ways this can be done, including using products from StorSimple, the company I work for. A StorSimple system ranks data based on usage, compacts it, tags it (in metadata), encrypts it and migrates it to a storage tier in the cloud where it can be downloaded or deleted later if that’s what you decide to do with it. How much money do you think your company is wasting taking care of junk?
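For a feel of what "ranking data based on usage" could look like, here is a minimal sketch. It is not StorSimple's algorithm; the `FileRecord` type, the 180-day threshold and the sample files are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class FileRecord:
    # Hypothetical record type; a real system would likely track blocks, not files.
    path: str
    size_bytes: int
    days_since_access: int


def pick_cold_files(records, min_idle_days=180):
    """Return records idle for at least min_idle_days, coldest first."""
    cold = [r for r in records if r.days_since_access >= min_idle_days]
    return sorted(cold, key=lambda r: r.days_since_access, reverse=True)


def reclaimable_bytes(records, min_idle_days=180):
    """Tier 1 capacity that would be freed by moving the cold files out."""
    return sum(r.size_bytes for r in pick_cold_files(records, min_idle_days))


# Made-up sample inventory.
records = [
    FileRecord("q1_report.xlsx", 4_000_000, 12),
    FileRecord("2009_audit.zip", 900_000_000, 1400),
    FileRecord("old_project.tar", 2_500_000_000, 400),
]

for r in pick_cold_files(records):
    print(f"{r.path}: idle {r.days_since_access} days")
print(f"reclaimable: {reclaimable_bytes(records) / 1e9:.1f} GB")
```

The compacting, tagging, encrypting and migrating steps would follow the ranking; they are left out here to keep the sketch short.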

Are you feeling lucky, or just confident?

Chris Mellor wrote an article for The Register yesterday on cloud storage. At the end of it all, Chris misappropriated the famous soliloquy from the movie Dirty Harry:

“Being this is a .44 Magnum, the most powerful cloud storage service in the world, and would blow your SAN head clean off, you’ve got to ask yourself one question: ‘Do I feel lucky?’ Well, do ya, punk?” ®

For those unfamiliar with the movie, the context here is that a violent detective (Dirty Harry) has caught a psychotic serial killer and asks him the ultimate question about his fate.  Tension builds with the realization that Harry is asking himself the same question because he is unsure if there are any bullets left in his gun.  He obviously wants to find out, but struggles with a good cop vs evil cop dichotomy. He needs his psychopathic adversary to make the first move, but he seems awfully confident.

It doesn’t have much to do with cloud storage, other than raising the question of fate – something that storage administrators think about with regard to data more often than they think about their own.

So what is the fate of data stored in the cloud, and what steps do cloud service providers take to reassure customers that their data is safe? You can’t plan for everything, but you can plan to cover an awful lot of the mayhem that can occur.

For starters, you can store data in multiple locations so that you are not left unable to access it when a single cloud site is unavailable. As Chris’ article pointed out, StorSimple allows customers to do that. They can store data in separate regions run by a single service provider or they can store data in cloud data centers run by different cloud service providers. Different customers will have different comfort levels where cloud redundancy is concerned.

But it’s important to know that cloud storage service providers already store data in multiple locations anyway to protect against an outage at a single site that could cause a data loss. Data in the cloud is typically stored multiple times at the site where it is first uploaded and then stored again at other sites in the cloud service provider’s network.  Customers who are concerned about the fate of their data should discuss how this is done with the storage service providers they are considering because they are all a little different.

There is an awful lot of technology that has gone into cloud storage. We tend to think of it like a giant disk drive in the sky, but that is only the easiest way to think about it. Cloud storage – especially object storage in the cloud, the kind StorSimple uses and the stuff based on RESTful protocols – has been amazingly reliable. There have been other problems with different aspects of the cloud, including block storage, but object storage has been rock solid. It’s not really about feeling lucky, as Dirty (Chris) Harry suggested; it’s about the scalable and resilient architectures that have been built.

We would love to talk to you about cloud storage and how you can start using it. If you have a cloud service provider in mind, we are probably already working with them.

How much data do you need to keep around?

When we think about data growth we tend to think about all the new data that is being created all the time. But what happens to new data? It gets old, like everything else in this world.

Then what? Usually nothing. It just stays there, taking up capacity. Capacity that could be used for new data. We all know how that turns out. We end up buying more storage and then we start the cycle all over again.

But what if you had a way to deal with old data so it didn’t take up so much capacity? What if you could just get rid of old data by putting it somewhere else – deduped and compressed? What if you could still access that data, just like you always did before? What if you didn’t have to buy a lot of equipment to make it work?

Do you think that would help you manage capacity?

If you like what you’re reading, you should really check out StorSimple, the company I work for. We have amazing new technology our customers love.

Some gigabytes are worth more than others

Getting clarity on the cost and relative worth of enterprise technology has always been a challenge because of the complex environments and diverse requirements involved. For every good question about which product is better, there is the almost universal answer – “it depends”. One product might have more capacity than its competitors, while another might have a unique feature that supports a new application and another product might have a new operating or management approach that increases productivity. Beauty is in the eye of the beholder and enterprise customers dig a lot deeper than what appears in competitors’ spec sheets. In some respects, it’s like comparing real estate properties where location and design trump square footage.

One of the traps people fall into when comparing the value of cloud services to legacy infrastructure technologies is limiting their analysis to a direct cost-per-capacity comparison. This article in Information Week did that in a painstaking way: the author, Art Wittman, made a commendable effort to make a level cost comparison, but he left out the location and design elements. He concludes that IaaS services are not worthwhile because the costs per capacity are not following the same cost curve as legacy components and systems. There is certainly some validity to his approach – if the capacity cost of disk drives has dropped an order of magnitude in four years, why should the cost of Amazon’s S3 service be approximately 39% higher?

Conceding that productivity gains can be realized from cloud services, he limits their value to application services and summarily rejects that they could apply to IaaS. After all the work he had done to make a storage capacity cost comparison, he refused to factor in the benefits of using a service.  Given that omission, Mr. Wittman concludes there is no way for an IaaS business model to succeed.

I agree with Mr. Wittman in one respect: if a service can’t be differentiated from on-site hardware, then it will fail. But that is not the case with enterprise cloud storage and it is especially not true with cloud storage that is integrated with local enterprise storage. Here’s why:

Storage is an infrastructure element, but it has specialized applications, such as backup and archiving that require significant expense to manage media (tapes). Moving tapes on and off-site for disaster recovery purposes is time-consuming and error-prone. While the errors are usually not damaging, they can result in lost data or make it impossible to recover versions of files that the business might need. The cost of lost data is one of those things that is very difficult to measure, but it can be very expensive if it involves data needed for legal or compliance purposes.  Using cloud storage as virtual tape media for backup kills two birds with one stone by eliminating physical tapes and the need for off-site tape rotations. It still takes time to complete the backup job and move data to the cloud, but many hours a month in media management can be recaptured as well as tape-related costs.

There are even greater advantages available with backup if it can be integrated from primary storage all the way to the cloud, as it is with StorSimple’s cloud-integrated enterprise storage (CIES). Using snapshot techniques on CIES storage, the amount of backup data generated is kept to a minimum, which means the amount of storage consumed from the storage cloud service provider is far less than if a customer used the cloud for virtual tape backup storage. Cloud-resident data snapshots have a huge capacity advantage over backup storage where the storage of files for legal and compliance purposes is concerned, and that demonstrates how the design of a cloud appliance can deliver even more value from cloud storage.

The next increase in cloud storage value comes from integrating deduplication, or dedupe, technology with cloud storage. Dedupe minimizes the amount of storage capacity consumed by data by eliminating redundant information within the data itself. Sometimes the amount of redundant data eliminated can be quite large – as occurs with virtualized systems. StorSimple’s CIES systems automatically apply dedupe to the data stored in the cloud and squish capacity consumption to its minimum level – which also minimizes the amount of data that is transferred to and from the cloud. With the help of a cloud-integrated enterprise storage system, the capacity of cloud storage increases in value a lot because so much less of it is consumed.
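If you have never seen dedupe in action, the idea can be sketched in a few lines. This is a toy version that assumes fixed-size 4 KB chunks identified by SHA-256 digests; it is not how StorSimple or any particular product implements it, and the function names are made up for the example.

```python
import hashlib


def dedupe(data: bytes, chunk_size: int = 4096):
    """Split data into fixed-size chunks and keep one copy of each unique chunk.

    Returns (store, recipe): store maps a chunk digest to its bytes,
    recipe is the ordered list of digests needed to rebuild the data.
    """
    store = {}
    recipe = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # only the first copy is kept
        recipe.append(digest)
    return store, recipe


def restore(store, recipe) -> bytes:
    """Rebuild the original data from the unique chunks and the recipe."""
    return b"".join(store[digest] for digest in recipe)


def stored_bytes(store) -> int:
    """Capacity actually consumed by unique chunks (ignoring metadata)."""
    return sum(len(chunk) for chunk in store.values())


# Highly redundant sample data: ten identical chunks plus one different one.
data = b"A" * 4096 * 10 + b"B" * 4096
store, recipe = dedupe(data)

print(f"original: {len(data)} bytes, stored: {stored_bytes(store)} bytes")
```

Only the unique chunks need to be uploaded and paid for, which is why the savings show up both in capacity and in the amount of data moved to and from the cloud.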

But the worth of cloud storage is not all about consuming capacity; it’s also about accessing data faster than you can from legacy data archives. Data stored in the cloud with a CIES system is online and can be accessed by workers and administrators without the need to find it in a separate archive pool of storage. If you don’t work in IT, you might not know how much time that can save the IT staff, but if you do work in IT, you know this is a huge advantage that returns a lot of administrator time for other projects.

The access to data in cloud storage is probably most valuable when it occurs following a disaster.  Cloud storage provides the ultimate flexibility in recovery by being location-independent.  Backup or snapshot data stored in the cloud can be accessed from almost any location with an Internet connection to the cloud storage service provider.  Again, cloud-integrated storage has some important advantages that further increase the value of cloud storage by requiring only a small subset of the data to be downloaded before application systems can resume production work. This is much faster than downloading multiple virtual tapes and then restoring data to application servers.

I could go on – and I will in future blog posts. This one is long enough already. There are numerous ways that cloud storage is worth more than its raw capacity. Some of this worth comes from its role in disaster recovery, but a lot of it comes from how it is used as part of an integrated storage stack that incorporates primary, backup, archive and cloud storage.