On the Path to Sustainable Enterprise Storage

With the drought here in California, we replaced our grass lawn with drought-tolerant plants. It requires a lot less water, but the best thing about it is that we have a new patio to hang out on and watch the world go past.

A few days ago I was smugly sipping a Quivera Sauvignon Blanc on our new patio, looking at our drought-tolerant garden and thinking about posting on “green enterprise storage” – a concept that always seemed like a stretch, considering the massive energy drain and short life cycles of these machines (about 3.5 years). Even the comparatively small enterprise storage systems I used to work with at EqualLogic and StorSimple used a lot of power and generated a lot of heat. The much larger 3PAR systems I worked with drew massive amounts of power and were anything but green, even though they operated at higher utilization rates than most of their big gladiator-class competitors.

Enterprise storage is designed for hard use and it tends to get exhausted in short order. Organizations depend heavily on their enterprise storage systems and have little motivation to take chances by extending their longevity. No amount of greenwashing is going to change that dynamic.

While green enterprise storage is not exactly credible, the concept of sustainability in enterprise storage is. Sustainability is not an assessment of a product’s environmental impact as much as it is a long-term approach to reducing resource consumption and undesirable waste. In other words, the enterprise arrays used by a business might not necessarily be “green”, but the IT organization can have a goal of becoming more sustainable over time. When you see something that is so obviously wasteful – as enterprise storage is – it isn’t difficult to believe that progress can be made.

Flash storage is the game changer for sustainability

The undisputable champion of sustainable enterprise storage is flash. Flash has changed the trajectory of the entire industry and everybody involved is developing products and strategies to exploit it. While most of the focus has been on performance, there are clear cost and sustainability wins too. For starters, flash SSDs consume less energy and run cooler than disk drives, things that are key to lowering TCO and improving sustainability. Flash SSDs also wear out more slowly than disk drives because they do not have moving, mechanical parts. Even in hybrid arrays combining flash SSDs and disk drives, the disk drives are accessed far less frequently, reducing the heat they generate. Arrays that require far less energy for cooling over their lifetime improve sustainability.

Arrays that last longer and necessitate less frequent swap-outs do too. It is not evident across all vendors yet, but the warranties of flash-based arrays appear to be longer than non-flash arrays and the expected life cycles of flash-based arrays will likely prove to be longer as well, adding years to technology refresh cycles. When this trend is realized throughout the industry it will be another flash-based boost for sustainability.

It’s not clear to me what the manufacturing, distribution and end-of-life waste elements are for flash SSDs and disk drives. Perhaps that is something that will come to light in the future to help guide further discussions and comparisons of disk and flash storage.

The synergy of virtualization, consolidation & data reduction

Considering that data is stored under the direction of the operating system and hypervisor, there is a clear synergy between servers and storage, including the potential to improve sustainability. It follows that improvements generated by servers can be compounded by improvements generated by storage. For example, combining server virtualization and consolidation with data reduction in storage creates a very efficient stack, as illustrated below:

[Figure: virtualization, consolidation and data reduction]

Many VMs are consolidated onto a much smaller number of physical machines, which use a single enterprise storage array to store data (represented by the steamroller). The array employs data reduction to eliminate statistical redundancies and duplicate copies of data that exist across all the virtual machines.

In the same way that virtualization and consolidation work hand-in-hand to improve enterprise storage sustainability, data reduction and flash-based storage are similarly aligned. Prior to the availability of flash-based storage, virtualization experts warned that applications could become starved for IOPS if too many VMs were accessing data on a single storage array. In other words, the scarcity of IOPS was limiting VM consolidation ratios – and further improvements in sustainability. Fortunately, flash-based arrays provide an abundance of IOPS, significantly expanding throughput and multiplying VM consolidation ratios several times over. It is still possible to oversubscribe a storage array with virtual machines, but the point is that flash-based arrays can support many more VMs than non-flash arrays. Increased VM density equates to fewer arrays purchased, fewer resources consumed and less waste at end-of-life. When you factor in reduced energy consumption and longer array life cycles, sustainability is increased even more.
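To make the consolidation argument concrete, here is a rough back-of-the-envelope sketch. Every number in it is a made-up assumption for illustration (the per-array IOPS figures, the per-VM demand and the 30% headroom are not measurements of any real product), and the function names are hypothetical. It simply treats IOPS as the only bottleneck.

```python
def max_vms_per_array(array_iops: int, iops_per_vm: int, headroom_pct: int = 30) -> int:
    """VMs one array can host if IOPS is the only limit.

    headroom_pct is the share of IOPS held back for bursts (an assumption).
    """
    usable_iops = array_iops * (100 - headroom_pct) // 100
    return usable_iops // iops_per_vm


def arrays_needed(vm_count: int, array_iops: int, iops_per_vm: int,
                  headroom_pct: int = 30) -> int:
    """Arrays required to host vm_count VMs, rounded up."""
    per_array = max_vms_per_array(array_iops, iops_per_vm, headroom_pct)
    return (vm_count + per_array - 1) // per_array


# Hypothetical figures, chosen only to show the shape of the comparison.
DISK_ARRAY_IOPS = 20_000
FLASH_ARRAY_IOPS = 200_000
IOPS_PER_VM = 100

if __name__ == "__main__":
    for label, iops in (("disk", DISK_ARRAY_IOPS), ("flash", FLASH_ARRAY_IOPS)):
        print(label,
              "VMs per array:", max_vms_per_array(iops, IOPS_PER_VM),
              "arrays for 1000 VMs:", arrays_needed(1000, iops, IOPS_PER_VM))
```

Real sizing also has to account for capacity, latency and controller limits, so treat this as a way to think about the ratio and not as a sizing tool.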

The sustainability benefits of deduplication improve even further when similar VMs are aggregated together on a single array. For example, using a single array for a large number of Windows Server VMs creates an environment with a great deal of data commonality. Each additional VM shares more of its data with what is already stored, so more of it is deduplicated and it consumes fewer array resources than the VMs that preceded it.
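The effect can be sketched with a toy block-level dedupe store. This is a simplified illustration under stated assumptions, not how any particular array implements deduplication: blocks are a fixed 4 KB, each synthetic "VM image" is 90 blocks of shared OS data plus 10 private blocks, and the names `DedupeStore` and `make_vm_image` are hypothetical.

```python
import hashlib

BLOCK_SIZE = 4096  # fixed-size blocks, an assumption for this sketch


def make_vm_image(vm_id: int, shared_blocks: int = 90, private_blocks: int = 10) -> bytes:
    """Build a synthetic image: OS blocks common to all VMs plus blocks unique to this VM."""
    shared = b"".join(i.to_bytes(4, "big") * (BLOCK_SIZE // 4) for i in range(shared_blocks))
    private = b"".join(
        f"vm{vm_id}-{j}".encode().ljust(BLOCK_SIZE, b"\0") for j in range(private_blocks)
    )
    return shared + private


class DedupeStore:
    """Keeps one copy of each distinct block, identified by its SHA-256 digest."""

    def __init__(self) -> None:
        self.seen: set[bytes] = set()

    def add(self, image: bytes) -> int:
        """Store an image and return how many new blocks it actually consumed."""
        new_blocks = 0
        for offset in range(0, len(image), BLOCK_SIZE):
            digest = hashlib.sha256(image[offset:offset + BLOCK_SIZE]).digest()
            if digest not in self.seen:
                self.seen.add(digest)
                new_blocks += 1
        return new_blocks


if __name__ == "__main__":
    store = DedupeStore()
    for vm_id in range(5):
        print(f"VM {vm_id}: {store.add(make_vm_image(vm_id))} new blocks stored")
```

In this synthetic setup the first VM pays for all of its blocks and every later VM pays only for its private ones. Real VM images overlap less cleanly, but the direction is the same.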

Small improvements matter

Enterprise storage arrays need to provide a high level of performance and availability, which means they will never become sustainable in the way that organic farming is. That said, there are well-defined and achievable ways to improve the sustainability of enterprise storage using flash-based and data reduction technologies. There is a wide range of products and prices, from server software solutions to hybrid arrays and all-flash arrays. The good news is that improving enterprise storage sustainability is easily done without restructuring budgets: replace end-of-life non-flash arrays with more sustainable flash-based arrays that cost approximately the same as, or even less than, the arrays they are replacing – and have lower operating costs. As they say, YMMV and there are many options and ways to proceed.

Disclaimer: The company I work for, Tegile Systems, designs, manufactures and sells both hybrid and all-flash enterprise storage arrays.

The cloud wants your junk data

What do you think about when you think about cloud? A lot of people think of shiny, new technology made of all new APIs and hypervisors and mobile devices and cutting edge code and things that only the next generation will understand. And for a lot of cloud customers, that’s reality. New, new, new.

What you probably didn’t know, however, is that the storage side of the cloud service provider business isn’t hung up on new. In fact, they are ecstatic about old. Old junk data that you would rather forget about, get out of your life and out of your data center. Data that you know you shouldn’t just delete because an attorney somewhere will ask for it. But data that’s taking up expensive tier 1 storage and is the digital equivalent of engine sludge.

Cloud storage services want it – even if you end up deleting it later. It doesn’t matter to them. You might be thinking they just want to mine your data. Nope. They are perfectly fine storing encrypted data that they will never be able to read. To them, it’s all the same flavor of money at whatever the going rate is. They don’t care if the data was a lot bigger before it was deduped or compressed or whatever you have done to it to reduce the cost. Why should they care if you send them 100 GB of data that was originally 1 TB? They don’t.

It’s good business for them – they’ll even replicate it numerous times to prevent data loss. You might be thinking “but it’s garbage data, I’d never replicate it”. True, but if it’s garbage data, then why do you have so many backup copies of it on tape and possibly in other locations? Why are you managing garbage over and over again?

It’s a double win. They want it and you don’t. All you need is the equivalent of a pump to move it from your expensive tier 1 storage to their data storage services. There are a number of ways this can be done, including using products from StorSimple, the company I work for. A StorSimple system ranks data based on usage, compacts it, tags it (in metadata), encrypts it and migrates it to a storage tier in the cloud where it can be downloaded or deleted later if that’s what you decide to do with it. How much money do you think your company is wasting taking care of junk?
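For readers who like to see the "pump" idea in code, here is a minimal sketch of the rank, compact and tag steps. It is not StorSimple's implementation: the `Item` class, the 90-day idle threshold and the metadata fields are all assumptions made up for this example, ranking is reduced to days since last access, and encryption and the actual upload are left out because they depend on the cloud service and key management in use.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Item:
    """A hypothetical piece of data sitting on tier 1 storage."""
    name: str
    data: bytes
    days_since_access: int


def select_cold(items: list[Item], min_idle_days: int = 90) -> list[Item]:
    """Rank items coldest first and keep those idle for at least min_idle_days."""
    ranked = sorted(items, key=lambda item: item.days_since_access, reverse=True)
    return [item for item in ranked if item.days_since_access >= min_idle_days]


def prepare_for_cloud(item: Item) -> tuple[dict, bytes]:
    """Compact the data and build the metadata tag that travels with it.

    Encryption would happen after this step and before upload; it is omitted here.
    """
    payload = zlib.compress(item.data)
    metadata = {
        "name": item.name,
        "original_bytes": len(item.data),
        "stored_bytes": len(payload),
        "idle_days": item.days_since_access,
    }
    return metadata, payload


if __name__ == "__main__":
    inventory = [
        Item("q3-report.docx", b"current work " * 200, 2),
        Item("2009-audit-logs.txt", b"log line\n" * 1000, 900),
        Item("old-project.zip", b"archived " * 500, 400),
    ]
    for item in select_cold(inventory):
        tag, blob = prepare_for_cloud(item)
        print(tag)
```

The point of the sketch is the ordering: decide what is cold first, then shrink and label it, so that only junk leaves tier 1 and it stays identifiable if someone asks for it later.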

Some gigabytes are worth more than others

Getting clarity on the cost and relative worth of enterprise technology has always been a challenge because of the complex environments and diverse requirements involved. For every good question about which product is better, there is the almost universal answer – “it depends”. One product might have more capacity than its competitors, while another might have a unique feature that supports a new application and another product might have a new operating or management approach that increases productivity. Beauty is in the eye of the beholder and enterprise customers dig a lot deeper than what appears in competitors’ spec sheets. In some respects, it’s like comparing real estate properties where location and design trump square footage.

One of the traps people fall into when comparing the value of cloud services to legacy infrastructure technologies is limiting their analysis to a direct cost per capacity comparison. This article in InformationWeek did that in a painstaking way. The author, Art Wittman, made a commendable effort to make a level cost comparison, but he left out the location and design elements. He concludes that IaaS services are not worthwhile because the costs per capacity are not following the same cost curve as legacy components and systems. There is certainly some validity to his approach – if the capacity cost of disk drives has dropped an order of magnitude in four years, why should the cost of Amazon’s S3 service be approximately 39% higher?

Conceding that productivity gains can be realized from cloud services, he limits their value to application services and summarily rejects that they could apply to IaaS. After all the work he had done to make a storage capacity cost comparison, he refused to factor in the benefits of using a service.  Given that omission, Mr. Wittman concludes there is no way for an IaaS business model to succeed.

I agree with Mr. Wittman in one respect: if a service can’t be differentiated from on-site hardware, then it will fail. But that is not the case with enterprise cloud storage and it is especially not true with cloud storage that is integrated with local enterprise storage. Here’s why:

Storage is an infrastructure element, but it has specialized applications, such as backup and archiving that require significant expense to manage media (tapes). Moving tapes on and off-site for disaster recovery purposes is time-consuming and error-prone. While the errors are usually not damaging, they can result in lost data or make it impossible to recover versions of files that the business might need. The cost of lost data is one of those things that is very difficult to measure, but it can be very expensive if it involves data needed for legal or compliance purposes.  Using cloud storage as virtual tape media for backup kills two birds with one stone by eliminating physical tapes and the need for off-site tape rotations. It still takes time to complete the backup job and move data to the cloud, but many hours a month in media management can be recaptured as well as tape-related costs.

There are even greater advantages available with backup if it can be integrated from primary storage all the way to the cloud, as it is with StorSimple’s cloud-integrated enterprise storage (CIES). Using snapshot techniques on CIES storage, the amount of backup data generated is kept to a minimum, which means the amount of storage consumed from the storage cloud service provider is far less than if a customer used the cloud for virtual tape backup storage. Cloud-resident data snapshots have a huge capacity advantage over backup storage where the storage of files for legal and compliance purposes is concerned. This demonstrates how the design of a cloud appliance can deliver even more value from cloud storage.

The next increase in cloud storage value comes from integrating deduplication, or dedupe, technology with cloud storage. Dedupe minimizes the amount of storage capacity consumed by data by eliminating redundant information within the data itself. Sometimes, the amount of data that dedupe eliminates can be quite large – as occurs with virtualized systems. StorSimple’s CIES systems automatically apply dedupe to the data stored in the cloud and squish capacity consumption to its minimum level – which also minimizes the amount of data that is transferred to and from the cloud. With the help of a cloud-integrated enterprise storage system, each gigabyte of cloud storage is worth a lot more because so much less of it is consumed.

But the worth of cloud storage is not all about consuming capacity; it’s also about accessing data faster than you can from legacy data archives. Data stored in the cloud with a CIES system is online and can be accessed by workers and administrators without the need to find it in a separate archive pool of storage. If you don’t work in IT, you might not know how much time that can save the IT staff, but if you do work in IT, you know this is a huge advantage that returns a lot of administrator time for other projects.

The access to data in cloud storage is probably most valuable when it occurs following a disaster.  Cloud storage provides the ultimate flexibility in recovery by being location-independent.  Backup or snapshot data stored in the cloud can be accessed from almost any location with an Internet connection to the cloud storage service provider.  Again, cloud-integrated storage has some important advantages that further increase the value of cloud storage by requiring only a small subset of the data to be downloaded before application systems can resume production work. This is much faster than downloading multiple virtual tapes and then restoring data to application servers.

I could go on – and I will in future blog posts. This one is long enough already. There are numerous ways that cloud storage is worth more than its raw capacity. Some of this worth comes from its role in disaster recovery but a lot of it comes from how it is used as part of an integrated storage stack that incorporates primary, backup, archive and cloud storage.