With EMC’s acquisition of XtremIO today, the landscape for storage products appears destined to change again to include a new segment for all-flash arrays. One of the technologies that will go mainstream along with flash arrays is primary dedupe. When you have all the read performance that flash provides, there isn’t any reason not to do it. A number of smaller vendors, including Pure Storage and StorSimple (the company I work for), have already been pairing dedupe with flash SSDs.
Chris Evans, in his blog The Storage Architect, wrote a couple of weeks ago about the potential future for primary dedupe, pointing out that NetApp’s A-SIS product has produced good results for customers since it was introduced in 2007. He then goes on to discuss the symbiotic relationship between flash SSDs and dedupe before posing the question of when dedupe will become mainstream for primary storage, saying:
That brings us to the SSD-based array vendors. These companies have a vested interest in implementing de-duplication as it is one of the features they need to help make the TCO for all SSD arrays to work. Out of necessity dedupe is a required feature, forcing it to be part of the array design.
Solid state is also a perfect technology for deduplicated storage. Whether using inline or post-processing, de-duplication causes subsequent read requests to be more random in nature as the pattern of deduplicated data is unpredictable. With fixed latency, SSDs are great at delivering this type of read request that may be trickier for other array types.
Will de-duplication become a standard mainstream feature? Probably not in current array platforms but definitely for the new ones where legacy history isn’t an issue. There will come a time when those legacy platforms should be put out to pasture and by then de-duplication will be a standard feature.
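To make Evans’s point about random reads a little more concrete, here is a minimal sketch of fixed-size block dedupe in Python. It is an illustration only, not how any particular array implements it: the 4-byte block size, the SHA-256 fingerprint, and the names `dedupe` and `read_back` are all assumptions chosen for readability.

```python
import hashlib

BLOCK_SIZE = 4  # assumed tiny block size so the output is readable; real arrays use KB-sized blocks


def dedupe(data: bytes, block_size: int = BLOCK_SIZE):
    """Split data into fixed-size blocks and store each unique block once."""
    store = []      # unique physical blocks, in the order they were first written
    index = {}      # fingerprint -> physical block number
    block_map = []  # logical block number -> physical block number
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        fingerprint = hashlib.sha256(block).digest()
        if fingerprint not in index:
            # First time this content is seen: write a new physical block.
            index[fingerprint] = len(store)
            store.append(block)
        # Duplicates only add a pointer to the existing physical block.
        block_map.append(index[fingerprint])
    return store, block_map


def read_back(store, block_map):
    """Rebuild the logical data by following the block map."""
    return b"".join(store[i] for i in block_map)


data = b"AAAABBBBAAAACCCCBBBBAAAA"
store, block_map = dedupe(data)
print(block_map)   # [0, 1, 0, 2, 1, 0]
print(len(store))  # 3
```

Six logical blocks are stored as three physical ones, but a sequential read of the logical data now visits physical blocks 0, 1, 0, 2, 1, 0. That jumping around is the random read pattern Evans describes, and it costs far less on flash than on spinning disk.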
In a post I wrote last week about using deduplication technology for data that is stored in the cloud, I described the benefits of dedupe for reducing cloud storage transaction and storage costs. As the industry continues to converge, it’s inevitable that the systems that access cloud storage will also dedupe data. There isn’t any reason not to do it. The technology is available today and it’s working. Check it out.