Western Digital adds StorReduce deduplication to object storage
US hardware giant Western Digital links up with StorReduce to provide data deduplication for object storage that combines WD ActiveScale scale-out appliances with the startup’s dedupe software
Western Digital has linked up with StorReduce in a move that will allow it to bundle the startup’s inline data deduplication software into its object storage appliances.
Key use cases targeted by the hardware maker will be backup and data protection. By offering StorReduce’s deduplication software with its storage arrays, Western Digital aims to offer an attractive alternative to the backup appliances sold by Dell EMC, HPE and Quantum.
According to Stefan Vervaet, director for strategic alliances and development in the storage systems division of Western Digital, a backup product that couples StorReduce and ActiveScale will give average savings of 45% compared with traditional backup appliances.
It is easy to see why StorReduce’s technology is attractive – the software can be deployed on a physical server or as a virtual machine (VM) in an enterprise datacentre or in a public cloud.
Each appliance acts as a data deduplication path between servers and the object storage target. The appliances are stateless, which permits deployment in clusters, up to a maximum of 31.
Communication between servers and appliances takes place over the S3 protocol, which has emerged as a de facto standard for accessing back-end storage, although Microsoft Azure and Google Cloud Platform are also supported.
Data ingested by the appliances is first processed using an inline deduplication algorithm, then compressed to obtain the highest possible rate of data reduction. According to StorReduce, this adds a maximum of 50ms of latency between servers and storage.
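The dedupe-then-compress order described above can be illustrated with a minimal Python sketch. It is not StorReduce's algorithm, which the company has not detailed here: the fixed 4KB chunk size, the SHA-256 fingerprint, zlib compression and the function names `ingest` and `rehydrate` are all assumptions chosen for illustration.

```python
import hashlib
import zlib

CHUNK_SIZE = 4096  # assumed fixed chunk size, for illustration only


def ingest(data, store):
    """Deduplicate first, then compress: each unique chunk is stored once.

    `store` maps a chunk fingerprint to its compressed bytes and stands in
    for the object storage target. Returns the ordered list of fingerprints
    (the "recipe") needed to rebuild the original data.
    """
    recipe = []
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        fingerprint = hashlib.sha256(chunk).hexdigest()
        if fingerprint not in store:
            # Only chunks not seen before are compressed and written.
            store[fingerprint] = zlib.compress(chunk)
        recipe.append(fingerprint)
    return recipe


def rehydrate(recipe, store):
    """Rebuild the original data from its recipe."""
    return b"".join(zlib.decompress(store[fp]) for fp in recipe)
```

In this sketch, ingesting three identical chunks followed by one different chunk leaves only two entries in `store`, while the recipe still lists four fingerprints, so the data can be rehydrated in full.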
To function, each appliance needs a fast flash layer to store the deduplication index and metadata. When several appliances are connected in a cluster, this information is distributed across the cluster to protect against node failure.
A complete log of transactions is also sent to object storage as they are written, which allows the index to be reconstructed if an outage affects all nodes. Because reconstructing the index from the log takes time, the index is also snapshotted periodically, so that a rebuild can start from the most recent snapshot rather than from the beginning of the log.
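The snapshot-plus-log recovery described above can be sketched in a few lines of Python. This is a hedged illustration, not StorReduce's implementation: the entry format `(operation, key, value)`, the idea that each snapshot records the log position it was taken at, and the names `apply_entry` and `rebuild_index` are all hypothetical.

```python
def apply_entry(index, entry):
    """Apply one logged transaction to the in-memory index."""
    operation, key, value = entry
    if operation == "put":
        index[key] = value
    elif operation == "delete":
        index.pop(key, None)


def rebuild_index(snapshots, log):
    """Rebuild the index from the newest snapshot plus the log tail.

    `snapshots` is a list of (log_position, index_copy) pairs and `log`
    is the full ordered transaction log (both assumed formats). With no
    snapshot available, the whole log is replayed from the start.
    """
    if snapshots:
        position, saved = max(snapshots, key=lambda snap: snap[0])
        index = dict(saved)
    else:
        position, index = 0, {}
    # Replay only the transactions recorded after the snapshot was taken.
    for entry in log[position:]:
        apply_entry(index, entry)
    return index
```

Replaying the full log and replaying from a snapshot give the same index; the snapshot only shortens the amount of log that has to be read back from object storage.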
Within a cluster, ingestion and data access performance increases with the number of nodes. A cluster of StorReduce appliances forms a single deduplicated global namespace and can scale to several hundred PB of data, with each appliance capable of a maximum of 80PB of deduplicated data.
According to StorReduce, each appliance can provide deduplication throughput of around 2GBps (approximately 7.2TB per hour) during ingestion and rehydration of data. That makes about 60GBps (216TB per hour) for a full cluster. To sustain these processing rates, you will need storage and networking capacity to match.
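The unit conversion behind those figures is easy to check. The short Python sketch below assumes decimal units (1TB = 1,000GB) and linear scaling across nodes; the function name `tb_per_hour` is illustrative only.

```python
SECONDS_PER_HOUR = 3600
GB_PER_TB = 1000  # decimal units assumed


def tb_per_hour(gb_per_second):
    """Convert a throughput in GBps to TB per hour."""
    return gb_per_second * SECONDS_PER_HOUR / GB_PER_TB


per_appliance = tb_per_hour(2)      # one appliance at 2GBps
quoted_cluster = tb_per_hour(60)    # the quoted cluster figure of 60GBps
full_cluster = tb_per_hour(2 * 31)  # 31 appliances, assuming linear scaling

print(per_appliance, quoted_cluster, full_cluster)
```

Under these assumptions, a 31-node cluster would come to 62GBps, so the quoted 60GBps reads as a rounded figure.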
By comparison, the best-performing hardware in Dell EMC’s Data Domain family can manage a maximum of 50PB of backup data, with claimed ingestion rates of 68TB per hour when deduplication work is carried out at the source with DD Boost, and 31TB per hour without DD Boost.
According to Western Digital, the ActiveScale/StorReduce product is certified by a number of backup software providers, including Veritas NetBackup, Backup Exec, Commvault Simpana, Veeam and EMC NetWorker.
Western Digital points to several of its customers that already use StorReduce with its ActiveScale hardware for nearline data or as back-end storage for data in Hadoop clusters.
Massive storage of deduplicated logs on ActiveScale storage, for example, costs much less than on traditional storage and considerably reduces the physical space occupied by storage, with no major impact on performance.
Elsewhere, Western Digital has contributed several enhancements to the Apache Hadoop community’s S3A client, which allows a Hadoop cluster to access data stored on an S3-based object storage system. The firm also works with Microsoft to give the same access to a Hadoop cluster in Windows.
More generally, said Vervaet, deduplicated object storage is potentially interesting for customers who have hitherto used tape or disk for big data workloads.
The use of StorReduce on an ActiveScale appliance adds a software cost of around 7 cents per gigabyte, which would decrease at greater volume. Minimum storage capacity on an ActiveScale system is 480TB.
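As a rough worked example of what that pricing implies, the Python sketch below multiplies the per-gigabyte figure by the minimum system capacity. It assumes decimal units (1TB = 1,000GB), no volume discount, and that the charge applies to the full 480TB; the article does not say which capacity is billed, so treat the result as an upper-bound illustration rather than a quoted price.

```python
CENTS_PER_GB = 7   # figure quoted in the article
GB_PER_TB = 1000   # decimal units assumed


def software_cost(capacity_tb, cents_per_gb=CENTS_PER_GB):
    """Return the software cost in whole currency units for a capacity in TB."""
    total_cents = capacity_tb * GB_PER_TB * cents_per_gb
    return total_cents / 100


minimum_system_cost = software_cost(480)
print(minimum_system_cost)
```

On these assumptions, the software would add 33,600 currency units to a minimum-capacity system before any volume reduction.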