Big players target object storage at cloud and archive
All but one of the big six storage vendors have object storage products that target public and private cloud environments and/or archiving use cases
Object storage is an increasingly important category of storage, with nearly all the big vendors jostling in the space along with numerous smaller players. With its ability to handle large amounts of unstructured data, object storage fits the bill for some of the key storage challenges facing organisations today.
So, what are the benefits of object storage and what products have the big players brought to market?
Traditional file systems are based on blocks and files, storing data in hierarchical, tree-like directory structures. As the number of files and users grows, so does the number of directories and with it the complexity of the tree structure.
As a result, it takes ever longer to locate a particular file. There comes a point where this hits performance significantly, or the file system reaches the limits on the number of files, directories and hierarchy levels it can manage.
Object storage systems were designed to resolve this problem. Implementations vary and there is no single recognised definition (see the analysis of the major vendors’ offerings below) but they do share several broad attributes.
Files have no meaning in an object-based storage system. Instead, all data is broken down into objects addressed by a unique identifier, and stored in a flat address space – there are no subdirectories. Objects are retrieved using the identifiers contained in an indexed database and assembled at a higher level into meaningful data, such as files.
Objects consist of metadata, which provides contextual information about the data, and the payload or actual data. In file-based storage systems, metadata is limited to file attributes, but metadata in object storage systems can be enriched with any number of custom attributes.
Because they are very large systems, object stores and their associated databases are often distributed across multiple geographies (often using techniques such as erasure coding), which makes traditional access protocols such as CIFS and NFS unsuitable.
Instead, access is usually via a REST API over HTTP. Commands sent over HTTP to object storage are simple: “put” to create an object, “get” to read an object, “delete” to purge an object, and “list” to list objects.
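As a rough sketch of what those four verbs look like in practice, the Python snippet below issues them against a hypothetical object store endpoint. The URL, container name and metadata header convention are illustrative only; real services such as Amazon S3 or OpenStack Swift also require authentication and request signing.

```python
import requests  # third-party HTTP library

BASE = "https://objectstore.example.com/v1/demo-container"  # hypothetical endpoint

# "put" - create an object; custom metadata travels as headers alongside the payload
with open("photo.jpg", "rb") as payload:
    requests.put(f"{BASE}/photo-0001",
                 data=payload,
                 headers={"X-Object-Meta-Camera": "drone-07"})  # Swift-style metadata header

# "get" - read the object back via its unique identifier
obj = requests.get(f"{BASE}/photo-0001")

# "list" - enumerate objects in the flat namespace (no subdirectories to traverse)
listing = requests.get(BASE)

# "delete" - purge the object
requests.delete(f"{BASE}/photo-0001")
```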
Why object storage?
The consolidation of data into ever larger storage systems, a trend accelerated by the growth of cloud computing, has highlighted the limitations of traditional file storage systems.
Object storage works well as a highly scalable data store for unstructured data that is updated infrequently, so it is well suited for cloud-based file content, especially images and videos. It is not so well suited to transactional data, such as database queries, due to its slower throughput.
Web-scale companies such as Facebook and Google hit the limits of block and file storage some time ago. They now use object storage to surmount performance and capacity barriers.
They are not alone. Cloud storage providers have been among the most eager adopters of the technology, using it to improve performance and scalability at the back end, and to ease access for users at the front end through multi-tenancy features.
Vendor approaches to object storage
EMC’s Atmos cloud flagship
EMC Atmos Cloud Storage is the company’s mainstream object storage product line. It is aimed at multi-tenant environments, and offers a global namespace, distributed architecture, REST API, metering and chargeback.
Connectivity is offered via HTTP, CAS, web services and file-based access. It can be deployed as one of two hardware-based editions – the Light Edition for single locations, the Complete Edition for distributed locations – and as a virtual appliance for VMware environments.
The Atmos’s active/active architecture offers automatic replication for easier scalability, as well as versioning, compression, data deduplication and disk drive spin-down.
The hardware versions are all 40U or 42U systems. The WS2-120 permits up to 360TB from 120 3TB 7,200rpm SAS disks, the WS2-240 doubles the number of disks and the total capacity, and the WS2-360 trebles the number of disks for a total capacity of 1,080TB. The two smaller configurations leave space for other computing systems to co-exist in the same rack, while the WS2-360 does not.
The 40U Atmos G3 series was launched in December 2012 and includes support for Amazon’s S3 API. It is split into four editions, each with increasing levels of density and capacity. The densest, the Dense-480, consists of up to eight nodes, each housed in a 4U enclosure and carrying 60 disks, plus eight servers interconnected via 10 Gigabit Ethernet. When populated with 4TB disks, total capacity is 1,920TB.
Scalability is achieved by adding new systems and EMC says there is no capacity limit. The company has not quoted performance figures.
EMC also offers object storage via its acquired Isilon product line (positioned as scale-out NAS), through Centera (described as a “content-addressable storage platform for data archiving”) and in its ViPR storage virtualisation/big data environment.
Dell ditches object storage
Dell’s DX object storage hardware platform was based on Caringo’s CAStor, but the company switched to selling the product as software-only in 2013. Now, however, Dell’s website points potential buyers to its Compellent SC8000 Storage Center Controller, a traditional SAN product, which suggests the company no longer sells object storage.
Hitachi Content Platform aims at cloud and archive
Hitachi Data Systems’ Hitachi Content Platform (HCP) is a distributed, object storage system aimed at public and private cloud providers to enable data sharing, synchronisation, analysis and retrieval.
The system uses redundant nodes – Hitachi CR220 servers – and provides multi-tenancy, with the ability to subdivide each tenancy into multiple namespaces, support for multiple protocols, and scalability up to 80PB. Each server includes five disks in a RAID5 configuration.
The HCP series starts with the entry-level HCP 300, which has a minimum of four and a maximum of 20 nodes connected over 1GbE and offers up to 140TB. The highest capacity of the five-strong series is provided by the HCP 500XL, which consists of up to 80 2U nodes with four 10GbE and two 4Gbps FC ports.
The system is accessible using a REST API, as well as traditional protocols such as NFS and CIFS. Uniquely, SMTP is also provided to allow email archiving.
Configurable data protection by redundancy provides for up to four copies of each piece of data, along with continuous data integrity checking. It includes data retention policies and WORM features for compliance, along with data shredding where appropriate.
Archiving features include disk spin-down and tiering, and versioning over HTTP REST only. Metadata search is integrated into the system.
Security features include layered access control for administrative, tenancy management and end user purposes. Tenants’ passwords are inaccessible to system admins.
HP’s archive appliances
Bringing together HP’s StoreAll 9320 and 9730 storage systems with its StoreAll 8800 storage node, the StoreAll 8800 series was launched in December 2012 and is aimed at organisations that need to store and access archives of unstructured as well as production data.
It uses an object store to manage a maximum 16PB of capacity with billions of objects, scaled by adding nodes and controllers, in what HP describes as “a single hyperscale, economic, ultra-dense appliance”.
Disks are arranged in multiple pairs of nodes, each node consisting of a 2U enclosure accessible over a range of protocols including HTTP, WebDAV, REST API, OpenStack object storage API, NFS, CIFS and FTP. Each 2U enclosure holds 36 or 70 7,200rpm SAS disks of up to 4TB capacity, depending on model. Network connectivity is over 10GbE, with 1GbE ports provided for management purposes.
Features include snapshotting, replication, data deduplication, and policy-based data tiering and retention, plus continuous data integrity checking. Using technology from the firm’s acquisition of Autonomy, the system also includes automatic indexing and fast retrieval of data for analytic purposes, and aims to provide real-time access to and querying of data.
IBM’s Elastic Storage and SoftLayer
IBM offers two routes to object storage. It does not yet produce an on-premises object storage appliance of its own, but has pledged to do so.
The first is its Elastic Storage software, launched in May 2014. The technology is based on the company's General Parallel File System (GPFS), the same technology that underpins the Watson supercomputer, and is designed to handle workloads generated from cloud, analytics, mobile and social media. It can be deployed on-premises or via IBM's SoftLayer cloud.
In similar fashion to EMC’s ViPR, it works as a control plane that offers block, file and object storage access with automated tiering, guided by analytics, using patterns, storage characteristics and the network to determine where to move data. It also includes automated backup and snapshots, using one copy of the data for snapshots and their replication, so reducing storage consumption and costs.
Elastic Storage sits above OpenStack Swift, so users can access and manage data across private and public clouds. Open-source OpenStack Swift is accessed through a REST API and can scale horizontally to petabytes of data through the addition of nodes, which typically equate to servers.
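As an illustrative sketch of Swift-style access, the snippet below creates a container and uploads an object over the Swift REST API. The endpoint, account, container and token shown here are placeholders; in a real deployment the token would be obtained from an OpenStack identity service such as Keystone.

```python
import requests

SWIFT = "https://swift.example.com/v1/AUTH_demo"  # hypothetical Swift endpoint and account
AUTH = {"X-Auth-Token": "placeholder-token"}      # placeholder credential

# Containers sit directly under the account - there is no deeper hierarchy
requests.put(f"{SWIFT}/vm-images", headers=AUTH)

# Upload an object into the container
with open("vm-image-01.qcow2", "rb") as payload:
    requests.put(f"{SWIFT}/vm-images/vm-image-01.qcow2", data=payload, headers=AUTH)

# List the container's contents
print(requests.get(f"{SWIFT}/vm-images", headers=AUTH).text)
```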
IBM's second route to objects is SoftLayer’s object storage-as-a-service offering. SoftLayer was acquired by IBM in 2013 for its cloud-scale technology. Charged on a GB per month basis and scalable as required, the service is aimed at those who want to store static data such as VM images, media and email archives.
SoftLayer claims every facet of its platform can be automated and controlled by a single management system with its own API, accessible in several ways: developers use a REST API, while customers use a web portal or mobile application.
The API provides more than 2,200 documented methods across 180 discrete services, and supports, among others, SOAP and XML-RPC interfaces, which SoftLayer says provide full customer access to all the services available to the service provider. The search service, for example, allows requests to search an entire account, a particular container or a specified path, based on the URL entered at the time of search.
Higher-level management is provided through a customer web portal that enables server and storage control, performance metrics and account management. A mobile application supports ticket creation, basic server management and bandwidth monitoring.
For security, SoftLayer offers a multi-layered approach, with IPS and IDS for the network and servers, and detailed scanning and logging capabilities. It also offers security services including Citrix NetScaler’s ICSA-certified Layer 7 attack signature detection, McAfee anti-virus, endpoint protection, data encryption and malware detection, and Nessus Vulnerability Scanner.
NetApp’s archive-targeted appliances
In 2010, NetApp acquired Bycast, a developer of object-based storage software whose technology is now the basis of NetApp StorageGrid. NetApp combines the StorageGrid software as a VMware-based virtual appliance with NetApp E-Series storage systems to offer an object storage appliance.
The system provides a distributed global namespace with automated, policy-driven data lifecycle management, and is aimed at organisations that need to store and manage archived data, especially large datasets.
Hardware support is offered for NetApp's E2600 Series controllers, DE1600 and DE6600 shelves with 2TB or 3TB NL-SAS drives, and 500-plus tape drive, robotic library and autoloader products.
Features include versioning, rules-based AES-256 or SHA-256 encryption, compression, and provision for multi-tenant gateway deployments. It provides a notification feed for third-party application billing and QoS monitoring, along with an audit feed for chargeback, search integration, security diagnosis, compliance events and customised reporting.
Access is provided via traditional protocols such as NFS and CIFS, as well as through native object command sets over an HTTP API.