Storage technology explained: File, block and object storage
We ask the key questions about file, block and object storage: how they work on-premise and in the cloud, global file systems, and file and object locking
File, block and object are fundamental to how users and applications access and modify data storage. That’s been the case for decades, and the transition to the cloud has seen that remain so – but with adaptations to the use case, performance and cost constraints of cloud storage.
In this article, we look at the fundamentals of the file system and file, block and object storage, how file, block and object have transitioned to life in the cloud, and the emerging availability of global file systems. We also drill down into how file and object locking are implemented, and the differences between network file system (NFS), server message block (SMB) and common internet file system (CIFS) in file storage.
What is a file system?
The file system is a fundamental component of computing that allows data to be organised – usually in hierarchical directories – and retrieved. It is a logical system that helps the operating system (OS) and user differentiate and organise information, and it also forms part of the physical addressing of data on storage media.
File systems specify conventions for file naming, such as filename length, which characters to use, case sensitivity, file type extension etc. A file system also keeps metadata about files, such as file size, creation date or location in the directory.
Most file systems organise files into a hierarchy, with file location described by a path within the directory structure. Directories are organised in an inverted hierarchical tree structure.
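As a rough illustration of paths and metadata, the sketch below builds a small directory tree in a temporary location and asks the file system what it knows about one file. The directory and file names are invented for the example, and the `describe` helper is hypothetical, not part of any product.

```python
import tempfile
from pathlib import Path


def describe(path: Path) -> dict:
    """Return a few of the metadata fields the file system keeps for a file."""
    info = path.stat()
    return {
        "name": path.name,
        "extension": path.suffix,
        "size_bytes": info.st_size,
        "modified": info.st_mtime,  # last-modified time, seconds since epoch
        "parent_directory": str(path.parent),
    }


with tempfile.TemporaryDirectory() as root:
    # Build a small hierarchy: <root>/projects/reports/q1.txt
    report = Path(root) / "projects" / "reports" / "q1.txt"
    report.parent.mkdir(parents=True)
    report.write_text("quarterly figures")

    meta = describe(report)
    # The path relative to the root shows the file's place in the tree.
    relative = report.relative_to(root)
```

Here the file's location is entirely described by its path through the directory tree, while size and timestamps come from metadata the file system maintains alongside the content.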
Physical media can be divided into partitions, each of which can be formatted with a different file system. Partitions can also be created to isolate files of different types from each other for performance or security reasons, such as OS files, user files and system files. Partitions are divided into blocks devoted to, for example, file content, metadata and system data.
The file system also controls access by users and applications. That covers who has access to which files and directories, as well as controls that prevent simultaneous writes that might result in corruption or logical issues. Files can also be encrypted against external access.
A database management system (DBMS) is a little like a file system. But, whereas a file system provides interaction with the whole file and stores files as unstructured discrete items, a DBMS allows users to interact and change elements in a database near simultaneously. The DBMS manages the database as a consistent, single, highly controlled repository of data with robust security and access controls.
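The contrast can be sketched in a few lines. In this hedged example, the same stock figures are held first as a whole file (JSON, chosen only for illustration) and then in SQLite, which stands in for a DBMS. The item names and quantities are made up.

```python
import json
import sqlite3
import tempfile
from pathlib import Path

# File approach: read the whole file, change one value, write it all back.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "stock.json"
    path.write_text(json.dumps({"widgets": 10, "gadgets": 4}))

    stock = json.loads(path.read_text())
    stock["widgets"] -= 1
    path.write_text(json.dumps(stock))
    file_result = json.loads(path.read_text())

# DBMS approach: change a single element inside a transaction.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE stock (item TEXT PRIMARY KEY, qty INTEGER)")
db.executemany(
    "INSERT INTO stock VALUES (?, ?)", [("widgets", 10), ("gadgets", 4)]
)
with db:  # commits on success, rolls back on error
    db.execute("UPDATE stock SET qty = qty - 1 WHERE item = ?", ("widgets",))
db_result = dict(db.execute("SELECT item, qty FROM stock"))
db.close()
```

Both routes end with the same numbers, but only the database lets many callers change individual rows while it keeps the whole dataset consistent.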
Block and file access storage offer two ways to interact with the file system.
What is file storage?
File storage, or file access storage, is storage in which entire files are accessed via the file system, usually via network-attached storage (NAS). Such products come with their own file system on board, from which storage is presented to applications and users as shared drives, for example as a mapped drive letter in Windows.
That contrasts with block storage, as we’ll see below, and is a fundamental distinction in storage infrastructure.
File systems have numerous benefits. Among these is that most enterprise applications are written to interact with data via a file system, although that is being eroded by object storage (see below).
File storage accesses entire files, so is unstructured and suited to general file storage, as well as specialised workloads that require file access, such as in media and entertainment. In the form of scale-out NAS, it is a mainstay of large-scale repositories for analytics and high-performance computing (HPC) workloads.
What is block storage?
Block storage, usually delivered by storage-area network (SAN) hardware, does not normally address entire files. Instead, it gives applications access to the blocks that make up files, in particular databases.
This suits workloads where many users work on the same file simultaneously, possibly from the same application, such as email or enterprise applications like enterprise resource planning (ERP), but with locking at the sub-file level.
So, in the case of block storage, the file system through which applications talk resides higher in the stack, on host servers.
Block storage has the great benefit of high performance, and not having to deal with metadata and file system information.
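To make block-level access concrete, here is a minimal sketch that treats a temporary file as a stand-in for a block volume and updates one block by its number, without touching the rest. The 4,096-byte block size, the eight-block volume and the helper names are assumptions for illustration, not a description of any SAN product.

```python
import tempfile

BLOCK_SIZE = 4096  # assumed block size for this illustration


def write_block(volume, block_no: int, data: bytes) -> None:
    """Overwrite one block in place, padding the payload with zero bytes."""
    if len(data) > BLOCK_SIZE:
        raise ValueError("payload larger than one block")
    volume.seek(block_no * BLOCK_SIZE)
    volume.write(data.ljust(BLOCK_SIZE, b"\x00"))


def read_block(volume, block_no: int) -> bytes:
    """Read exactly one block by its number."""
    volume.seek(block_no * BLOCK_SIZE)
    return volume.read(BLOCK_SIZE)


with tempfile.TemporaryFile() as volume:
    # Create an empty eight-block "volume" filled with zeros.
    volume.write(bytes(BLOCK_SIZE * 8))

    # Update only block 5, as a database might when one record changes.
    write_block(volume, 5, b"row 42: updated")

    untouched = read_block(volume, 4)
    changed = read_block(volume, 5)
```

The storage side only sees "write block 5"; knowing that block 5 belongs to a particular database file is the job of the file system on the host.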
What is object storage?
Object storage is the new kid on the block, relatively speaking.
Unlike file and block storage, it lacks a file system and is based on a “flat” structure with access to objects via their unique IDs. In this way, it’s similar to the domain name system (DNS) used to access web content.
So, object storage is not hierarchical, and lacks the directory system structure. That can be an advantage when datasets grow very large. Some NAS systems can become unwieldy when they get to billions of files.
Object storage also offers a richer set of metadata than traditional file systems, which makes it well-suited to data storage for analytics and artificial intelligence (AI).
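A toy in-memory model helps show the idea of a flat namespace with unique IDs and free-form metadata. The `FlatObjectStore` class and the metadata keys below are hypothetical and are not any supplier's API; real object stores expose similar operations over HTTP.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    data: bytes
    metadata: dict = field(default_factory=dict)


class FlatObjectStore:
    """Hypothetical store: no directories, just IDs mapped to objects."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes, **metadata) -> str:
        object_id = str(uuid.uuid4())  # unique ID instead of a path
        self._objects[object_id] = StoredObject(data, dict(metadata))
        return object_id

    def get(self, object_id: str) -> StoredObject:
        return self._objects[object_id]

    def find(self, **criteria) -> list:
        """Return IDs of objects whose metadata matches every criterion."""
        return [
            oid
            for oid, obj in self._objects.items()
            if all(obj.metadata.get(k) == v for k, v in criteria.items())
        ]


store = FlatObjectStore()
scan_id = store.put(b"image bytes", modality="MRI", patient_group="A")
log_id = store.put(b"log bytes", source="web", retention="30d")
mri_ids = store.find(modality="MRI")
```

Because each object carries its own descriptive metadata, a query such as `find(modality="MRI")` can select data for analytics without walking a directory tree.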
Object storage accesses data in a way that looks more like file access than block access, but it lacks the same kind of file locking. Often, for example, more than one user can access an object at the same time (think Google Docs). So, object storage is often described as “eventually consistent”.
Most legacy applications are not written for object storage, but it is the storage access method of choice for the cloud era. That’s largely down to the fact that cloud object storage comprises the bulk of capacity offered by the hyperscaler cloud providers.
What is file, block and object storage in the cloud?
The cloud is the natural home of object storage, and it’s here that now de facto standards such as S3 emerged. Object storage is the bulk storage of the cloud era, and provides easy access to data that can happily exist as eventually consistent.
The big three hyperscaler cloud providers – Amazon Web Services (AWS), Microsoft Azure and Google Cloud Platform – also offer their own file and block storage services, as well as those from third-party storage suppliers.
Big-three cloud storage options include object storage such as S3 from AWS, Azure Blob and Google Cloud Storage.
File storage from the hyperscalers includes: Amazon’s Elastic File System (EFS), which is an NFS-based file system that operates on cloud and local storage; Azure Files, which uses SMB and allows concurrent file share mounting in the cloud or on-premise; and Google Cloud Filestore, which provides NAS for Google Compute Engine and Google Kubernetes Engine with storage offered at standard and premium levels.
Block storage from the big three comes as Amazon Elastic Block Store, which works with Amazon Elastic Compute Cloud; Azure Disk, which provides managed disks for Azure virtual machines; and Google Persistent Disk block storage, which runs up to 64TB, and offers standard persistent disks, persistent SSDs and local SSD.
All three hyperscalers also offer higher-performing file storage based on NetApp storage. Pure Storage Cloud Block Store is available on AWS.
What are global file systems?
A number of suppliers offer so-called global file systems, in which a file system is distributed across public cloud and local network hardware, with all data in a single namespace. Providers include CTERA, Nasuni, Panzura, Hammerspace and Peer Software.
CTERA provides a combination of Edge, a caching filer, CTERA Drive, an agent for endpoint devices, and VDI for virtual workspaces.
Hammerspace provides customers with a single view of their metadata via its Hyperscale NAS that it says allows data to be stored and accessed efficiently.
Nasuni offers its File Data platform, built on its UniFS file system, with Edge on-premise instances for local cached access, plus management and orchestration consoles, Nasuni IQ for performance analysis, and application programming interfaces (APIs).
Panzura positions itself as a data management player and single platform for unstructured data. Its global file system is CloudFS, which creates a single, optimised dataset.
Peer provides a global file service with hybrid and multi-cloud support, and support for edge and datacentre sites. PeerGFS is software-only, with active-active sync, a global namespace using Microsoft DFSN and object storage integration.
What’s the difference between file locking and object locking?
A fundamental function of file systems is their locking mechanisms. These make sure different users and applications that work on the same file simultaneously cannot cause conflicts that result in inaccuracies and inconsistencies in the data.
Locking is strong and well-developed in file systems. However, object storage is not built around a file system, so it lacks the same kind of methods that enable locking.
File (NAS) and block (SAN) storage both rest on the file system. NAS storage accesses files directly, while block storage accesses blocks in the file system to update parts of a database, for example, which itself comprises a “file”.
Windows systems can set file locking by application and user for whole files to restrict access, shares, reads, writes and deletes, or byte-range locks for regions of files.
Locking on Unix-like systems, including Linux, varies between distributions and file systems, and Linux will let you modify a file that another process has open, for example. The differences come down to how Windows and Unix-like systems record file information, but all of them can restrict access and changes to files.
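The following sketch shows whole-file locking on a Unix-like system, using Python's `fcntl.flock`. It is Unix-only, assumes a local file system, and uses advisory locks, which means they only restrain programs that also ask for the lock. The `try_exclusive_lock` helper is a name invented for this example.

```python
import fcntl
import os
import tempfile


def try_exclusive_lock(handle) -> bool:
    """Try to take an exclusive lock without waiting; report success."""
    try:
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except BlockingIOError:
        return False


fd, path = tempfile.mkstemp()
os.close(fd)

# Two independent opens of the same file stand in for two applications.
first = open(path, "r+b")
second = open(path, "r+b")

first_locked = try_exclusive_lock(first)
second_blocked = not try_exclusive_lock(second)

# Once the first holder releases the lock, the second can take it.
fcntl.flock(first, fcntl.LOCK_UN)
second_locked_after_release = try_exclusive_lock(second)

fcntl.flock(second, fcntl.LOCK_UN)
first.close()
second.close()
os.remove(path)
```

Windows offers comparable behaviour through its own APIs, including the byte-range locks mentioned above, but those calls are not shown here.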
Meanwhile, locking is not built into object storage in the same way as it is with file systems. It does exist, but multiple users can typically work on the same object at once, with changes reconciled on an “eventually consistent” basis.
Some forms of locking are implemented in object storage and the cloud. These include file access protocol gateways that sit in front of object stores.
Cloud providers such as AWS provide object locking with compliance and governance modes that give differing levels of access. Retention periods can be set that keep locks in place until the set date. Microsoft Azure also has locking for its Blob objects, with the ability to make them immutable and enforce legal hold.
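The retention logic can be modelled in a few lines. This sketch assumes the commonly documented S3 Object Lock behaviour: in governance mode, a user with a special bypass permission can still remove a locked object, while in compliance mode nobody can until the retention date passes. The class, function and parameter names are hypothetical and do not call any cloud API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class LockedObject:
    data: bytes
    mode: str  # "governance" or "compliance"
    retain_until: datetime


def can_delete(obj: LockedObject, now: datetime,
               has_bypass_permission: bool = False) -> bool:
    """Decide whether a delete is allowed under the assumed lock rules."""
    if now >= obj.retain_until:
        return True  # retention period has expired
    if obj.mode == "governance":
        return has_bypass_permission  # privileged users may override
    return False  # compliance mode: no one can delete early


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
backup = LockedObject(b"backup", "compliance", now + timedelta(days=30))
draft = LockedObject(b"draft", "governance", now + timedelta(days=30))

admin_deletes_backup = can_delete(backup, now, has_bypass_permission=True)
admin_deletes_draft = can_delete(draft, now, has_bypass_permission=True)
user_deletes_draft = can_delete(draft, now)
backup_after_expiry = can_delete(backup, now + timedelta(days=31))
```

The compliance-mode case is the one that matters for ransomware protection, because even a compromised administrator account cannot remove the data before the retention date.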
Object locking has achieved some prominence as a way of quarantining data against ransomware attacks.
NoSQL databases often use object storage and can take semi- and unstructured data and implement their own locking mechanisms. For example, MongoDB allows for locking in which requests are queued, while CouchDB has a form of eventual consistency.
What’s the difference between NFS, SMB and CIFS?
NFS, SMB and CIFS are all file storage protocols that give access to files on servers and storage servers (such as NAS storage) as if they were local files.
They are distinct from the file system. They operate at the application layer, in the same way as HTTP, FTP, POP and SMTP, and facilitate communication between applications and storage via the file system.
NFS, SMB and CIFS are used with NAS file access storage, not SAN block access storage.
NFS is mostly used with Linux and Unix operating systems, and was originally developed by Sun Microsystems in 1984. It reached version 4.2 in 2016. Parallel file access functionality (pNFS, used in scale-out NAS) arrived with version 4.1.
Although developed by a Unix supplier and often used for Unix and Linux, NFS can also be used in Windows environments.
SMB is primarily used in Windows environments, and is the basis for Microsoft’s distributed file system. IBM first developed SMB in 1983 to provide shared network access to files and printers. Microsoft picked it up later and built it into Windows NT 3.1. It has retained it in its operating systems since then.
CIFS is Microsoft’s early dialect of SMB, first introduced in 1996. It’s mostly used with NetBIOS-based transports and was focused on small LAN file, print and application access to storage. It’s less scalable than NFS, and considered chatty, buggy and less secure than later versions of SMB.