Computer Data Storage Types
Storage is mainly classified by the method of data storage (or data access). The newest type is object storage, which is the focus of this article. Block storage is covered in the older article "Storage technologies and SAN networks or connecting servers to a disk array".
Note: We are dealing with network storage, so directly attached disks/devices (DAS) are mentioned only for completeness and are not discussed further.
File Storage
Data is stored in files, which are organized in a hierarchical directory structure. File storage supports sharing, access control, and file locking at the user level, which makes it suitable for collaboration. Users access the storage directly.
- Network Attached Storage (NAS)
- network technology Ethernet
- communication protocol TCP/IP
- data protocols NFS, SMB/CIFS
Block Storage
Data is divided into fixed-size blocks, which have no context or structure and are stored independently. Each block has a unique address through which it is accessed directly. A file system is typically deployed on top of block storage. Block storage offers low latency and high performance, and it can be connected to multiple servers. Servers typically use it in place of local disks.
- Storage Area Network (SAN) or Direct-Attached Storage (DAS)
- network technology Ethernet or Fibre Channel
- transport protocol iSCSI (over TCP/IP), Fibre Channel over Ethernet (FCoE, carried directly over Ethernet without TCP/IP) or Fibre Channel Protocol (FCP)
- data protocol SCSI
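A minimal Python sketch of the block model: a toy in-memory device whose blocks are reached only by their number (LBA). The class and method names are hypothetical and the 4096-byte block size is an assumption. The device has no idea what the bytes mean, which is why a file system is normally placed on top of it.

```python
BLOCK_SIZE = 4096  # assumed block size in bytes


class BlockDevice:
    """Toy block device: raw bytes addressed only by block number (LBA)."""

    def __init__(self, num_blocks: int, block_size: int = BLOCK_SIZE) -> None:
        self.block_size = block_size
        self.num_blocks = num_blocks
        self._data = bytearray(num_blocks * block_size)

    def _offset(self, lba: int) -> int:
        # the block address maps directly to a byte offset; no lookup is needed
        if not 0 <= lba < self.num_blocks:
            raise IndexError(f"block address {lba} out of range")
        return lba * self.block_size

    def write_block(self, lba: int, payload: bytes) -> None:
        if len(payload) != self.block_size:
            raise ValueError("payload must be exactly one block long")
        start = self._offset(lba)
        self._data[start:start + self.block_size] = payload

    def read_block(self, lba: int) -> bytes:
        start = self._offset(lba)
        return bytes(self._data[start:start + self.block_size])


dev = BlockDevice(num_blocks=16)
dev.write_block(3, b"A" * dev.block_size)
print(dev.read_block(3)[:4])  # b'AAAA'
```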
Object Storage
Data is stored as objects, where each object contains the actual data, metadata, and a unique identifier. Object storage scales easily to very large capacities, which makes it ideal for storing enormous amounts of unstructured data. Access is through an API, which can be used by users or applications.
- network technology Ethernet
- communication protocol TCP/IP
- data protocol HTTP(S) with a REST API
Separate Network for Storage Systems
Block storage uses a SAN. The server works with the storage as if it were a directly attached disk. With Fibre Channel, we have a specialized network infrastructure separate from the standard LAN (which uses Ethernet). Even with iSCSI, a separate network (dedicated network components) is recommended for performance and security reasons.
File storage is different. Users access the storage directly, so it must be available over the LAN. Similarly, object storage is used over the LAN or even over a WAN (often as a cloud service). The technologies used therefore have to handle data security and access control themselves.
Efficiency and Performance
Different storage types suit different purposes and provide different performance. Block storage offers the lowest latency and the highest transfer speed. Actual performance depends on the transport protocol, communication overhead, the amount of encapsulation, etc. Fibre Channel, which is designed specifically for connecting storage systems, is the most efficient. Ethernet now reaches very high speeds and iSCSI-optimized hardware is available, so iSCSI can be an equally good choice.
Object storage transfers data over standard protocols (HTTP), so it has higher latency and lower performance. On the other hand, it supports modern protocols and algorithms. It is suitable for large data volumes, such as backups and multimedia. Because data is accessed via an API, object storage integrates easily into modern (cloud) applications.
Object Storage
Object storage is a system for storing and managing data that treats information as objects instead of traditional files or blocks. In addition to the data itself (e.g., a file), each object contains a unique identifier (key) and metadata describing its content and properties. Objects are organized into buckets.
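The object and bucket relationship can be modelled in a few lines of Python. This is a hypothetical in-memory sketch (the `StoredObject` and `Bucket` names are illustrative, not any vendor's API); it only shows that an object bundles a key, opaque data, and free-form metadata, and that a bucket is a flat map from keys to objects.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    key: str                                            # unique identifier within the bucket
    data: bytes                                         # content; the store does not interpret it
    metadata: dict[str, str] = field(default_factory=dict)  # descriptive key/value pairs


class Bucket:
    """Flat container: objects are found by key, with no directory tree."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._objects: dict[str, StoredObject] = {}

    def put(self, key: str, data: bytes, **metadata: str) -> None:
        self._objects[key] = StoredObject(key, data, dict(metadata))

    def get(self, key: str) -> StoredObject:
        return self._objects[key]


backups = Bucket("backups")
backups.put("db/2024-01-01.dump", b"...", content_type="application/octet-stream", owner="ops")
print(backups.get("db/2024-01-01.dump").metadata["owner"])  # ops
```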
There is no set format for the content of stored data. We can store various data types (such as documents, videos, backups) efficiently and securely. It is suitable for large volumes of unstructured data. The main attributes of object storage are high scalability and security, as well as maximum durability and availability with the possibility of geographical replication.
Object storage is accessed through an API, typically over the HTTP protocol. Vendors may implement different APIs that are not compatible with other systems. Many providers support the de facto standard Amazon S3 REST API; we then speak of S3-compatible object storage.
Object Metadata
Metadata describes the object and its properties and can be customized to our needs. For metadata with a predefined key, we can modify the value. We can also add custom metadata by entering a key and a value. Metadata can be used to search and organize data.
The metadata index keeps a record of each object, its ID, and other metadata, such as access control, creation date, and size. This information is stored separately from the actual data. The index allows objects to be searched for and retrieved quickly and efficiently based on their attributes. This is a big difference from file storage, where we search only by name.
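To illustrate why a separate metadata index makes attribute-based search fast, here is a simplified sketch of an inverted index from (attribute, value) pairs to object IDs. The `MetadataIndex` class is hypothetical; real systems keep such indexes in distributed databases and support many more query types.

```python
from collections import defaultdict


class MetadataIndex:
    """Index stored apart from object data: maps (attribute, value) to object IDs."""

    def __init__(self) -> None:
        self._records: dict[str, dict[str, str]] = {}
        self._by_attr: defaultdict = defaultdict(set)

    def add(self, object_id: str, metadata: dict[str, str]) -> None:
        # drop stale entries if the object was indexed before
        for attr, value in self._records.get(object_id, {}).items():
            self._by_attr[(attr, value)].discard(object_id)
        self._records[object_id] = dict(metadata)
        for attr, value in metadata.items():
            self._by_attr[(attr, value)].add(object_id)

    def find(self, **criteria: str) -> set[str]:
        """Return IDs of objects whose metadata matches every given attribute."""
        if not criteria:
            return set(self._records)
        matches = [self._by_attr.get((a, v), set()) for a, v in criteria.items()]
        return set.intersection(*matches)  # always returns a new set


index = MetadataIndex()
index.add("obj-1", {"type": "video", "project": "alpha"})
index.add("obj-2", {"type": "backup", "project": "alpha"})
index.add("obj-3", {"type": "video", "project": "beta"})
print(sorted(index.find(type="video", project="alpha")))  # ['obj-1']
```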
Physical Form of Object Storage and Data Storage
An object storage system is often composed of multiple interconnected physical nodes. We can easily add more nodes and continue expanding capacity, although in practice each manufacturer supports a specific maximum number of nodes. The advantage is the ability to group devices into large storage pools that can be distributed across multiple locations. This design enables scaling and increases resilience, performance, and data availability.
Data objects can be distributed across multiple (all) nodes. We can also use replication, and the same data can be located in multiple places (on multiple storages or in different geographical locations). In practice, providers allow setting automatic replication between regions.
Object storage can store data using various techniques that are transparent to users. These can include block storage protected by RAID (Redundant Array of Independent Disks, e.g., RAID 6). Often, however, a modern technique called Erasure Coding is used. It is similar to RAID but better suited to some uses, particularly protecting data spread across multiple nodes.
Erasure Coding (EC)
Erasure Coding is an advanced method of data protection that increases resistance to data loss. It allows lost or damaged data to be recovered using mathematical algorithms. Data blocks (k) and parity blocks (m) are used, and we speak of an EC (k+m) scheme.
It works as follows:
- data is divided into smaller parts (data blocks), for example 8
- several additional (redundant) parity blocks are calculated from the data blocks, for example 2
- parity blocks allow reconstructing lost blocks; with m parity blocks, up to m lost blocks can be recovered
- data and parity blocks are stored on different disks, nodes, or storages
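The steps above can be sketched with the simplest possible erasure code: k data blocks plus a single XOR parity block, i.e. an EC (k+1) scheme similar in spirit to RAID 5. This is a deliberate simplification, not what production systems use for EC (8+2): tolerating two or more lost blocks requires codes such as Reed-Solomon, which this sketch does not implement. The function names are hypothetical.

```python
from functools import reduce
from typing import Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> list[bytes]:
    """Split data into k equal data blocks (zero-padded) plus one XOR parity block."""
    block_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(block_len * k, b"\0")
    blocks = [padded[i * block_len:(i + 1) * block_len] for i in range(k)]
    return blocks + [reduce(xor_bytes, blocks)]


def recover(blocks: list[Optional[bytes]]) -> list[bytes]:
    """Rebuild at most one missing block (None) by XOR-ing all surviving blocks."""
    missing = [i for i, b in enumerate(blocks) if b is None]
    if len(missing) > 1:
        raise ValueError("one XOR parity block can rebuild only one lost block")
    restored = list(blocks)
    if missing:
        restored[missing[0]] = reduce(xor_bytes, [b for b in blocks if b is not None])
    return restored


shards = encode(b"erasure coding demo!", k=4)  # EC (4+1): 4 data + 1 parity block
damaged = list(shards)
damaged[2] = None                              # lose the disk/node holding block 2
restored = recover(damaged)
# a real system records the original length instead of stripping the padding
print(b"".join(restored[:4]).rstrip(b"\0"))    # b'erasure coding demo!'
```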
Erasure Coding can work across multiple nodes or storage systems, so it can protect against the failure of an entire node. It is more computationally demanding but offers better scalability and storage efficiency, and rebuilding after a disk failure can be faster. RAID, by contrast, works within directly attached disks in a single physical unit; it offers lower latency and fast access.
In practice, Erasure Coding works differently on a single-node system than on a multi-node system. For small objects, Erasure Coding is not worthwhile, and replication can be used instead. For local protection (one node), data and parity blocks are stored on different disks, for example EC (8 + 2). For network protection (multiple nodes), data and parity blocks are stored on different nodes, for example EC (2 + 1), depending on the number of nodes. Alternatively, both can be combined, giving local protection against disk failure within a node and network protection against the failure of an entire node.
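The trade-offs behind these choices come down to simple arithmetic. An EC (k+m) scheme stores k + m blocks for every k blocks of data, so the usable share of raw capacity is k / (k + m), and it survives the loss of any m blocks, assuming each block sits on a different disk or node as described above. The sketch compares the two example schemes with three-way replication; the replication case is an assumed point of comparison, not a figure from the text, and the function names are hypothetical.

```python
def ec_profile(k: int, m: int) -> dict:
    """EC (k+m): k data blocks plus m parity blocks on separate disks/nodes."""
    return {
        "tolerated_failures": m,         # any m blocks may be lost
        "usable_fraction": k / (k + m),  # share of raw capacity holding data
        "raw_per_usable": (k + m) / k,   # raw bytes consumed per stored byte
    }


def replication_profile(copies: int) -> dict:
    """Full replicas of each object on separate disks/nodes."""
    return {
        "tolerated_failures": copies - 1,
        "usable_fraction": 1 / copies,
        "raw_per_usable": float(copies),
    }


for name, p in [
    ("EC (8+2)", ec_profile(8, 2)),
    ("EC (2+1)", ec_profile(2, 1)),
    ("3 replicas", replication_profile(3)),
]:
    print(f"{name}: {p['tolerated_failures']} failure(s) tolerated, "
          f"{p['usable_fraction']:.0%} usable, {p['raw_per_usable']:.2f}x raw")
# EC (8+2): 2 failure(s) tolerated, 80% usable, 1.25x raw
# EC (2+1): 1 failure(s) tolerated, 67% usable, 1.50x raw
# 3 replicas: 2 failure(s) tolerated, 33% usable, 3.00x raw
```

EC (8+2) tolerates as many failures as three replicas while using far less raw capacity; the cost is the extra computation and the need to touch many disks or nodes per object, which is one reason small objects are often just replicated.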
Security and Object Storage
Because object storage uses standard protocols, it benefits from their security features, such as encrypted communication using TLS (HTTPS) and standard authentication mechanisms.
In object storage, we access individual objects and can control permissions for each of them. Modern authentication protocols and algorithms are used, such as OAuth 2.0 tokens and access keys or RSA key pairs used to sign requests. This contrasts with block storage, where we access an entire volume and authentication relies on outdated protocols and algorithms (e.g., CHAP for iSCSI).
Data is commonly protected by encryption in transit (Encrypt Data In-Flight / In-Transit) using TLS (Transport Layer Security) and certificates. Encryption of stored data (Encrypt Data At-Rest) using encryption keys is also supported.
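As one concrete example of key-based request authentication, AWS Signature Version 4 (used by Amazon S3 and many S3-compatible systems) never sends the secret key itself. Instead, it derives a signing key from the secret through a chain of HMAC-SHA256 operations and uses that key to sign each request. The sketch shows only the key derivation and the final signature step; building the canonical request and the string-to-sign is omitted, and the secret and the signed string are placeholders. Providers that use RSA key pairs sign requests differently.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def sigv4_signing_key(secret_key: str, date: str, region: str, service: str) -> bytes:
    """Derive the SigV4 signing key; date is YYYYMMDD, service is e.g. "s3"."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


def sign(string_to_sign: str, signing_key: bytes) -> str:
    """Hex signature that is placed in the Authorization header."""
    return hmac.new(signing_key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()


# placeholder credentials and string-to-sign, for illustration only
key = sigv4_signing_key("EXAMPLE-SECRET-KEY", "20240101", "us-east-1", "s3")
print(sign("placeholder string-to-sign", key))
```

Because the signing key is scoped to a date, region, and service, a leaked signature or derived key is far less useful to an attacker than the secret itself.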
Object Storage Resources
The following components are used to organize data stored in object storage.
Object Storage Namespace
The namespace ensures the uniqueness of bucket names within the system. It is the top-level container for all buckets and objects. It is assigned to a specific region.
Usage varies by provider. Some services (like Oracle Cloud Object Storage) create a namespace for each customer, so bucket names must be unique only within that customer's namespace and region. Other services (like Amazon S3) essentially do not use a namespace, and bucket names must be unique across all accounts in all regions within a partition (a grouping of regions).
Object Storage Bucket
A bucket is the basic logical container (organizational unit) for storing objects. Each bucket has a unique name and is assigned to a specific region. Access to objects in the bucket can be controlled by various mechanisms, such as Bucket Policies (bucket-level access rules), Access Control Lists (ACL, object permissions), etc. Many providers (Amazon, Google) automatically encrypt all stored data.
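As an illustration of a bucket policy, the sketch below builds the widely used AWS-style policy that grants anonymous read access (`s3:GetObject`) to every object in a bucket. The bucket name is hypothetical, the policy would still have to be applied through the provider's API or console, and on AWS the Block Public Access settings can prevent it from taking effect. S3-compatible systems may support only a subset of this policy language.

```python
import json


def public_read_policy(bucket: str) -> str:
    """AWS-style bucket policy granting anonymous read access to all objects."""
    policy = {
        "Version": "2012-10-17",                         # policy language version
        "Statement": [
            {
                "Sid": "PublicReadGetObject",
                "Effect": "Allow",
                "Principal": "*",                        # anyone, including anonymous users
                "Action": "s3:GetObject",                # read objects only
                "Resource": f"arn:aws:s3:::{bucket}/*",  # every object in the bucket
            }
        ],
    }
    return json.dumps(policy, indent=2)


print(public_read_policy("example-bucket"))
```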
Example of a bucket address (URL) in Oracle Cloud Object Storage:
https://object-storage-namespace.compat.objectstorage.region.oraclecloud.com/bucket
Objects
Objects can have various sizes or formats. We can store videos, logs, backups, application data, or any other type of structured or unstructured data. Objects are stored in a flat data environment (without hierarchy) and can be accessed in multiple ways.
Objects are addressed by their identifier, the key (or key name). Name prefixes and a separator can be used to emulate folders (for example, an object key of photos/sample.jpg). To access an object (resource), we use a URI (Uniform Resource Identifier), which might look like this:
https://bucket-name.s3.amazonaws.com/object-key
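Because the key namespace is flat, "folders" are only a listing convention. The sketch imitates how an S3-style listing with a prefix and a `/` delimiter rolls keys up into common prefixes. The function name is hypothetical, and real list APIs also paginate results and return more fields.

```python
def list_with_delimiter(keys: list[str], prefix: str = "",
                        delimiter: str = "/") -> tuple[list[str], list[str]]:
    """Emulate a folder listing over flat keys (S3-style prefix + delimiter).

    Returns (objects, common_prefixes): keys directly "inside" the prefix and
    the distinct "sub-folders" rolled up at the next delimiter.
    """
    objects, common_prefixes = [], set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        cut = rest.find(delimiter)
        if cut == -1:
            objects.append(key)  # no further delimiter: an object at this level
        else:
            common_prefixes.add(prefix + rest[:cut + len(delimiter)])  # a "folder"
    return objects, sorted(common_prefixes)


keys = ["photos/sample.jpg", "photos/2024/beach.jpg", "photos/2024/city.jpg", "readme.txt"]
print(list_with_delimiter(keys))                    # (['readme.txt'], ['photos/'])
print(list_with_delimiter(keys, prefix="photos/"))  # (['photos/sample.jpg'], ['photos/2024/'])
```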
Individual objects have a maximum size that depends on the provider, often 5 TB. The size of a single upload is also often limited (for example, to 5 GB), so larger objects must be uploaded in parts using multipart upload. Very large data sets can be split into multiple objects.
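A multipart upload starts by deciding how to cut the object into parts. The sketch only plans byte ranges; it does not call any upload API. The limits are Amazon S3's (at least 5 MiB per part except the last, at most 5 GiB per part, at most 10,000 parts); other providers may differ, and the 64 MiB default part size is an arbitrary assumption. The function name is hypothetical.

```python
MiB = 1024 ** 2
GiB = 1024 ** 3

# Amazon S3 limits; other providers may differ
MIN_PART = 5 * MiB   # every part except the last must be at least this large
MAX_PART = 5 * GiB   # largest single part
MAX_PARTS = 10_000   # maximum number of parts in one multipart upload


def plan_parts(object_size: int, part_size: int = 64 * MiB) -> list[tuple[int, int, int]]:
    """Split an object into (part_number, offset, length) ranges for multipart upload."""
    if not MIN_PART <= part_size <= MAX_PART:
        raise ValueError("part size outside the allowed range")
    # grow the part size until the object fits within the part-count limit
    while -(-object_size // part_size) > MAX_PARTS:
        if part_size == MAX_PART:
            raise ValueError("object too large for multipart upload")
        part_size = min(part_size * 2, MAX_PART)
    return [
        (number, offset, min(part_size, object_size - offset))
        for number, offset in enumerate(range(0, object_size, part_size), start=1)
    ]


plan = plan_parts(200 * MiB)
print(len(plan), plan[-1])  # 4 (4, 201326592, 8388608)
```

Each planned part can then be uploaded independently (and retried on failure) before the provider assembles them into one object.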
Object Storage Functions
Object storage offers various extensible properties that can be configured. These may include:
- Logging – monitors (logs) operations in the storage
- Lifecycle Management – automates deletion or moving of objects (or object versions) to cheaper storage classes
- S3 Versioning – retains multiple versions of an object
- S3 Object Lock – prevents permanent deletion of objects
S3 Versioning
The versioning feature preserves multiple versions of an object in the same bucket. It allows restoring accidentally deleted or overwritten objects. A bucket with versioning enabled maintains one current version of an object and zero or more non-current versions of the object. This feature consumes more space and increases costs.
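The behaviour of current and non-current versions can be sketched with a toy in-memory bucket. Everything here is hypothetical: real S3 version IDs are opaque strings, and restoring is typically done by copying an old version or removing the delete marker. The point is that neither an overwrite nor a plain delete destroys older versions, which is also why versioning costs extra space.

```python
import itertools
from typing import Optional


class VersionedBucket:
    """Toy versioned bucket: each PUT adds a version, DELETE adds a delete marker."""

    def __init__(self) -> None:
        # key -> list of (version_id, data); data None marks a delete marker
        self._versions: dict[str, list[tuple[str, Optional[bytes]]]] = {}
        self._ids = itertools.count(1)

    def _append(self, key: str, data: Optional[bytes]) -> str:
        version_id = f"v{next(self._ids)}"  # hypothetical ID format
        self._versions.setdefault(key, []).append((version_id, data))
        return version_id

    def put(self, key: str, data: bytes) -> str:
        return self._append(key, data)

    def delete(self, key: str) -> str:
        # hides the object but keeps every earlier (non-current) version
        return self._append(key, None)

    def get(self, key: str, version_id: Optional[str] = None) -> bytes:
        history = self._versions.get(key, [])
        if version_id is None:
            if not history or history[-1][1] is None:
                raise KeyError(key)  # no current version, or it is a delete marker
            return history[-1][1]
        for vid, data in history:
            if vid == version_id and data is not None:
                return data
        raise KeyError(f"{key} (version {version_id})")


bucket = VersionedBucket()
v1 = bucket.put("report.txt", b"draft")
bucket.put("report.txt", b"final")                      # overwrite: v1 becomes non-current
bucket.delete("report.txt")                             # accidental delete adds only a marker
bucket.put("report.txt", bucket.get("report.txt", v1))  # restore by copying v1
print(bucket.get("report.txt"))                         # b'draft'
```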
S3 Object Lock
Object Lock blocks permanent deletion of an object during a defined retention period. S3 Object Lock works together with S3 Versioning to prevent permanent deletion or overwriting of locked object versions (the WORM principle: Write Once, Read Many). It ensures immutability as protection against ransomware. It can be set on a bucket or on individual objects.
Locking Modes
- Governance Mode – only users with special permissions can override the lock and modify or delete the object
- Compliance Mode – no one (not even the root user) can modify or delete the object
Locking Period
- Retention Period – we set a fixed time during which the object is locked
- Legal Hold – object versions are protected indefinitely until Legal Hold is removed
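The interaction of locking modes and locking periods can be expressed as a small decision function. This is a simplified sketch of the rules listed above, not the provider's implementation: the `bypass_governance` flag stands in for the special permission mentioned under Governance Mode, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class LockedVersion:
    mode: Optional[str] = None               # "GOVERNANCE", "COMPLIANCE" or None
    retain_until: Optional[datetime] = None  # end of the retention period
    legal_hold: bool = False                 # independent of the retention period


def can_delete(v: LockedVersion, now: datetime, bypass_governance: bool = False) -> bool:
    """Return True if this object version may be permanently deleted (WORM check)."""
    if v.legal_hold:
        return False                         # blocked until the legal hold is removed
    if v.retain_until is not None and now < v.retain_until:
        if v.mode == "COMPLIANCE":
            return False                     # nobody can delete, not even root
        if v.mode == "GOVERNANCE":
            return bypass_governance         # only specially permitted users
    return True                              # no active lock


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
until = datetime(2026, 1, 1, tzinfo=timezone.utc)
print(can_delete(LockedVersion("GOVERNANCE", until), now))                          # False
print(can_delete(LockedVersion("GOVERNANCE", until), now, bypass_governance=True))  # True
print(can_delete(LockedVersion("COMPLIANCE", until), now, bypass_governance=True))  # False
print(can_delete(LockedVersion(legal_hold=True), now))                              # False
```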
Object Storage Access (API)
With object storage, communication is primarily done through API (Application Programming Interface), which allows easy access to data regardless of its physical location. Users and applications can store, retrieve, and manage data objects using the API.
Most object storage systems support one or more standardized APIs, such as the Amazon S3 API, the OpenStack Swift API, or CDMI (Cloud Data Management Interface). Through these APIs, developers can easily integrate object storage into applications regardless of technology or vendor.
Amazon S3 (Amazon Simple Storage Service)
Probably the most widespread object storage is the cloud service Amazon Simple Storage Service (Amazon S3) from Amazon Web Services (AWS). Amazon designed the S3 API as a REST API. It defines specific methods for working with objects, as well as the request structure, headers, and responses. It also supports object versioning, access policy settings, multiple regions, and data replication.
If objects are anonymously readable, even a web browser is enough to retrieve them through the API. API calls can be made directly from code, or we can use the AWS SDK or the AWS CLI. Requests to Amazon S3 can be authenticated or anonymous; for authenticated access, we need valid credentials and permissions for the specific resources.
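At the HTTP level, the basic object operations map onto ordinary methods: PUT uploads an object, GET downloads it, HEAD returns only its metadata, and DELETE removes it. The sketch uses Python's standard `urllib` to build (but not send) such requests against a virtual-hosted-style URL of the form `https://bucket-name.s3.amazonaws.com/object-key`; the helper name, bucket, and key are hypothetical. The requests are unsigned, so sending them would work only for anonymously accessible objects; authenticated calls need a signed `Authorization` header, which the AWS SDK and AWS CLI add automatically.

```python
from typing import Optional
from urllib.parse import quote
from urllib.request import Request


def s3_request(method: str, bucket: str, key: str, body: Optional[bytes] = None) -> Request:
    """Build (but do not send) an unsigned virtual-hosted-style S3 request."""
    url = f"https://{bucket}.s3.amazonaws.com/{quote(key, safe='/')}"  # keep '/' in keys
    return Request(url, data=body, method=method)


upload = s3_request("PUT", "example-bucket", "docs/report.pdf", body=b"%PDF-...")
download = s3_request("GET", "example-bucket", "docs/report.pdf")
info = s3_request("HEAD", "example-bucket", "docs/report.pdf")  # metadata only
remove = s3_request("DELETE", "example-bucket", "docs/report.pdf")
print(download.get_method(), download.full_url)
# GET https://example-bucket.s3.amazonaws.com/docs/report.pdf
```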
S3 Compatible Storage
Due to the spread of Amazon S3, its API has become the de facto standard for object storage APIs. Many storage and software vendors therefore implement APIs compatible with Amazon S3; these products are referred to as S3-compatible storage or S3 object storage. This provides a unified interface for storage operations and compatibility between different systems, and the extensive ecosystem of existing S3 tools and libraries can be used.
REST API or RESTful API
REST API (REpresentational State Transfer Application Programming Interface) is a design style (architecture) of web services that allows communication between client and server (most often) via the HTTP protocol. It allows an application or service (client) to access a resource within another application or service (server). REST is simple, flexible, and widely used.
Communication takes place through HTTP requests using the common HTTP methods GET, POST, PUT, and DELETE. Any format can be used to transfer information to the client, but JSON is common. Headers and request parameters are also important in REST API calls; they can carry metadata, authorization data, etc. Each object or piece of data is represented as a resource and identified by a URI (Uniform Resource Identifier, e.g., https://example.com/resource/123).
REST provides a uniform interface, is stateless, supports a layered architecture, and the possibility of caching.
Examples of Object Storages
Cloud services
- Amazon S3 (native S3 API)
- Google Cloud Storage (S3 compatible)
- Microsoft Azure Blob Storage
- IBM Cloud Object Storage (S3 compatible)
- Wasabi Cloud Storage (S3 compatible)
- Cloudflare R2 (S3 compatible)
- MinIO AIStor (S3 compatible)
Local storage (On-Premises solution)
- ObjectFirst OOTBI (S3 compatible)
- Scality ARTESCA (S3 compatible)
- Pure Storage FlashBlade (S3 compatible)
- NetApp StorageGRID (S3 compatible)
- HPE Alletra Storage Servers + Scality (S3 compatible)
- Cloudian HyperStore (S3 compatible)