NetApp ONTAP Components, Principles and Features

| Petr Bouška - Samuraj |
The ONTAP operating system runs on NetApp AFF (All Flash FAS) and FAS (Fabric-Attached Storage) storage arrays. In this article, we will look at the physical and logical components that ONTAP uses to operate the storage. We will describe the network architecture, i.e., how clients communicate with the array, and then the physical and logical components that make up the storage architecture, i.e., how data is stored. A major benefit of the ONTAP system is its Storage Efficiency features, which let us store much more data in a given physical space (thanks to deduplication and other techniques). Finally, we discuss the important question of how to make sense of the displayed information about occupied and available space and the space saved by efficiency.

Note: This article follows up on the older NetApp AFF8040 - Configuration, Principles, and Basic Procedures, which focuses more on basic operations and is based on the then-current version of ONTAP 9.1. I initially tried to update it a bit, but eventually decided to split it into separate articles. When I was setting up a new NetApp AFF A250 array, I had to study a number of basic ONTAP principles. This article is dedicated to that.

Note: The article is based on ONTAP version 9.9.1, which was current at the time of writing. It brings a number of improvements over older versions (such as 9.8). I've read about the upcoming version 9.10, which should bring more interesting features.

NetApp ONTAP introduction

Most NetApp storage arrays use the ONTAP operating system. We can manage it using the ONTAP CLI command line interface or the ONTAP System Manager web graphical interface.


NetApp storage array

A storage array is formed as a cluster, which consists of two nodes (Cluster Nodes); more pairs can be grouped into a larger cluster. The two nodes are configured as a high availability pair (High Availability (HA) Pair), which provides fault tolerance and allows maintenance without downtime.

A node is represented by a controller (Storage Controller), its storage, network connectivity, and the running ONTAP system. The controller includes a Service Processor (SP) for remote management and monitoring, which is assigned its own IP address. A cluster VIP address is used for cluster management, and each node also has its own management IP.

Data access

Data on the array can be accessed in several ways:

  • block access - clients (Hosts - servers) are connected to a SAN (Storage Area Network) and use protocols such as Internet Small Computer System Interface (iSCSI) or Fibre Channel over Ethernet (FCoE) over Ethernet, or Fibre Channel Protocol (FCP) or Non-Volatile Memory Express over Fibre Channel (NVMe/FC) over FC, to access LUNs
  • file access - NAS (Network Attached Storage) access using protocols like Common Internet File Services / Server Message Block (CIFS/SMB) or Network File System (NFS) to access files
  • object access (Object Storage) - data is stored as objects organized into buckets, using the Amazon S3 interface for access

Basic terms of SAN networks

More information can be found in the article Storage technologies and SAN networks or connecting servers to a disk array.

  • Initiator - a client that connects to a LUN, negotiates connection with the Target
  • Target - the target address of the storage array through which the Initiator connects to the LUN
  • Initiator Group (iGroup) - determines which host can access a specific LUN on the storage, represents the client (server) on the array
  • IQN (iSCSI Qualified Name) - unique iSCSI identifier of the Initiator or Target, e.g., iqn.1991-05.com.microsoft:server.company.local
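
For illustration, a minimal CLI sketch of representing a client on the array and mapping a LUN to it (the SVM name svm-iscsi matches the examples later in this article; the igroup and LUN names are made up):

lun igroup create -vserver svm-iscsi -igroup ig_server1 -protocol iscsi -ostype windows -initiator iqn.1991-05.com.microsoft:server.company.local
lun map -vserver svm-iscsi -path /vol/Server_vol_01/lun_01 -igroup ig_server1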

Virtualization - Storage VM (SVM)

  • Storage - Storage VMs

ONTAP allows dividing the array into virtual parts that act as separate arrays; such a virtual array can be assigned to a tenant, including separate management.

For this, it uses the Storage Virtual Machine (Storage VM - SVM), previously known as Vserver. SVMs are logical entities that abstract physical resources. Even if we use the entire array ourselves and don't need to divide it, at least one SVM must be created to provide data to clients using a specific protocol. An SVM provides data from one or more volumes through one or more LIFs. It doesn't matter where the volume is located in the cluster (on any aggregate), nor on which port the LIF is hosted. We can move volumes and LIFs during operation without interrupting the service.

Each SVM has its own namespace, which is a directory structure. When we create a data SVM, a root volume is created where the root of the Namespace is stored. Other volumes in the SVM are interconnected through junctions.
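
A minimal CLI sketch of creating a data SVM and joining a volume into its namespace (the names are illustrative, not from a real configuration):

vserver create -vserver svm-nas -rootvolume svm_nas_root -aggregate aggr1 -rootvolume-security-style unix
volume create -vserver svm-nas -volume data_vol -aggregate aggr1 -size 1TB -junction-path /data

NAS clients then see the volume under /data in the SVM namespace.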

Besides data SVMs, there are also special system types that are created automatically (the GUI doesn't display them).

  • Admin SVM - cluster management server, represents the cluster as a single manageable unit
  • Node SVM - represents an individual cluster node
  • System SVM (advanced) - used for cluster communication in IPspace
  • Data SVM - provides data, we need to create and add volumes to enable data access
ONTAP System Manager - Storage VMs

Network Architecture

ONTAP uses three networks:

  • cluster communication (cluster interconnect)
  • management (management network for cluster administration)
  • data (data network)

Physical and logical ports

  • Network - Ethernet Ports

NICs (network interface cards) provide physical ports for Ethernet connections. HBAs (host bus adapters) provide physical ports for Fibre Channel connections. On top of physical ports, we can create logical ports. A Link Aggregation Group (Interface Group) bundles multiple physical ports into one logical port for greater availability (like a PortChannel). A VLAN divides communication on a port into logical segments.

ONTAP System Manager - Ports
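
Creating the logical ports in the CLI might look like this (a sketch; the node, ifgrp, and port names are examples):

network port ifgrp create -node AFF-A250-01 -ifgrp a0a -distr-func ip -mode multimode_lacp
network port ifgrp add-port -node AFF-A250-01 -ifgrp a0a -port e0c
network port ifgrp add-port -node AFF-A250-01 -ifgrp a0a -port e0d
network port vlan create -node AFF-A250-01 -vlan-name a0a-100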

LIF - Logical Interface

  • Network - Overview

An important term is the Logical Interface (LIF), referred to as Network Interface in the GUI. We assign an IP address (for Ethernet) or a World Wide Port Name (WWPN, for Fibre Channel) to the logical interface. This allows greater flexibility than if the address were set directly on the physical port. During failover or maintenance, a LIF can migrate without interruption to another physical port, which can be on a different node.

A LIF is assigned to a physical port, VLAN, or Interface Group, and multiple LIFs can be assigned to the same port. Each LIF belongs to one specific SVM. A LIF runs on its assigned node (Home Node) and port (Home Port); during migration it can move to another port within the Failover Group. For a LIF, we set a Service Policy that determines its behavior, a Failover Policy that defines the failover options (e.g., only ports on the same node or also on a different one), and a Failover Group that determines the possible ports for relocation.

ONTAP automatically creates a Failover Group for each Broadcast Domain. It includes the ports in the same L2 network, which are used for failover. If needed, Failover Groups can also be created manually.

Path failover works differently for NAS and SAN. A NAS LIF automatically migrates to another port when a link fails. A SAN LIF doesn't migrate (we can move it manually), because SAN relies on multipath technology on the client, which redirects traffic to another path.
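
A sketch of creating a NAS LIF in the CLI with the Service Policy and Failover Policy mentioned above (the names and addresses are illustrative):

network interface create -vserver svm-nas -lif lif_nfs_01 -service-policy default-data-files -home-node AFF-A250-01 -home-port a0a-100 -address 10.0.0.10 -netmask 255.255.255.0 -failover-policy broadcast-domain-wide
network interface revert -vserver svm-nas -lif lif_nfs_01

The revert command returns a migrated LIF back to its Home Port.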

LIF Types

  • Node Management LIF - each node has its own LIF for management, doesn't leave the node
  • Cluster Management LIF - for managing the entire cluster, moves between nodes
  • Cluster LIF - for traffic between nodes in the cluster, assigned to physical cluster interconnect ports
  • Data LIF - provides client access using SAN and NAS protocols
  • Intercluster LIF - for SnapMirror and SnapVault replications

Note: A special exception is the IP address for the Service Processor. It's not set as a LIF, but directly on the device. Cluster - Overview - click on the three dots next to the node and select Edit Service Processor.
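
In the CLI, the Service Processor address can be set, for example, like this (a sketch with example addresses):

system service-processor network modify -node AFF-A250-01 -address-family IPv4 -enable true -ip-address 10.0.1.15 -netmask 255.255.255.0 -gateway 10.0.1.1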

IPspaces - address spaces

  • Network - Overview

IPspaces allow dividing the storage array for multiple organizations that can use the same IP addresses. For one organization, it's recommended to use only one IPspace. Storage VM (SVM) is assigned to one IPspace (can't be moved) and maintains its own routing table (similar to VRF). There's no cross-SVM or cross-IPspace routing. A system SVM is created for each IPspace. In other words, IPspaces allow different SVMs on the same cluster to use the same (overlapping) IP addresses.

An IPspace named Cluster is created automatically, where the cluster ports are assigned and node communication takes place (the internal private cluster network), and an IPspace named Default for everything else, including array and node management. In most practical cases, we can make do with these.

ONTAP System Manager - Network Overview

Broadcast Domains

  • Network - Overview

Broadcast Domains are used to group network ports that belong to the same L2 network (can communicate with each other on L2). The ports in the group are used by SVM for data traffic or management (or cluster). A Broadcast Domain is assigned to a specific IPspace. A port in a Broadcast Domain (can only be assigned to one) is used by LIF. Cluster and Default Broadcast Domains are created automatically, with additional Default-1, etc. created as needed.

Broadcast Domains work in conjunction with Failover Groups and Subnets (enables, controls, and secures LIF Failover behavior). ONTAP automatically creates a Failover Group for each Broadcast Domain. This ensures that in case of LIF migration to another physical port, it's a port from the same network and clients still have connectivity.

Subnets

Subnets can optionally be used to allocate a block of IP addresses and assign them to LIFs automatically. A Subnet is created within a Broadcast Domain, specifying the gateway address, the subnet, and the range of available addresses. It can only be created in the CLI. When creating a LIF, we then don't need to specify an address manually; the next available address from the Subnet is assigned automatically.
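
A sketch of creating a Subnet and then a LIF that takes its address from it automatically (the names and addresses are illustrative):

network subnet create -subnet-name sub-iscsi-a -broadcast-domain bd-iscsi-a -ipspace Default -subnet 10.0.2.0/24 -gateway 10.0.2.1 -ip-ranges 10.0.2.10-10.0.2.40
network interface create -vserver svm-iscsi -lif lif_iscsi_01 -service-policy default-data-blocks -home-node AFF-A250-01 -home-port a0a-100 -subnet-name sub-iscsi-a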

Configuring a new array

If we have a new array without configuration, we can have many things set up automatically using ONTAP System Manager. After the initial setup, where we configure IP addresses, the admin password, DNS, and NTP, various wizards are available on the Dashboard. We can use Prepare Storage, which optimally sets up the disks and creates RAID Groups and aggregates. There's also a Configure Protocols wizard, which essentially creates a Storage VM. The SVM wizard also creates LIFs, but often we need to prepare the ports manually in advance. In certain situations, new Broadcast Domains are created automatically.

ONTAP System Manager - Overview of a new array

Before we start creating an SVM, we need to prepare the ports. For example, if we want to aggregate (combine) some ports into a PortChannel, we create a Link Aggregation Group. We choose the ports, the mode (preferably LACP; it can't be changed later in the GUI, so the group must be deleted and recreated if needed), and the load balancing method. For iSCSI ports, we probably want to set MTU 9000. We can see the value set for individual ports, but we can't change it there; the setting is done on Broadcast Domains. It's usually better to prepare new Broadcast Domains (if they are for new networks).
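
For example, a new Broadcast Domain for iSCSI with MTU 9000 might be created like this (a sketch; the name and ports are examples):

network port broadcast-domain create -ipspace Default -broadcast-domain bd-iscsi-a -mtu 9000 -ports AFF-A250-01:a0a-100,AFF-A250-02:a0a-100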

When creating an SVM, we enter the addresses for (i.e., create) Network Interfaces, that is, LIFs. We enter the IP address and mask, optionally a gateway. If a matching Subnet already exists, the corresponding Broadcast Domain is assigned; otherwise the default is used. It's important to choose the correct Broadcast Domains where we have free ports (Home Port); the LIFs will be created on (assigned to) them. If a port isn't assigned to any Broadcast Domain, it won't be used.

Storage Architecture

Logical components of a NetApp storage array

Aggregate

  • Storage - Tiers

The architecture starts with physical disks, which we group into aggregates. This is a container for disks managed by one cluster node. Aggregates can be used to isolate tasks with different performance requirements.

An aggregate is assigned to a specific cluster node, which owns the disks inside. In case of a failure, it switches to the other node (active-passive). The aggregate can be accessed via network interfaces on both nodes, but read and write requests are processed by only one. To utilize the performance of both controllers, we need two aggregates, each assigned to a different node. More on this below in the description of ADPv2.

We can grow an aggregate on the fly by adding disks, but it can't be shrunk. The only option is to move all volumes to another aggregate, delete the original, and recreate it. By moving volumes between aggregates, we can also fine-tune performance and distribute load across the individual controllers.

ONTAP System Manager - aggregate disks
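
Growing an aggregate and rebalancing volumes from the CLI might look like this (a sketch; the aggregate names are illustrative):

storage aggregate add-disks -aggregate aggr1 -diskcount 4
volume move start -vserver svm-iscsi -volume Server_vol_01 -destination-aggregate aggr2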

RAID - Redundant Array of Independent Disks

ONTAP supports three types of RAID for aggregates, depending on the disks used, their type (NL-SAS, SAS, SSD), size, and number.

  • RAID4 uses one parity disk, a RAID Group can contain a maximum of 14 disks
  • RAID-DP (Double Parity) uses two parity disks, a RAID Group can contain a total of up to 28 disks
  • RAID-TEC (Triple Erasure Coding) is a newer type that uses three parity disks, a RAID Group can contain a total of 29 disks, and it is designed for large SATA and SSD disks over 6 TB

RAID Group

We set the RAID type used for an aggregate. Inside the aggregate, there are one or more RAID Groups (rg0 by default). The RAID type on the aggregate determines how many parity disks are in each RAID Group (i.e., how many disk failures it protects against). For the aggregate, we can also leave some disks as Spare disks, probably just by leaving them free (not using them when creating the aggregate). I manually created an aggregate in the CLI on an array with 24 disks, specified 23 disks, and the listing shows one Spare disk as available.
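
The manual creation mentioned above might look roughly like this (a sketch assuming a 24-disk array; the aggregate name matches the outputs later in the article):

storage aggregate create -aggregate AFF_A250_01_NVME_SSD_1 -node AFF-A250-01 -diskcount 23 -raidtype raid_dp
storage aggregate show-spare-disks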

Disks in a RAID Group must be of the same type (SAS, SATA, SSD) and should have the same size and speed. Depending on the RAID type, a RAID Group can contain a maximum number of disks. With more disks, there's less overhead and better performance, but a longer rebuild time and a greater chance of simultaneous failure of multiple disks. Therefore, when we have many disks, we may need to create several RAID Groups in the aggregate.

Plex

What's not often described are Plexes. A Plex is another container inside the aggregate; by default we have plex0. Inside the Plex are the RAID Groups. A Plex is used for SyncMirror mirroring (a data copy). If we use it, we also have a second plex1. Each Plex has its own disks (Disk Pool) and its own RAID Groups; they are physically separate. Data is updated simultaneously on both Plexes, thus increasing data availability.

We can't list RAID Groups or Plexes separately; they can be displayed as part of the aggregate using the CLI. Logically, it looks like a disk is inside a RAID Group, which is inside a Plex, which is inside an aggregate. But it's more often described as disks being inside the aggregate, with the others being properties or attributes of the aggregate.

Advanced Drive Partitioning v2 (ADPv2)

NetApp FAS/AFF systems predominantly use RAID-DP (Double Parity), where each RAID Group has 2 parity disks, can have up to 2 spare disks, and can consist of a maximum of 28 SSD disks. Apart from the data aggregate (Data Aggregate), we also need a root aggregate (Root Aggregate), where configuration files and operating system logs are stored (it has the same RAID as the data aggregate). Physical disks are software-assigned to controllers (ownership).

A disk shelf typically holds 24 SSD disks. When we divide them between controllers, each has 12 disks. With RAID-DP created on them (2 parity disks and 1 spare), 9 disks remain for data. And we would still need to allocate disks for the OS (Root), so even less would remain. That's why NetApp came up with disk partitioning.

With ADP v1, a small partition is carved out of each disk assigned to a controller, and the Root Aggregate is created on it (across all these disks). This is called Root-Data Partitioning.

ADP v2 allows even better disk utilization (less overhead). Three partitions are created on each disk, hence it's called Root-Data-Data Partitioning. For each controller (node), we have an aggregate that owns an (equally sized) partition on (all) disks. This is best illustrated in the image from NetApp.

Advanced Drive Partitioning v2

This results in various characteristics:

  • all disks are shared and divided into parts (Shared Partitioned Disks)
  • we don't have one data space on the array, but two; even though the new GUI shows one total (and free) capacity, we need to look at the aggregates (the GUI newly uses the term Tiers) to avoid running out of space (physical vs. logical space adds further complexity; ONTAP 9.9.1 offers a slightly better view)
  • the Spare disk in a sense belongs to the aggregate (controller), even though it's shown separately in the image
  • when we create a volume, it matters which aggregate (Tier) it's placed in; since ONTAP 9.8 the GUI doesn't offer a choice but should automatically select the most suitable one; alternatively, we can choose Custom Performance Service Level and specify it manually, or move the volume afterwards; see 10 Common Questions about Provisioning in ONTAP System Manager 9.8
  • we can move volumes between aggregates during operation (but it's slow); if we move all volumes away, we can delete the aggregate

Volume - FlexVol Volume

  • Storage - Volumes

ONTAP provides data to clients from a logical container called a Flexible Volume (FlexVol). When we talk about volumes, we always mean FlexVol. Volumes are located inside an aggregate, and data is stored in them. There can be multiple volumes in an aggregate. We can grow and shrink them, move them, and create copies.

Limits on volumes depend on the hardware used; for smaller arrays it might be about 2000 volumes, with a maximum size of 100 TB. If we need more space for NAS, we can use FlexGroup volumes (the successor to Infinite Volumes). These can span different aggregates and cluster nodes.

A volume is something like a partition from the array's perspective. From the perspective of the server (to which we assign it), it's a disk, and we can create partitions inside it (today the term Volume is commonly used in the Windows world as well).

Besides data volumes (Data Volumes), there are several special volumes:

  • Node Root Volume - typically vol0, contains node configuration information and logs
  • SVM Root Volume - serves as an entry point into the namespace provided by the SVM, contains information about the Namespace Directory
  • System Volume - contains special metadata, such as audit logs

QTree

  • Storage - Qtrees

QTrees can be used (optionally) to divide a volume into more manageable units. Along with them, we can use quotas to limit the volume's resource utilization. We create a QTree inside a volume, and NAS clients see it as a directory. We can work with QTrees roughly the same way as with a volume.
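
A sketch of creating a QTree and limiting it with a tree quota (the names and the 500 GB limit are illustrative):

qtree create -vserver svm-nas -volume data_vol -qtree projects
volume quota policy rule create -vserver svm-nas -policy-name default -volume data_vol -type tree -target projects -disk-limit 500GB
volume quota on -vserver svm-nas -volume data_vol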

LUN - Logical Unit Number

  • Storage - LUNs

In a NAS environment, a volume contains a file system. In a SAN environment, it contains LUNs. A LUN (Logical Unit Number) is an identifier of a logical device (Logical Unit) addressed by a SAN protocol. LUNs are the basic storage units in a SAN configuration. They reside in a volume (or in a QTree); there can be several in one volume, but it's recommended to have one LUN per volume. If we need multiple LUNs in a volume, it's good to use QTrees. We can move a LUN to another volume without downtime.

The client (Initiator) sees a LUN as a virtual disk from the array (Target). From the array's perspective, it's a file inside a volume. When we assign multiple LUNs to a server, each has a different LUN number. From the SCSI perspective, a LUN is a logical (addressable) device that is part of a physical device (Target). The maximum size of a LUN is 16 TB (only in an All SAN Array (ASA) configuration is it 128 TB).

If we use Snapshots, there must be enough space in the volume for both the LUN and the Snapshots, so the volume must be larger than the LUN.

ONTAP System Manager - LUNs
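
Creating a LUN inside a (larger) volume might look like this in the CLI (a sketch; the names are examples and the sizes follow the recommendation that the volume be larger than the LUN):

volume create -vserver svm-iscsi -volume Server_vol_02 -aggregate aggr1 -size 2.2TB -space-guarantee none
lun create -vserver svm-iscsi -path /vol/Server_vol_02/lun_01 -size 2TB -ostype windows_2008 -space-reserve disabled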

WAFL - Write Anywhere File Layout

WAFL is NetApp's proprietary technology (file system) that sits above RAID and mediates read and write operations. It's optimized for writing. It supports large storage, high performance, fast error recovery, and space expansion. It collects multiple write operations and commits them to disk sequentially at one point. Data can be written anywhere on the disk (metadata doesn't need to be written to a fixed location).

Various recommendations

  • we can use Thin Provisioning, which can save a lot of space, but it leads to Over Provisioning (we allocate more space than is available), so we need to carefully monitor aggregate filling (physical space)
  • a Volume should be larger than its LUN, depending on snapshot usage, but at least by 5%, or we can enable Autogrow by 10% (Resize Automatically); see the sketch after this list
  • we should fill an aggregate (or Tier) only up to 80% for optimal array functioning, 90% at most; beyond that, problems may occur
  • it's good to monitor the filling of LUNs, Volumes, and aggregates (or adjust the default thresholds in the CLI)
  • it's good to use Active IQ Unified Manager (which we can install on-premises for free) and/or Active IQ Online
  • I'm used to using iSCSI, but NFS might be useful for some purposes
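
The Autogrow mentioned in the list can be enabled in the CLI, for example (a sketch; the threshold and maximum size are illustrative values):

volume autosize -vserver svm-iscsi -volume Server_vol_02 -mode grow -grow-threshold-percent 90 -maximum-size 3TB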

Storage Efficiency

ONTAP offers a range of technologies to use the available physical space on the array more efficiently. They are all built on WAFL and are referred to as storage efficiency features. These features help achieve optimal space savings, mostly on FlexVol volumes.

Primarily, this means deduplication, data compression, and data compaction. In newer versions of ONTAP on AFF (All Flash FAS) systems, all of them are turned on automatically.

Thin Provisioning

For a Thin Provisioned volume or LUN, space is not reserved in advance in the storage. Space is allocated dynamically as needed. Free space is released back to the storage when data in the volume or LUN is deleted. For a volume, we can allocate more space than is physically available in the aggregate. Similarly, for a LUN, we can allocate more space than is physically available in the volume.

The advantages are clear. When we allocate space (disks) to servers, in practice they are not filled to one hundred percent, so we don't need to have that unoccupied space. We can also initially allocate larger space, which fills up gradually, and only expand the space on the array when needed.

A Thin Provisioned volume has Space Guarantee set to None. Space is not guaranteed; it may happen that the space needed to store data is not available. If we allocate more space to volumes in total than is physically available in the aggregate, it's called Over Provisioning. This is often desired in principle, but we then need to monitor aggregate filling very carefully. If space runs out in the LUN, volume, or aggregate, it switches to Offline to protect the data. A Thin Provisioned LUN has Space Reservation set to Disabled.

If we use Thin Provisioning, then when data is deleted in some system, this information may not be passed to the array, so the space remains occupied there. We can reclaim the deleted blocks; for VMware, see Reclaiming VMFS deleted blocks on Thin Provisioned LUNs (2057513).

In contrast, traditional space allocation is called Thick Provisioning. Allocated space is immediately reserved and space is guaranteed. Free space displayed in the volume is physically available. Over Provisioning cannot occur.

ONTAP System Manager - volume settings (Thin Provisioning)
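
In the CLI, Thin Provisioning corresponds to the space-guarantee setting of a volume (a sketch):

volume show -vserver svm-iscsi -volume Server_vol_01 -fields space-guarantee
volume modify -vserver svm-iscsi -volume Server_vol_01 -space-guarantee none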

Deduplication

Deduplication reduces the amount of physical storage needed for a volume (or all volumes in an aggregate) by eliminating duplicate blocks and replacing them with references to a single shared block. Reading deduplicated data has no performance impact; writing causes a negligible performance reduction.

WAFL creates a catalog of block signatures during writing. During deduplication, signatures are compared to identify duplicate blocks. If a match occurs, a byte-by-byte comparison is performed. Only with a complete match is the block discarded.

We can use several deduplication methods (a CLI sketch for enabling them follows the list):

  • Volume-Level Inline Deduplication - deduplication occurs during writing within the volume
  • Volume-Level Background Deduplication - postprocess deduplication within the volume over data stored on disk, using auto policy runs continuously in the background
  • Aggregate-Level Inline Deduplication - eliminates duplicate blocks during writing across volumes within the same aggregate (Cross-Volume Inline Deduplication)
  • Aggregate-Level Background Deduplication - activates in the background when sufficient data change occurs, we can run manually in CLI
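
Enabling efficiency on a volume and running a background scan over existing data might look like this (a sketch; on AFF systems most of this is on by default, as mentioned above):

volume efficiency on -vserver svm-iscsi -volume Server_vol_01
volume efficiency start -vserver svm-iscsi -volume Server_vol_01 -scan-old-data true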

Compression

Compression reduces the amount of physical storage needed for a volume by combining data blocks into compression groups, each of which is stored as a single block. When reading, only the compressed groups containing the requested data are decompressed, not the entire file or LUN.

Compression can occur during writing, when data is compressed in memory (Inline Compression), or it can be scheduled to run later over data already stored on disk (Postprocess Compression).
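
Both compression types are controlled per volume (a sketch):

volume efficiency modify -vserver svm-iscsi -volume Server_vol_01 -compression true -inline-compression true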

Compaction

Compaction (data condensation or reduction) applies to small files that would normally occupy an entire 4 KB block (even if they don't fill it). Thanks to compaction, such data blocks are combined and written into a single 4 KB block on disk. Inline data compaction occurs while the data is still in memory. It works at the volume level, and the volume must be Thin Provisioned.

FlexClone

With FlexClone technology, we can create a writable copy of a volume (or a file or LUN) that shares data blocks with its parent, so initially it takes up no space. The copy is created almost instantly and occupies space only for metadata and written changes.
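
A sketch of creating a clone and later splitting it from its parent (the names are illustrative):

volume clone create -vserver svm-iscsi -flexclone Server_vol_01_clone -parent-volume Server_vol_01
volume clone split start -vserver svm-iscsi -flexclone Server_vol_01_clone

After the split, the clone stops sharing blocks with the parent and starts occupying its own space.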

Displaying (logical and physical) capacity and Storage Efficiency information

We can view certain information in the GUI (some details were added in ONTAP 9.9.1), but for more details and options we must use the CLI. In the System Manager description, we'll also discuss what the logical and physical values listed in various places mean.

CLI - displaying Storage Efficiency information

We can view which methods are enabled for individual volumes.

AFF::> volume efficiency config 
Vserver:                                      svm-iscsi
Volume:                                       Server_vol_01
Schedule:                                     -
Policy:                                       auto
Compression:                                  true
Inline Compression:                           true
Inline Dedupe:                                true
Data Compaction:                              true
Cross Volume Inline Deduplication:            true
Cross Volume Background Deduplication:        true

Another command shows the efficiency status for volumes, including information about ongoing operations, or details for individual volumes.

volume efficiency show
volume efficiency show -instance

We can also view the efficiency status for aggregates (cross volume deduplication).

storage aggregate efficiency cross-volume-dedupe show 

CLI - displaying space savings and occupied (physical and logical) space

How much space was saved due to efficiency (individual functions) within a volume can be seen in its details. In the example, only selected items are displayed.

AFF::> volume show -vserver svm-iscsi -volume Server_vol_01

                                    Volume Size: 15TB
                                 Available Size: 5.42TB
                                Filesystem Size: 15TB
                                      Used Size: 6.21TB
                                Used Percentage: 41%

              Space Saved by Storage Efficiency: 2.94TB
         Percentage Saved by Storage Efficiency: 32%
                   Space Saved by Deduplication: 2.94TB
              Percentage Saved by Deduplication: 32%
                  Space Shared by Deduplication: 892.4GB
                     Space Saved by Compression: 0B
          Percentage Space Saved by Compression: 0%

                       Total Physical Used Size: 6.21TB
                       Physical Used Percentage: 41%
                          Over Provisioned Size: 3.37TB
                              Logical Used Size: 9.15TB
                        Logical Used Percentage: 61%
            Performance Tier Inactive User Data: 2.11TB

The ratio of efficiency within aggregates (how much space we have physically used and how much data is logically stored) is shown by a variant of the command that must be run in privileged mode. Only part of the output is shown.

AFF::> set -privilege advanced

Warning: These advanced commands are potentially dangerous; use them only when directed to do
         so by NetApp personnel.
Do you want to continue? {y|n}: y

AFF::*> storage aggregate show-efficiency -advanced

Aggregate: AFF_A250_01_NVME_SSD_1
     Node: AFF-A250-01

----- Total Storage Efficiency ----------------
    Logical    Physical                 Storage
       Used        Used        Efficiency Ratio
----------- ----------- -----------------------
    38.06TB     17.51TB                  2.17:1

-- Aggregate level Storage Efficiency ---------
(Aggregate Deduplication and Data Compaction)
    Logical    Physical                 Storage
       Used        Used        Efficiency Ratio
----------- ----------- -----------------------
    28.61TB     17.51TB                  1.63:1

-------- Volume level Storage Efficiency -----------
    Logical    Physical      Total Volume Level Data
       Used        Used   Reduction Efficiency Ratio
----------- -----------   --------------------------
    38.06TB     28.27TB                       1.35:1

---- Deduplication ---- ------ Compression ----
    Savings  Efficiency     Savings  Efficiency
                  Ratio                   Ratio
----------- ----------- ----------- -----------
     9.80TB      1.35:1          0B      1.00:1

The command also has variants that we run in normal mode. They show only the total ratio or a breakdown into individual functions, but without the data sizes.

storage aggregate show-efficiency 
storage aggregate show-efficiency -details 

CLI - aggregate occupancy

The following command shows how much total space individual volumes occupy in the aggregate (including metadata and other overhead; for Thick Provisioned volumes, the Volume Guarantee is also listed).

AFF::> volume show-footprint

Vserver : svm-iscsi
Volume  : Servers_vol_01

Feature                                          Used    Used%
--------------------------------           ----------    -----
Volume Data Footprint                          6.20TB      27%
Volume Guarantee                                   0B       0%
Flexible Volume Metadata                      37.51GB       0%
Deduplication                                 39.47GB       0%
Delayed Frees                                 70.64GB       0%

Total Footprint                                6.34TB      27%

Another interesting command displays the use of space in the aggregate.

AFF::> storage aggregate show-space

Aggregate : AFF_A250_01_NVME_SSD_1

Feature                                          Used      Used%
--------------------------------           ----------     ------
Volume Footprints                             29.12TB       125%
Aggregate Metadata                                 0B         0%
Snapshot Reserve                                   0B         0%
Total Used                                    17.41TB        74%

Total Physical Used                           17.30TB        74%

Total Provisioned Space                       54.00TB       231%

Logical and physical capacity

Due to the effects of deduplication and Thin Provisioning (and other storage efficiency features), I found the information about free and occupied space displayed by ONTAP System Manager (and also the CLI) quite confusing. Moreover, the GUI changes a lot between versions, and different versions display different values in the same place.

The documentation states that system capacity is measured in two ways:

  • physical capacity - the actually occupied space, the physical storage blocks occupied by the volume
  • logical capacity - the usable space in the volume, the amount of stored data without counting efficiency savings (deduplication, compression); it also includes Snapshots and clones

So the logically used space is the size of the stored data, which physically takes up less space thanks to efficiency.
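
We can check this on the volume output shown earlier: Logical Used Size (9.15 TB) = Used Size (6.21 TB) + Space Saved by Storage Efficiency (2.94 TB), and the 32% saving is 2.94 TB out of the 9.15 TB of logically stored data.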

System Manager - Volume Capacity

  • Storage - Volumes

In ONTAP 9.9.1 we see a list of volumes and their capacity, showing size, used and free space. We can click on the data to display a more detailed breakdown.

ONTAP System Manager - Volume Capacity

In the example, we have a volume of 2 TB assigned to a server, which sees a 2 TB disk. 1.75 TB of data is stored on the server, and it shows 0.25 TB of free space. ONTAP shows that Logical Used is 1.75 TB, which physically occupies (Data Used) only 1.07 TB. But it shows the available space as 955 GB (as was explained to me, part of the space is taken up by WAFL metadata, so it's not 993 GB).

The server can write only 2 TB of (logical) data, but more can be stored in the volume (logical capacity also counts Snapshots and clones). That's why the documentation states that the total logical capacity can be greater than the provisioned space, and the percentage use of logical space can show more than 100%.

However, the Data Used figure does not correspond to the physically occupied storage blocks, even though Active IQ calls this value Physical Used. It reflects only Volume Efficiency, i.e., deduplication within the volume. But Cross-Volume Deduplication (deduplication within the aggregate) is also (typically) performed, which can significantly reduce the physically required capacity.

ONTAP System Manager - Volume Capacity Over-provisioning

Another situation or value we might see for a volume is Over-provisioning: when the free physical space in the aggregate is less than what the volume's available capacity should be, the aggregate's available space is shown as the volume's free space, and the rest is shown in black in the graph as Over-provisioning. Since the available physical capacity is taken as the free space, in practice we have more space available for data (once deduplication, etc., is applied).

System Manager - Aggregate (Tier) Capacity

  • Storage - Tiers

In the Tiers (aggregates) view, we see how much physical space is occupied (Physical Used) and how much is free (Available). For the used space (Used Space), we see how much client data is there (Client Data) and how much the Snapshots occupy, i.e., how much is logically occupied in total (Logical Used). The data reduction ratio is calculated from this.

ONTAP System Manager - Aggregate (Tier) Capacity

If we look at the volumes in a given aggregate and sum up their (physically) used capacity, the value is usually larger than the occupied physical space of the aggregate. This is because Cross-Volume Deduplication is also applied, which can achieve good results, so the volumes occupy even less space.

We see this data in the CLI command described above

storage aggregate show-efficiency -advanced

System Manager - Cluster Capacity

  • Dashboard

The dashboard shows the total capacity, which can be misleading. But we see the total volume of data and the reduction ratio, which is useful. We can't rely on the free space value, because we typically have two aggregates created, and enough free space must remain in each.

ONTAP System Manager - Cluster Capacity

Related articles:

NetApp ONTAP

Articles that relate to NetApp All Flash FAS (AFF) and Fabric-Attached Storage (FAS) disk arrays with the ONTAP operating system.

Computer Storage

Data storage is a vast and complex issue in the computer world. Here you will find articles dedicated to Storage Area Networks (SAN), iSCSI technologies, Fibre Channel, disk arrays (Storage System, Disk Array), and data storage in general.


Comments
  [1] Gangboy

    For me, a great guide to the NetApp basics; it clarified a lot of things for me. Thank you for your article.

    Wednesday, 24.11.2021 19:55