This website is originally written in the Czech language. Most content is machine (AI) translated into English. The translation may not be exact and may contain errors.

NetApp ONTAP EMS events, notifications, capacity monitoring

| Petr Bouška - Samuraj |
EMS (Event Management System) collects events generated by the ONTAP system. It contains a large amount of operational information about system states. It is important to be alerted to system problems immediately so that we can react quickly. We'll look at the options for viewing events and at configuring selected EMS events to be sent by email (Notifications). We will specifically describe setting up notifications for aggregates and volumes that are running out of space (capacity monitoring).

Note: The article is based on version ONTAP 9.9.1.

Viewing System Events (Logs)

Severity Level

  • EMERGENCY - Disruption
  • ALERT - Single point of failure
  • ERROR - Degradation
  • NOTICE - Information
  • INFORMATIONAL - Information
  • DEBUG - Debug information

Viewing Events Using the CLI

In the CLI, one command with various parameters displays the contents of the event log. By default, it shows the most recent events with the time the event occurred, the cluster node, the severity, and the event text.

event log show

For more details, we can add the -detail parameter; -instance shows even more.

event log show -detail
event log show -instance

By default, only events with severity EMERGENCY, ALERT, and ERROR are displayed. We can change this by specifying the severity.

event log show -severity DEBUG
event log show -severity <=NOTICE

We can filter by message name.

event log show -message-name secd.*

Or by the entire event text (there are other parameters that are not covered here).

event log show -event *Aggregate*

We can select events by time, for example, the last 10 minutes or a specified interval.

event log show -time >10m
event log show -time "11/30/2021 1:00:00".."11/30/2021 22:00:00" 

Note: In practice, we usually need to combine various parameters.

According to the article "The 'event log show' command displays only 3 days or 2048 events", the command only works with the last 3 days or the last 2048 records. All EMS messages count toward that limit, so in practice it usually covers only a short period.
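To get a feeling for how short that period can be, here is a small Python sketch. It assumes that both limits apply together (whichever is reached first) and that events arrive at a constant rate. The function name and the event rates are only illustrative, they do not come from ONTAP.

```python
def effective_window_hours(events_per_hour, max_events=2048, max_days=3):
    """Estimate how many hours of history 'event log show' can cover.

    Assumption: the 3-day and 2048-record limits both apply and the
    stricter one wins. The constant event rate is a simplification.
    """
    time_limit_hours = float(max_days * 24)
    if events_per_hour <= 0:
        return time_limit_hours
    return min(time_limit_hours, max_events / events_per_hour)


# A quiet system stays within the 3-day limit (204.8 h of records > 72 h).
print(effective_window_hours(10))
# A busy system fills 2048 records in under a day.
print(effective_window_hours(100))
```

With 100 events per hour, the record limit is reached after roughly 20 hours, so older events have to be read from the log files.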

The article also describes various ways to work with older logs, for example downloading the log files. We can do this easily through the Service Processor Infrastructure (SPI) web interface at http(s)://<cluster-mgmt-ip>/spi/ (the cluster address plus /spi).

Viewing Events Using ONTAP System Manager

  • Events & Jobs - Events

In the web interface, we can view, filter, and search events. However, the display is not very responsive.

I also see peculiar behavior on one NetApp system with ONTAP 9.9.1P3, and I couldn't find out whether it's a feature or a bug: only events with EMERGENCY, ALERT, and ERROR severity are displayed there. On an older system with ONTAP 9.8P7, all severity levels are visible (and all categories are offered in the filter).

ONTAP System Manager - Events

Setting up System Event Notifications (Sending to Email)

Note: According to the documentation, from ONTAP 9.10.1 onwards it is possible to configure how EMS delivers event notifications in System Manager (GUI). In older versions, the CLI must be used.

We can send selected events directly to email, a Syslog server, a REST API client (webhooks), or as an SNMP trap. The configuration is quite similar for each; here we'll focus on sending emails.

Configuring the SMTP Server

First, we set the SMTP mail server (not many options are offered).

event config modify -mail-server SERVER.COMPANY.COM -mail-from EMAIL@COMPANY.COM

Creating Recipients (Email Addresses)

Next, we create the email recipients (more generally, notification destinations). Each destination holds a single address, so for multiple addresses we must define several records or use a distribution group.

event notification destination create -name ADMIN1 -email RECIPIENT1@COMPANY.COM
event notification destination create -name ADMIN2 -email RECIPIENT2@COMPANY.COM

Selecting Events to Send (Filtering)

The events we're interested in and want to be notified about are selected using an event filter. A filter consists of one or more rules, which are processed from top to bottom until the first match is found (first fit). At the end, there is an implicit rule that matches everything and excludes it.

A rule can be of type include (a message matching the rule is included) or exclude (not included). In the rule, we set the event message name (message-name), severity (severity), and SNMP Trap type (snmp-trap-type). These three items are evaluated using logical AND. When there are multiple values in an item, OR is used. The asterisk (*) is a wildcard for everything (we can combine it with other characters).

We can use a predefined filter or create our own. Listing existing filters along with their rules:

event filter show

There are 3 system-defined event filters

  • important-events - all ALERT and EMERGENCY events
  • no-info-debug-events - all EMERGENCY, ALERT, ERROR, and NOTICE events (no INFO and DEBUG)
  • default-trap-events - all ALERT and EMERGENCY events and all Standard and Built-in SNMP traps

Creating a new event filter (it selects all EMERGENCY, ALERT, and ERROR events, plus the events about aggregates or volumes filling up).

event filter create -filter-name important-events-2
event filter rule add -filter-name important-events-2 -type include -severity DEBUG -message-name monitor.volume.full
event filter rule add -filter-name important-events-2 -type include -severity DEBUG -message-name monitor.volumes.one.ok
event filter rule add -filter-name important-events-2 -type include -severity DEBUG -message-name monitor.volume.ok
event filter rule add -filter-name important-events-2 -type include -severity EMERGENCY,ALERT,ERROR

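Excluding a specific message from being sent. The new exclude rule is added below the existing rules, so the second command moves the severity include rule from position 4 to position 5, below the exclude rule. Otherwise the include rule would match first.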

event filter rule add -filter-name important-events-2 -type exclude -message-name tsse.scan.start.failed
event filter rule reorder -filter-name important-events-2 -position 4 -to-position 5
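The rule evaluation described above (first match wins, the three items combined with AND, multiple values with OR, implicit exclude at the end) can be modelled in a few lines of Python. This is only a sketch of the logic as described in this article, not ONTAP code. The names Rule, is_sent, and wildcard_match are made up for the illustration, and the ERROR severity of tsse.scan.start.failed is an assumption.

```python
import re
from dataclasses import dataclass


def wildcard_match(pattern, value):
    """Match value against a pattern where '*' stands for any run of characters."""
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, value) is not None


@dataclass
class Rule:
    kind: str                        # "include" or "exclude"
    message_name: tuple = ("*",)
    severity: tuple = ("*",)
    snmp_trap_type: tuple = ("*",)

    def matches(self, event):
        # The three items are ANDed; multiple values inside one item are ORed.
        items = (
            ("message_name", self.message_name),
            ("severity", self.severity),
            ("snmp_trap_type", self.snmp_trap_type),
        )
        return all(
            any(wildcard_match(p, event[field]) for p in patterns)
            for field, patterns in items
        )


def is_sent(rules, event):
    for rule in rules:               # top to bottom, first match wins
        if rule.matches(event):
            return rule.kind == "include"
    return False                     # implicit final rule excludes everything


full = Rule("include", message_name=("monitor.volume.full",), severity=("DEBUG",))
one_ok = Rule("include", message_name=("monitor.volumes.one.ok",), severity=("DEBUG",))
ok = Rule("include", message_name=("monitor.volume.ok",), severity=("DEBUG",))
by_severity = Rule("include", severity=("EMERGENCY", "ALERT", "ERROR"))
no_scan = Rule("exclude", message_name=("tsse.scan.start.failed",))

as_added = [full, one_ok, ok, by_severity, no_scan]   # exclude rule at the end
reordered = [full, one_ok, ok, no_scan, by_severity]  # after "rule reorder"

scan_failed = {
    "message_name": "tsse.scan.start.failed",
    "severity": "ERROR",             # assumed severity, for illustration only
    "snmp_trap_type": "Severity-based",
}
print(is_sent(as_added, scan_failed))   # True: the severity rule matches first
print(is_sent(reordered, scan_failed))  # False: the exclude rule matches first
```

The example shows why the order of the rules matters: an exclude rule placed below a broader include rule never gets a chance to match.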

Configuring Notification Delivery

The final step is to connect the event filter and one or more recipients (destinations) by creating an Event Notification. Once created, the notification will start working.

event notification create -filter-name no-info-debug-events -destinations ADMIN1,ADMIN2

Modifying or deleting is done using the ID, which is displayed when listing.

event notification show
event notification modify -ID 3 -destinations ADMIN3
event notification modify -ID 3 -filter-name important-events-2
event notification delete -ID 1

We can also view the history of events that were sent to a specific notification destination (email).

event notification history show -destination ADMIN1

Event Catalog

We have a command that lists the events according to a specified filter or the details of a single event.

AFF::> event catalog show -message-name *nearlyFull*
Message                          Severity         SNMP Trap Type
-------------------------------- ---------------- -----------------
fg.inodes.member.nearlyFull      ALERT            Severity-based
fg.space.member.nearlyFull       ALERT            Severity-based
monitor.volume.nearlyFull        ERROR            Built-in
3 entries were displayed.

event catalog show -message-name monitor.volume.nearlyFull

Another command summarizes information about event occurrences.

event status show -message-name *nearlyFull*

Monitoring Aggregate and Volume Capacity

Nearly Full and Full Thresholds

For volumes and aggregates, percentage thresholds define when they are considered

  • nearly full - EMS generates an error (ERROR), default is 95%, 0 means disabled, maximum is 99%
  • full - EMS generates a message (DEBUG), default is 98%, 0 means disabled, maximum is 100%

An EMS message is generated each time a threshold is crossed. If the fill level is rising, it is the ERROR or DEBUG message; if it is falling, it is an OK message. If we set up notifications for these events, we are informed in time that space in a volume or aggregate is running out.
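The threshold behaviour can be sketched in Python as follows. This is a simplified model based only on the description in this article: it compares two consecutive fill levels and reports which messages a crossing would produce. The function name is made up, and using monitor.volume.ok for every downward crossing is a simplification, the real system may behave differently in detail.

```python
def crossing_events(previous_pct, current_pct, nearly_full=95, full=98):
    """Return (message, severity) pairs for thresholds crossed between two samples.

    Simplified model: a threshold of 0 means disabled, and a resource counts
    as over a threshold when its fill level is at least that percentage.
    """
    thresholds = (
        ("monitor.volume.nearlyFull", nearly_full, "ERROR"),
        ("monitor.volume.full", full, "DEBUG"),
    )
    events = []
    for message, threshold, severity in thresholds:
        if threshold == 0:
            continue                                   # threshold disabled
        was_over = previous_pct >= threshold
        is_over = current_pct >= threshold
        if is_over and not was_over:
            events.append((message, severity))         # fill level rising
        elif was_over and not is_over:
            events.append(("monitor.volume.ok", "DEBUG"))  # fill level falling
    return events


print(crossing_events(94, 96))   # nearly full threshold crossed upwards
print(crossing_events(96, 97))   # no threshold crossed, no message
print(crossing_events(96, 90))   # back below the nearly full threshold
```

Note that nothing is reported while the fill level stays on the same side of a threshold, which is why the threshold values should leave enough time to react.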

Aggregate Thresholds

Viewing the current settings. We can display all items for a specific aggregate or just the threshold values for all or a specific aggregate.

storage aggregate show -aggregate AFF_01_NVME_SSD_1
storage aggregate show -fields space-nearly-full-threshold-percent,space-full-threshold-percent 

We can change one or both values for a specific aggregate.

storage aggregate modify AFF_01_NVME_SSD_1 -space-nearly-full-threshold-percent 90 -space-full-threshold-percent 95

Volume Thresholds

Viewing the current settings.

volume show -fields space-nearly-full-threshold-percent,space-full-threshold-percent

Changing the values.

volume modify -volume Server_vol -vserver svm-iscsi -space-nearly-full-threshold-percent 94 -space-full-threshold-percent 97

We can also set multiple volumes at once.

volume modify -volume VMware* -space-nearly-full-threshold-percent 90 -space-full-threshold-percent 95

EMS Messages for Events

When the nearly full threshold is exceeded, the following event is generated. It is the same event for both volumes and aggregates.

AFF::> event catalog show -message-name monitor.volume.nearlyFull

     Message Name: monitor.volume.nearlyFull
         Severity: ERROR
      Description: This message occurs when one or more file systems are nearly full, typically indicating at least 95% full.
 This event is accompanied by global health monitoring messages for the customer. The space usage is computed based on the
 active file system size and is computed by subtracting the value of the "Snapshot Reserve" field from the value of the
 "Used" field of the "volume show-space" command.
Corrective Action: Create space by increasing the volume or aggregate sizes, or by deleting data or deleting Snapshot(R)
 copies. To increase a volume's size, use the "volume size" command. To delete a volume's Snapshot(R) copies, use the "volume
 snapshot delete" command. To increase an aggregate's size, add disks by using the "storage aggregate add-disks" command.
 Aggregate Snapshot(R) copies are deleted automatically when the aggregate is full.
   SNMP Trap Type: Built-in
    Is Deprecated: false

The sent email contains the subject and message and continues with the description and corrective action above.

Subject: AFF-01: monitor.volume.nearlyFull [ERROR]

Message: monitor.volume.nearlyFull: Aggregate AFF_01_NVME_SSD_1 is nearly full (using or reserving 75% of space and 0%
 of inodes).

When the full threshold is exceeded, the following event is generated. Again, it is the same for volumes and aggregates.

AFF::> event catalog show -message-name monitor.volume.full

     Message Name: monitor.volume.full
         Severity: DEBUG
      Description: This message occurs when one or more file systems are full, typically indicating at least 98% full. This
 event is accompanied by global health monitoring messages for the customer. The space usage is computed based on the active
 file system size and is computed by subtracting the value of the "Snapshot Reserve" field from the value of the "Used"
 field of the "volume show-space" command. The volume/aggregate can be over 100% full due to space used or reserved by
 metadata. A value greater than 100% might cause Snapshot(tm) copy space to become unavailable or cause the volume to become
 logically overallocated. See the "vol.log.overalloc" EMS message for more information.
Corrective Action: NONE
   SNMP Trap Type: Built-in
    Is Deprecated: false

The email contains:

Subject: AFF-02: monitor.volume.full [DEBUG]

Message: monitor.volume.full: Volume HV01lab_vol_01@app:602... is full (using or reserving 87% of space and 0% of inodes).
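If we want to process these emails automatically (for example in a mail filter or a ticketing system), the subject and message can be parsed with regular expressions. The following Python sketch relies only on the two email samples shown in this article. The patterns and function names are my own and may need adjusting for other events or ONTAP versions.

```python
import re

# Patterns derived from the sample emails in this article (an assumption,
# not a documented format).
SUBJECT_RE = re.compile(
    r"^(?:Subject:\s*)?(?P<node>[^:]+):\s*(?P<message_name>\S+)\s*\[(?P<severity>[A-Z]+)\]$"
)
USAGE_RE = re.compile(
    r"using or reserving (?P<space>\d+)% of space and (?P<inodes>\d+)% of inodes"
)


def parse_subject(subject):
    """Return node, message name, and severity from the subject, or None."""
    match = SUBJECT_RE.match(" ".join(subject.split()))
    return match.groupdict() if match else None


def parse_usage(message):
    """Return (space_percent, inode_percent) from the message text, or None."""
    # Collapse whitespace first, because the text may be wrapped across lines.
    match = USAGE_RE.search(" ".join(message.split()))
    if match is None:
        return None
    return int(match["space"]), int(match["inodes"])


subject = "Subject: AFF-02: monitor.volume.full [DEBUG]"
message = (
    "Message: monitor.volume.full: Volume HV01lab_vol_01@app:602... is full "
    "(using or reserving 87% of space and 0% of inodes)."
)
print(parse_subject(subject))
print(parse_usage(message))
```

The severity from the subject can then be used to route the message, and the percentage to decide how urgent the reaction is.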

When usage returns below the threshold, the DEBUG-severity messages monitor.volumes.one.ok and monitor.volume.ok are generated.

When a volume is completely full, additional messages such as wafl.vol.full (ALERT) and LUN.out.of.space (EMERGENCY) are generated.
