
This website is originally written in the Czech language. Most content is machine (AI) translated into English. The translation may not be exact and may contain errors.

NetApp ONTAP errors on network ports

Petr Bouška - Samuraj
A NetApp array can log events of severity ALERT reporting that hardware errors have been observed on a specific port of a specific node. The event carries little detail. It says only that the number of errors is high, at least 1 per 1000 packets, without giving the actual value. Nor does it say which kind of error is involved: it could be CRC, alignment, or length errors, or dropped frames. This article shows how to find more information and lists possible causes of the problem.

Error Description

We can view the error in the Event Log in ONTAP System Manager. If notifications are configured, it is also sent to us by email. The content looks like this.

Node: AFF-01
Time: Thu, Oct 21 17:36:40 2021 +0200
Severity: ALERT

Message: vifmgr.cluscheck.hwerrors: Port e2b on node AFF-01 is reporting a high number (at least 1 per 1000 packets) of observed hardware errors (CRC, length, alignment, dropped).

Description: This message occurs when a network device reports a high number of observed hardware errors, such as CRC errors, length errors, alignment errors, or dropped frames.

Corrective Action: The errors could be originating from the specified port, a remote port, or a port on another component of the network. Check the statistics for both the port and the switch. Contact NetApp technical support for assistance and specific instructions.

Source: vifmgr
Sequence#: 143803

Viewing Interface (Port) Statistics

From the command line, we can run the Node Shell command ifstat to view the port statistics, which include counters for the various error types along with other data.

Displaying a Single Port

system node run -node <nodename> -command ifstat <interface>

AFF::> system node run -node AFF-02 -command ifstat e2c

-- interface  e2c  (18 days, 14 hours, 17 minutes, 57 seconds) --

RECEIVE
 Total frames:      890m | Frames/second:     554  | Total bytes:      3354g
 Bytes/second:     2088k | Total errors:     1148  | Errors/minute:       0 
 Total discards:      0  | Discards/minute:     0  | Multi/broadcast:  1515k
 Non-primary u/c:     0  | Errored frames:      0  | Unsupported Op:      0 
 CRC errors:        534  | Runt frames:         0  | Fragment:            0 
 Long frames:        43  | Jabber:              0  | Length errors:      37 
 Alignment errors:    0  | No buffer:           0  | Pause:               0 
 Jumbo:             411m | Error symbol:      534  | Bus overruns:        0 
 Queue drops:         0  | LRO segments:      737m | LRO bytes:        3342g
 LRO6 segments:       0  | LRO6 bytes:          0  | Bad UDP cksum:       0 
 Bad UDP6 cksum:      0  | Bad TCP cksum:       0  | Bad TCP6 cksum:      0 
 Mcast v6 solicit:    0  | Lagg errors:         0  | Lacp errors:         0 
 Lacp PDU errors:     0 
TRANSMIT
 Total frames:     1041m | Frames/second:     648  | Total bytes:      6336g
 Bytes/second:     3943k | Total errors:        0  | Errors/minute:       0 
 Total discards:      0  | Queue overflow:      0  | Multi/broadcast:   107k
 Collisions:          0  | Pause:               0  | Jumbo:             760m
 Cfg Up to Downs:     0  | TSO segments:      101m | TSO bytes:        5792g
 TSO6 segments:       0  | TSO6 bytes:          0  | HW UDP cksums:       0 
 HW UDP6 cksums:      0  | HW TCP cksums:       0  | HW TCP6 cksums:      0 
 Mcast v6 solicit:    0  | Lagg drops:          0  | Lagg no buffer:      0 
 Lagg no entries:     0 
DEVICE
 Mcast addresses:     3  | Rx MBuf Sz:       9216 
LINK INFO
 Speed:           10000M | Duplex:            full | Flowcontrol:      full
 Media state:     active | Up to downs:          2 | HW assist:        5655

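The output shows the total number of errors for the period given in the header, followed by a breakdown by error type. In this example the non-zero receive error counters are CRC errors (534), Long frames (43), Length errors (37), and Error symbol (534), which here add up to the Total errors value of 1148. Alignment errors is another counter worth watching, although it is zero in this output.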
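Because the alert speaks of "at least 1 per 1000 packets" without giving the real value, it can help to compute the ratio from the counters ourselves. The following is a minimal Python sketch, not a NetApp tool. It assumes that the k/m/g suffixes in the ifstat output are decimal multipliers, and the names parse_counters and RECEIVE_SAMPLE are made up for this illustration.

```python
import re

# Assumption: the k/m/g suffixes in ifstat output are decimal multipliers.
SUFFIXES = {"k": 10**3, "m": 10**6, "g": 10**9}

def parse_counters(section_text):
    """Return {counter name: integer value} for one ifstat section."""
    counters = {}
    for line in section_text.splitlines():
        # Each line holds up to three "Name: value" cells separated by "|".
        for cell in line.split("|"):
            match = re.fullmatch(r"\s*(.+?):\s*(\d+)([kmg]?)\s*", cell)
            if match:
                name, digits, suffix = match.groups()
                counters[name] = int(digits) * SUFFIXES.get(suffix, 1)
    return counters

# A few RECEIVE lines copied from the e2c example above.
RECEIVE_SAMPLE = """
 Total frames:      890m | Frames/second:     554  | Total bytes:      3354g
 Bytes/second:     2088k | Total errors:     1148  | Errors/minute:       0
 CRC errors:        534  | Runt frames:         0  | Fragment:            0
 Long frames:        43  | Jabber:              0  | Length errors:      37
 Jumbo:             411m | Error symbol:      534  | Bus overruns:        0
"""

counters = parse_counters(RECEIVE_SAMPLE)
errors_per_1000 = 1000 * counters["Total errors"] / counters["Total frames"]
print(f"{errors_per_1000:.4f} errors per 1000 received frames")
```

For this sample the result is about 0.0013 errors per 1000 frames, far below the threshold named in the alert. Note that the alert shown earlier concerned a different port (e2b on AFF-01), that the frame total is rounded in the output, and that an average over 18 days can hide short bursts of errors. Only one section should be passed in at a time, because RECEIVE and TRANSMIT reuse counter names such as Total frames.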

Displaying All Ports

We can display the statistics for all ports at once.

system node run -node <nodename> -command ifstat -a

Clearing Port Statistics

To make it easier to monitor the statistics after a change, we can clear the counters on the port.

system node run -node <nodename> -command ifstat -z <interface>

AFF::> system node run -node AFF-02 -command ifstat -z e2c
-- interface  e2c  (23 days, 14 hours, 10 minutes, 55 seconds) --
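If clearing the counters is not desirable, an alternative is to record the values before a change and compare them with a later reading. The sketch below assumes the counters have already been collected into dictionaries. The function name counter_deltas is hypothetical, and the "after" values are invented purely for illustration.

```python
def counter_deltas(before, after):
    """Return the counters whose value increased between two snapshots."""
    return {
        name: after[name] - before.get(name, 0)
        for name in after
        if after[name] > before.get(name, 0)
    }

# "before" uses values from the ifstat example above.
before = {"CRC errors": 534, "Long frames": 43, "Length errors": 37, "Error symbol": 534}
# "after" is made up for illustration, it is not real output.
after = {"CRC errors": 540, "Long frames": 43, "Length errors": 37, "Error symbol": 540}

growth = counter_deltas(before, after)
print(growth)
```

With these invented numbers only CRC errors and Error symbol are reported, each having grown by 6, which would suggest the problem is still present after the change.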

Possible Causes of Port Errors

The first step is usually to check the active network components (switches), which in many cases also show errors on their ports. This can help identify the port where the errors originate. The situation is harder when the switches show no errors. The usual checks then cover cabling, SFP modules, and so on. Another option is to verify the MTU on the devices in the (SAN) network.

Later, I found a number of articles in the NetApp KB that suggest various possible causes of the errors.

Different Flowcontrol Setting on Array and Switch

The first article describes CRC errors that appear after replacing controllers. More important than that scenario is its point that Flowcontrol should be set identically on the NetApp node ports and on the switch ports they connect to (and generally throughout the network). The ifstat command shown earlier also displays the Flowcontrol setting. It can be Flowcontrol: full, which has been the NetApp default for some time, or Flowcontrol: none.

I had never dealt with this before. I checked the switches, which are Cisco Nexus for SAN and Cisco Catalyst for LAN, and flow control is disabled on both.

iSCSI1# sh int Eth1/50/1 | inc flow
  Input flow-control is off, output flow-control is off

LAN1#sh int Gi1/0/47 | inc flow
  input flow-control is off, output flow-control is unsupported

The other articles discuss differing opinions on whether it is better to have Flowcontrol enabled or disabled. The main point is that it should be set the same throughout the network, so we can disable it on the NetApp. This resets the port, which means a short outage on it. With redundancy in place, as there definitely should be, that should not be a problem.

net port modify -node <node that owns port> -port <port> -flowcontrol-admin none

AFF::> network port modify -node AFF-01 -port e2c -flowcontrol-admin none

Warning: This command will cause a several second interruption of service on this network
         port.
Do you want to continue? {y|n}: y
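The comparison between the array setting and the switch output can be sketched in Python. This is only an illustration under stated assumptions: that "input" on the Cisco side is the receive direction and "output" the send direction, and that any state other than "on" (off, unsupported) counts as disabled. The function names are hypothetical.

```python
import re

# NetApp flowcontrol-admin value -> (sends pause frames, honors received pause frames)
NETAPP_FLOWCONTROL = {
    "none": (False, False),
    "send": (True, False),
    "receive": (False, True),
    "full": (True, True),
}

def parse_cisco_flowcontrol(line):
    """Return (sends, receives) from a Cisco 'flow-control' status line.

    Assumption: 'input' is the receive direction and 'output' the send
    direction; any state other than 'on' is treated as disabled.
    """
    match = re.search(
        r"input flow-control is (\w+), output flow-control is (\w+)",
        line,
        re.IGNORECASE,
    )
    if match is None:
        raise ValueError(f"unrecognized flow-control line: {line!r}")
    receive_state, send_state = (state.lower() for state in match.groups())
    return (send_state == "on", receive_state == "on")

def is_consistent(netapp_setting, switch_line):
    """True when each side's pause frames are honored by the other side."""
    netapp_sends, netapp_receives = NETAPP_FLOWCONTROL[netapp_setting]
    switch_sends, switch_receives = parse_cisco_flowcontrol(switch_line)
    return netapp_sends == switch_receives and netapp_receives == switch_sends

nexus_line = "  Input flow-control is off, output flow-control is off"
print(is_consistent("full", nexus_line))  # False
print(is_consistent("none", nexus_line))  # True
```

With the switch output shown above, the default full on the array is flagged as a mismatch, while none matches the switches.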

CRC Errors - Component Failure

CRC errors are media errors. They can be caused by a faulty cable or SFP module, and they can also be propagated from elsewhere in the network. We need to check the connection between the port reporting the error and the next connected device, check the port itself, and replace the SFP if necessary.

Long Frames - Large MTU

If we see Long frames in the port statistics, frames are arriving that are larger than the Maximum Transmission Unit (MTU) set on the given port. We need to go through the servers that connect to the array and check whether they have a larger MTU configured.

Error Symbol - Component Failure

If Error symbol appears in the statistics, NetApp states that this indicates a hardware component failure. The error occurs during transmission from the physically connected device and cannot be propagated from the network. We need to check the network card and SFP on the NetApp and on the connected device (switch), the connecting cable, and that the cable is properly seated.

Length Errors

The first article I found applies only to certain interface or card types (X1143A), but its conclusion that a small number of these errors can be ignored may perhaps apply more generally. Another article mentions an incompatible twinax cable as a cause.
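The causes described in the sections above can be condensed into a small lookup that turns non-zero counters into a checklist. This is a sketch only. The hint texts are a summary of this article rather than NetApp's wording, and the names HINTS and triage are made up for the example.

```python
# Hints condensed from the sections above; a summary, not NetApp's wording.
HINTS = {
    "CRC errors": "media error: check cable and SFP; can also be propagated from the network",
    "Long frames": "a sender uses a larger MTU than this port",
    "Error symbol": "hardware fault on the directly connected link: NIC, SFP, cable, switch port",
    "Length errors": "a small number may be ignorable; check for an incompatible twinax cable",
}

def triage(counters):
    """Return (counter, value, hint) for each known counter that is non-zero."""
    return [
        (name, counters[name], hint)
        for name, hint in HINTS.items()
        if counters.get(name, 0) > 0
    ]

# Receive counters from the e2c example earlier in the article.
sample = {"CRC errors": 534, "Long frames": 43, "Length errors": 37,
          "Error symbol": 534, "Alignment errors": 0}

for name, value, hint in triage(sample):
    print(f"{name} = {value}: {hint}")
```

Counters that are zero or that have no entry in the lookup, such as Alignment errors here, are simply skipped.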
