
VMware ESXi 7 NFS datastores blocked by switch denial-of-service controls
A recent experience demonstrated how denial-of-service countermeasures in network switches can prevent VMware ESXi NFS datastores from working.
New switches
I recently upgraded the network switches in my home lab. By default, the new switches have a denial-of-service prevention feature that detects the following behaviours, and I had no reason to disable it:
- Land Attack
- Blat Attack
- TCP Null Scan
- TCP Xmascan
- TCP SYN-FIN
- TCP SYN Src Port Less 1024
- Ping Death Attack
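Most of these controls are stateless tests on individual packet headers. The Python sketch below illustrates that idea for the TCP-based checks; it is not the switch vendor's implementation. The rule definitions are assumptions based on common descriptions of these attacks (vendors differ, for example on whether a Land Attack also requires matching ports), the ICMP-based Ping Death Attack is omitted, and `TcpSegment` and `dos_checks` are hypothetical names. The SYN Src Port Less 1024 rule is assumed to apply only to connection-opening segments (SYN set, ACK clear), since otherwise every SYN-ACK from a server on a well-known port would be dropped.

```python
from __future__ import annotations

from dataclasses import dataclass

# TCP header flag bits (RFC 793).
FIN, SYN, PSH, ACK, URG = 0x01, 0x02, 0x08, 0x10, 0x20


@dataclass
class TcpSegment:
    """Hypothetical summary of the header fields the checks look at."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    flags: int


def dos_checks(seg: TcpSegment) -> list[str]:
    """Return the names of the (assumed) DoS checks this segment would trip."""
    hits = []
    if seg.src_ip == seg.dst_ip:
        hits.append("Land Attack")  # source address spoofed to equal the destination
    if seg.src_port == seg.dst_port:
        hits.append("Blat Attack")  # assumed: identical source and destination ports
    if seg.flags == 0:
        hits.append("TCP Null Scan")  # no flags set at all
    if (seg.flags & (FIN | PSH | URG)) == (FIN | PSH | URG):
        hits.append("TCP Xmascan")  # FIN, PSH and URG all set
    if (seg.flags & (SYN | FIN)) == (SYN | FIN):
        hits.append("TCP SYN-FIN")  # open and close requested in one segment
    if (seg.flags & SYN) and not (seg.flags & ACK) and seg.src_port < 1024:
        hits.append("TCP SYN Src Port Less 1024")  # connection opened from a reserved port
    return hits
```

For example, with hypothetical addresses, `dos_checks(TcpSegment("192.168.1.20", "192.168.1.10", 800, 111, SYN))` returns `["TCP SYN Src Port Less 1024"]`: a connection attempt from a reserved source port trips that rule even though nothing else about it is unusual.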
New VMware ESXi NFS datastore
With my improved switching infrastructure in place, I was confident enough to mount production NFS datastores on my VMware ESXi 7 host. However, an attempt to create an NFS datastore failed with the following error message:
[root@esxiserver:~] esxcli storage nfs add -H NasServerDevice.domain -s /volume1/NasShare -v NasServerDevice-NasShare
Unable to complete Sysinfo operation. Please see the VMkernel log file for more details.: Unable to connect to NFS server
Initial investigative steps
I first checked that the NFS server was reachable, which it was:
[root@esxiserver:~] ping NasServerDevice
PING NasServerDevice (192.168.x.y): 56 data bytes
64 bytes from 192.168.x.y: icmp_seq=0 ttl=64 time=0.568 ms
64 bytes from 192.168.x.y: icmp_seq=1 ttl=64 time=0.295 ms
I looked in the vmkernel log:
/var/log/vmkernel.log
2022-05-16T09:50:59.535Z cpu1:133089 opID=fb98e9c7)NFS: 162: Command: (mount) Server: (NasServerDevice.domain) IP: (192.168.x.y) Path: (/volume1/NasShare) Label: (NasServerDevice-NasShare) Options: (None)
2022-05-16T09:50:59.535Z cpu1:133089 opID=fb98e9c7)StorageApdHandler: 976: APD Handle 3505ff2b-e9cc43e0 Created with lock[StorageApd-0x4313da0028b0]
2022-05-16T09:50:59.535Z cpu1:133089 opID=fb98e9c7)CpuSched: 815: user latency of 133666 RPC-tx-192.168.x.y.0.111 0 changed by 133089 hostd-worker -6
2022-05-16T09:51:10.008Z cpu1:133089 opID=fb98e9c7)SunRPC: 3306: Synchronous RPC abort for client 0x43052e6012b0 IP 192.168.x.y.0.111 proc 3 xid 0x1395fb55 attempt 1 of 3
2022-05-16T09:51:21.007Z cpu1:133089 opID=fb98e9c7)SunRPC: 3306: Synchronous RPC abort for client 0x43052e6012b0 IP 192.168.x.y.0.111 proc 3 xid 0x1395fb55 attempt 2 of 3
2022-05-16T09:51:31.007Z cpu1:133089 opID=fb98e9c7)SunRPC: 3306: Synchronous RPC abort for client 0x43052e6012b0 IP 192.168.x.y.0.111 proc 3 xid 0x1395fb55 attempt 3 of 3
2022-05-16T09:51:31.007Z cpu1:133089 opID=fb98e9c7)SunRPC: 1104: Destroying world 0x20a22
2022-05-16T09:51:31.008Z cpu1:133089 opID=fb98e9c7)StorageApdHandler: 1062: Freeing APD handle 0x4313da0028b0 [3505ff2b-e9cc43e0]
2022-05-16T09:51:31.008Z cpu1:133089 opID=fb98e9c7)StorageApdHandler: 1146: APD Handle freed!
2022-05-16T09:51:31.008Z cpu1:133089 opID=fb98e9c7)NFS: 173: NFS mount NasServerDevice.domain:/volume1/NasShare failed: Unable to connect to NFS server.
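The SunRPC lines carry a little more detail than they first appear to. The address `192.168.x.y.0.111` appears to use the RPC universal address format from RFC 1833, in which the last two fields are the high and low bytes of the port: 0 × 256 + 111 = 111, the portmapper port. In portmap version 2, procedure 3 is `PMAPPROC_GETPORT` (in rpcbind versions 3 and 4 it is `RPCBPROC_GETADDR`), so the host appears to have been asking the NAS where its NFS services were listening and received no reply at all. The Python sketch below decodes such a line; the function names are hypothetical, the portmap mapping is only applied to port 111 as an assumption, and the sample address 192.168.1.10 stands in for the redacted one.

```python
from __future__ import annotations

import re

# Portmap version 2 procedure numbers (RFC 1833).
PMAP_PROCS = {0: "NULL", 1: "SET", 2: "UNSET", 3: "GETPORT", 4: "DUMP", 5: "CALLIT"}

# Matches e.g. "... IP 192.168.1.10.0.111 proc 3 xid 0x1395fb55 attempt 1 of 3"
SUNRPC_ABORT = re.compile(r"IP (\d+(?:\.\d+){5}) proc (\d+) .*attempt (\d+) of (\d+)")


def decode_uaddr(uaddr: str) -> tuple[str, int]:
    """Split an RPC universal address 'h1.h2.h3.h4.p1.p2' into (IPv4 address, port)."""
    fields = uaddr.split(".")
    if len(fields) != 6:
        raise ValueError(f"expected six dot-separated fields, got {uaddr!r}")
    p1, p2 = int(fields[4]), int(fields[5])
    return ".".join(fields[:4]), p1 * 256 + p2  # port = high byte * 256 + low byte


def summarise_sunrpc_abort(line: str) -> dict:
    """Extract server, port, procedure and retry count from a SunRPC abort line."""
    m = SUNRPC_ABORT.search(line)
    if m is None:
        raise ValueError("not a SunRPC abort line")
    host, port = decode_uaddr(m.group(1))
    proc = int(m.group(2))
    # Assumption: calls to port 111 are portmap v2, so name the procedure.
    proc_name = PMAP_PROCS.get(proc, str(proc)) if port == 111 else str(proc)
    return {"host": host, "port": port, "proc": proc_name,
            "attempt": f"{m.group(3)} of {m.group(4)}"}
```

Applied to the first abort line (with the sample address), it reports port 111, procedure `GETPORT`, attempt `1 of 3`. The three attempts are roughly 10 to 11 seconds apart in the log, which fits requests that were never answered rather than ones that were actively refused.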
I also looked in the vmkwarning log:
/var/log/vmkwarning.log
2022-05-16T09:11:32.194Z cpu3:1496724)WARNING: NFS: 2363: Failed to get attributes (No connection)
2022-05-16T09:11:32.194Z cpu3:1496724)WARNING: NFS: 2363: Failed to get attributes (No connection)
2022-05-16T09:11:32.194Z cpu3:1496724)WARNING: NFS: 2363: Failed to get attributes (No connection)
2022-05-16T09:11:32.194Z cpu3:1496724)WARNING: NFS: 2363: Failed to get attributes (No connection)
These logs recorded that the mount had failed, but only in the most general terms; nothing in them pointed to the cause.
Further considerations
I suspected the new switches, as they were the only recent change to my otherwise static lab environment, and the problem had not occurred with my old switches. In particular, I suspected the denial-of-service controls, because they can silently drop frames that they judge to be malicious.
Proof, resolution and future actions
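Disabling the switch denial-of-service controls fixed the problem, and I was then able to create the NFS datastore on my VMware ESXi 7 host. In future I may investigate which specific control(s) cause the problem, probably starting with TCP SYN Src Port Less 1024, then TCP SYN-FIN. TCP SYN Src Port Less 1024 is the prime suspect: NFS clients traditionally connect from a privileged source port (below 1024), which is exactly the traffic that rule targets, whereas ping uses ICMP and so was unaffected.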
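One way to test the TCP SYN Src Port Less 1024 suspicion without toggling each control in turn might be to imitate the ESXi client from another machine plugged into the same switch, comparing a connection from an ephemeral source port with one from a reserved source port. The Python sketch below assumes a Linux or macOS host run as root (binding to a port below 1024 normally requires it); `tcp_probe`, `compare_source_ports` and the default reserved source port 1000 are hypothetical choices, while 111 and 2049 are the standard portmapper and NFS ports.

```python
from __future__ import annotations

import socket


def tcp_probe(host: str, port: int, src_port: int = 0, timeout: float = 3.0) -> str:
    """Attempt a TCP handshake and report 'connected', 'refused', 'timeout' or an error.

    src_port=0 lets the OS pick an ephemeral source port. A port below 1024
    usually needs root; bind() then raises PermissionError, which is left to
    propagate because it says nothing about the network.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # Lets a fixed source port be rebound soon after a previous run.
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        s.bind(("", src_port))
        s.settimeout(timeout)
        try:
            s.connect((host, port))
            return "connected"
        except ConnectionRefusedError:
            return "refused"  # RST received: host reachable, nothing listening
        except socket.timeout:
            return "timeout"  # no answer at all, consistent with a silently dropped SYN
        except OSError as exc:
            return f"error: {exc}"


def compare_source_ports(host: str, port: int, privileged_src: int = 1000) -> dict[str, str]:
    """Probe the same service from an ephemeral and from a reserved source port."""
    return {
        "ephemeral": tcp_probe(host, port),
        "privileged": tcp_probe(host, port, src_port=privileged_src),
    }
```

Run as root against the NAS, for example `compare_source_ports("NasServerDevice.domain", 111)`, a result of `{"ephemeral": "connected", "privileged": "timeout"}` would point at that rule, whereas a timeout in both cases would suggest something else is dropping the traffic.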