ESXi hosts that are prepared for NSX 6.4.5 or 6.4.6 display a Purple Screen of Death (PSOD) diagnostic screen when the virtual infrastructure latency feature is enabled in vRNI 4.2 or later.
Problem
A PSOD diagnostic screen is displayed when the number of BFD tunnels exceeds 900.
Cause
The virtual infrastructure latency feature in vRNI uses BFD monitoring on NSX-prepared hosts to establish tunnels between hosts. PSOD occurs when the NSX kernel module maintains the state of the BFD sessions while responding to a detailed BFD tunnel query from the control plane agent.
PSOD is not observed when the number of BFD tunnels is in the low hundreds. When the number of BFD tunnels exceeds 900, the host experiences a critical error and becomes inoperative. The number of hosts needed to create more than 900 BFD tunnels depends on the number of VTEPs per host in your environment.
To determine the number of BFD tunnels in your environment, use the following formula: (N-1)*(T^2)
- N is the number of hosts.
- T is the number of VTEPs per host.
For example, in a cluster of four hosts with two VTEPs each, the number of BFD tunnels that each host can see is:
(4-1)*(2^2)=12
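The formula above can be sketched in a few lines of Python to estimate whether an environment is likely to cross the 900-tunnel mark. This is only an illustrative helper, not a VMware tool: the function names `bfd_tunnels_per_host` and `min_hosts_exceeding` are hypothetical, and the threshold of 900 is taken from the problem description above.

```python
# Illustrative sketch only: function names are hypothetical, not part of NSX or vRNI.

PSOD_TUNNEL_THRESHOLD = 900  # tunnel count above which the PSOD is reported


def bfd_tunnels_per_host(num_hosts: int, vteps_per_host: int) -> int:
    """Return (N-1)*(T^2), the number of BFD tunnels each host can see."""
    if num_hosts < 1 or vteps_per_host < 1:
        raise ValueError("num_hosts and vteps_per_host must be at least 1")
    return (num_hosts - 1) * vteps_per_host ** 2


def min_hosts_exceeding(vteps_per_host: int,
                        threshold: int = PSOD_TUNNEL_THRESHOLD) -> int:
    """Return the smallest N for which (N-1)*(T^2) is strictly above threshold."""
    if vteps_per_host < 1:
        raise ValueError("vteps_per_host must be at least 1")
    # (N-1) must be greater than threshold / T^2, so N-1 = floor(threshold / T^2) + 1.
    return threshold // vteps_per_host ** 2 + 2


if __name__ == "__main__":
    # The four-host, two-VTEP example from the text.
    print(bfd_tunnels_per_host(4, 2))  # 12
    # Host counts at which each host would see more than 900 tunnels.
    for vteps in (1, 2, 4):
        print(vteps, min_hosts_exceeding(vteps))
```

With two VTEPs per host, for example, the count first exceeds 900 at 227 hosts, because (227-1)*(2^2)=904.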
#0 DLM_free (msp=0x431a455dcca0, mem=mem@entry=0x431a458cbd10, allowTrim=allowTrim@entry=1 '\001') at bora/vmkernel/main/dlmalloc.c:4924
#1 0x0000418012343ffa in Heap_Free (heap=0x431a455dc000, mem=<optimized out>, mem@entry=0x431a458cbd10) at bora/vmkernel/main/heap.c:4314
#2 0x000041801222db25 in vmk_HeapFree (heap=<optimized out>, mem=mem@entry=0x431a458cbd10) at bora/vmkernel/core/vmkapi_heap.c:250
#3 0x000041801393ca61 in __VDL2_Free (heapID=<optimized out>, data=data@entry=0x431a458cbd10) at /build/mts/release/bora-13168956/esx-datapath/modules/vdl2/vdl2.c:152
#4 0x0000418013950caf in VDL2_CPTaskFree (task=task@entry=0x431a458cbd10) at /build/mts/release/bora-13168956/esx-datapath/modules/vdl2/vdl2_ctlplane.c:164
#5 0x0000418013949415 in VDL2CPWorldProcessTask (task=0x431a458cbd10) at /build/mts/release/bora-13168956/esx-datapath/modules/vdl2/vdl2_cpworld.c:283
#6 VDL2CPWorldFunc (data=data@entry=0x0) at /build/mts/release/bora-13168956/esx-datapath/modules/vdl2/vdl2_cpworld.c:335
#7 0x0000418012308adf in vmkWorldFunc (data=<optimized out>) at bora/vmkernel/main/vmkapi_world.c:528
#8 0x00004180124c91f5 in CpuSched_StartWorld (destWorld=<optimized out>, previous=<optimized out>) at bora/vmkernel/sched/cpusched.c:10792
#9 0x0000000000000000 in ?? ()
# cpu75:68603 opID=6616a61a)vxlan: VDL2PortsetPropSet:1036: Updating BFD VTEP config to : enable
# cpu75:68603 opID=6616a61a)BFD: BFD_CreateNewSession ENTER: localIP: a.b.c.d , remoteIP: w.x.y.z , probeInterval (in milli seconds): 12000
# cpu75:68603 opID=6616a61a)WARNING: BFD: Inserted new session: Discriminator 1471713223, localIP: a.b.c.d remoteIP: w.x.y.z
less vmkernel-zdump.1
vers:1 diag:"No Diagnostic" state:up mult:3 length:24 flags: pol my_disc:0x50c322ca your_disc:0x39f2436f min_tx:300000us (300ms) min_rx:12000000us (12000ms) min_rx_echo:0us (0ms)
(null): BFD state change: init->up "No Diagnostic"->"No Diagnostic".
(null): New remote min_rx.
vers:1 diag:"No Diagnostic" state:up mult:3 length:24 flags: pol my_disc:0x5a566ae8 your_disc:0x16f3890c min_tx:300000us (300ms) min_rx:12000000us (12000ms) min_rx_echo:0us (0ms)
(null): BFD state change: init->up "No Diagnostic"->"No Diagnostic".
(null): New remote min_rx.