ixgbe: firmware spam on X520-T2 NIC (82599EB)

From: Nick Price
Date: Mon May 18 2020 - 10:06:47 EST


In ixgbe_main.c around line 7882, the call to ixgbe_check_fw_error
produces spammy messages on certain adapters: the fwsm register reads
back as 0, which satisfies the !(fwsm & IXGBE_FWSM_FW_VAL_BIT) condition.

As a result, each interface emits one error message every two
seconds:

[79062.730890] ixgbe 0000:2a:00.0: Warning firmware error detected FWSM: 0x00000000
[79062.890877] ixgbe 0000:2a:00.1: Warning firmware error detected FWSM: 0x00000000
[79064.746743] ixgbe 0000:2a:00.0: Warning firmware error detected FWSM: 0x00000000
[79064.906728] ixgbe 0000:2a:00.1: Warning firmware error detected FWSM: 0x00000000

Per the Intel 82599 datasheet, bit 15 of this register is supposed to
be set to 1 upon card initialization. These particular cards do not
behave as documented, and there are no firmware updates available from
Intel or Dell that resolve the issue (firmware updates for other models
have resolved this problem).
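
To illustrate why a zero read trips the check, here is a minimal
userspace sketch, not the driver code. FWSM_FW_VAL_BIT is a stand-in for
IXGBE_FWSM_FW_VAL_BIT, assumed to be bit 15 as described above, and
fwsm_valid_bit_clear is a hypothetical helper name:

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for IXGBE_FWSM_FW_VAL_BIT; assumed to be bit 15. */
#define FWSM_FW_VAL_BIT (UINT32_C(1) << 15)

/* Hypothetical helper: true when the firmware-valid bit is clear,
 * i.e. when the !(fwsm & VAL_BIT) half of the check would fire. */
static bool fwsm_valid_bit_clear(uint32_t fwsm)
{
	return !(fwsm & FWSM_FW_VAL_BIT);
}
```

A register value of 0x00000000 has bit 15 clear, so the condition is
true on every poll; a value such as 0x00008000 would not trigger it.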

Would it make sense to skip the error message if the entire fwsm
register is zero? Or maybe only emit it once?

Or do we just continue to spam, because technically this *is* a firmware
error, even though it does not impact functionality and there is
seemingly no resolution on the vendor side?
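
For concreteness, the first two options might look roughly like the
sketch below. This is standalone C, not a patch against ixgbe: the
macro is a stand-in for IXGBE_FWSM_FW_VAL_BIT (assumed bit 15), and
should_warn_skip_zero, should_warn_once and struct fw_warn_state are
hypothetical names. Only the valid-bit half of the check is modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for IXGBE_FWSM_FW_VAL_BIT; assumed to be bit 15. */
#define FWSM_FW_VAL_BIT (UINT32_C(1) << 15)

/* Option 1: treat an all-zero register as "nothing to report". */
static bool should_warn_skip_zero(uint32_t fwsm)
{
	if (fwsm == 0)
		return false;
	return !(fwsm & FWSM_FW_VAL_BIT);
}

/* Option 2: warn only the first time per interface.
 * The caller keeps one fw_warn_state per interface. */
struct fw_warn_state {
	bool warned;
};

static bool should_warn_once(struct fw_warn_state *st, uint32_t fwsm)
{
	if (fwsm & FWSM_FW_VAL_BIT)
		return false;	/* valid bit set, no error */
	if (st->warned)
		return false;	/* already reported once */
	st->warned = true;
	return true;
}
```

Option 1 silences the affected cards entirely but would also hide a
register that is zero for some other reason; option 2 keeps a single
line in the log per interface.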

Anyone have any thoughts? Some references below.

Thanks!
Nick


For reference:
The commit that added this message is at
https://github.com/torvalds/linux/commit/59dd45d550c518a2c297b2888f194633cb8e5700

More threads on the subject - it seems people are either patching the
kernel to eliminate the check completely or switching to Intel's
driver:
https://bugs.centos.org/view.php?id=16495
https://patchwork.criu.org/patch/11882/
https://forum.proxmox.com/threads/pve-6-0-7-ixgbe-firmware-errors.58592/