[PATCH v3 0/8] kernel: taint when the driver firmware crashes

From: Luis Chamberlain
Date: Tue May 26 2020 - 10:59:09 EST


To those new on CC -- this is intended to be a simple generic interface
to the kernel to annotate when the firmware has crashed, leaving the
driver or system in a questionable state and, in the worst case,
requiring a full system reboot. This series first addresses only a few
networking drivers; however, I already have an idea of where such
firmware crashes happen across the tree. The goal with this series then
is to first introduce the simple framework, and only if that moves
forward will I continue to chug on with the rest of the drivers /
subsystems.

This is *not* a networking specific problem only.

This v3 augments the last series by introducing uevents for panic
events, one of which is sent during tainting. The uevent mechanism is
independent of the firmware taint mechanism. I've also addressed
Jessica Yu's feedback. Since I've extended the patches a bit with
other minor cleanups for issues which checkpatch.pl complains about, and
since this infrastructure is still being discussed, I've trimmed the
series to only cover drivers for which I've received an Acked-by
from the respective driver maintainer, or where we have bug reports
documenting such dire situations on the driver, such as ath10k.

During the v2 review it was discussed that we should instead use devlink
for this work; however, the initial RFC patches produced by Jakub
Kicinski [0] show how devlink is networking specific, and the intent
behind this series is to produce simple helpers which can be used by *any*
device driver, for any subsystem, not just networking. Subsystem
specific infrastructure to help address firmware crashes may still make
sense; however, that does not mean we *don't* need something even more
generic regardless of the subsystem the issue happens on. Since uevents
for taints are exposed, we now expose the taint flags through uapi as
well, and that was something which eventually had to happen given that
the current scheme of relying on sensible character representations for
each taint will not scale beyond the alphabet.
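
The scaling limit mentioned above can be sketched in a few lines of
userspace C. This is only an illustration under stated assumptions, not
code from this series: the letter table and the taint_to_string() helper
are hypothetical, and the letters are placeholders rather than a copy of
the kernel's table. Each taint bit gets one character, so any bit past
the end of the table has no letter left and is printed as '?'.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table for illustration: one letter per taint bit. */
static const char taint_letters[] = "PFSRMBUDAWCIOELKXT";

/*
 * Hypothetical helper: write one character for every set bit in @mask
 * into @buf. Bits that have no entry in taint_letters are written as
 * '?'. Returns the number of characters written, excluding the NUL.
 */
static size_t taint_to_string(unsigned long mask, char *buf, size_t len)
{
	const size_t nletters = sizeof(taint_letters) - 1;
	const size_t nbits = sizeof(mask) * CHAR_BIT;
	size_t out = 0;
	size_t bit;

	if (len == 0)
		return 0;

	for (bit = 0; bit < nbits && out + 1 < len; bit++) {
		if (!(mask & (1UL << bit)))
			continue;
		buf[out++] = bit < nletters ? taint_letters[bit] : '?';
	}
	buf[out] = '\0';

	return out;
}
```

With this table, a mask of 0x5 yields "PS", while a mask with only bit 20
set yields "?". A numeric flag value exported through uapi does not have
this limit, since userspace can match on the bit number instead of a
letter.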

This series is available on my 20200526-taint-firmware-net-intro branch, based on
linux-next tag next-20200526 [1].

[0] https://lkml.kernel.org/r/20200519211531.3702593-1-kuba@xxxxxxxxxx
[1] https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux-next.git/log/?h=20200526-taint-firmware-net-intro

Luis Chamberlain (7):
  kernel.h: move taint and system state flags to uapi
  panic: add uevent support
  taint: add firmware crash taint support
  panic: make taint data type clearer
  ath10k: use new taint_firmware_crashed()
  liquidio: use new taint_firmware_crashed()
  qed: use new taint_firmware_crashed()

Vasundhara Volam (1):
  bnxt_en: use new taint_firmware_crashed()

Documentation/admin-guide/tainted-kernels.rst | 6 +
MAINTAINERS | 8 +
.../net/ethernet/broadcom/bnxt/bnxt_devlink.c | 1 +
.../net/ethernet/cavium/liquidio/lio_main.c | 1 +
drivers/net/ethernet/qlogic/qed/qed_mcp.c | 1 +
drivers/net/wireless/ath/ath10k/pci.c | 2 +
drivers/net/wireless/ath/ath10k/sdio.c | 2 +
drivers/net/wireless/ath/ath10k/snoc.c | 1 +
include/asm-generic/bug.h | 4 +-
include/linux/kernel.h | 40 +--
include/linux/module.h | 13 +
include/linux/panic_events.h | 26 ++
include/trace/events/module.h | 3 +-
include/uapi/linux/kernel.h | 36 +++
include/uapi/linux/panic_events.h | 17 ++
init/main.c | 1 +
kernel/Makefile | 1 +
kernel/module.c | 13 +-
kernel/panic.c | 16 +-
kernel/panic_events.c | 289 ++++++++++++++++++
lib/Kconfig.debug | 13 +
tools/debugging/kernel-chktaint | 7 +
22 files changed, 454 insertions(+), 47 deletions(-)
create mode 100644 include/linux/panic_events.h
create mode 100644 include/uapi/linux/panic_events.h
create mode 100644 kernel/panic_events.c

--
2.26.2