Re: [EXT] [PATCH 09/15] qed: use new module_firmware_crashed()
From: Igor Russkikh
Date: Sat May 09 2020 - 02:34:39 EST
> This makes use of the new module_firmware_crashed() to help
> annotate when firmware for device drivers crash. When firmware
> crashes devices can sometimes become unresponsive, and recovery
> sometimes requires a driver unload / reload and in the worst cases
> a reboot.
>
> Using a taint flag allows us to annotate when this happens clearly.
>
> Cc: Ariel Elior <aelior@xxxxxxxxxxx>
> Cc: GR-everest-linux-l2@xxxxxxxxxxx
> Signed-off-by: Luis Chamberlain <mcgrof@xxxxxxxxxx>
> ---
> drivers/net/ethernet/qlogic/qed/qed_debug.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/drivers/net/ethernet/qlogic/qed/qed_debug.c
> b/drivers/net/ethernet/qlogic/qed/qed_debug.c
> index f4eebaabb6d0..9cc6287b889b 100644
> --- a/drivers/net/ethernet/qlogic/qed/qed_debug.c
> +++ b/drivers/net/ethernet/qlogic/qed/qed_debug.c
> @@ -7854,6 +7854,7 @@ int qed_dbg_all_data(struct qed_dev *cdev, void
> *buffer)
> REGDUMP_HEADER_SIZE,
> &feature_size);
> if (!rc) {
> + module_firmware_crashed();
> *(u32 *)((u8 *)buffer + offset) =
> qed_calc_regdump_header(cdev,
> PROTECTION_OVERRIDE,
> cur_engine,
> @@ -7869,6 +7870,7 @@ int qed_dbg_all_data(struct qed_dev *cdev, void
> *buffer)
> rc = qed_dbg_fw_asserts(cdev, (u8 *)buffer + offset +
> REGDUMP_HEADER_SIZE,
> &feature_size);
> if (!rc) {
> + module_firmware_crashed();
> *(u32 *)((u8 *)buffer + offset) =
> qed_calc_regdump_header(cdev, FW_ASSERTS,
> cur_engine,
> feature_size,
> @@ -7906,6 +7908,7 @@ int qed_dbg_all_data(struct qed_dev *cdev, void
> *buffer)
> rc = qed_dbg_grc(cdev, (u8 *)buffer + offset +
> REGDUMP_HEADER_SIZE, &feature_size);
> if (!rc) {
> + module_firmware_crashed();
> *(u32 *)((u8 *)buffer + offset) =
> qed_calc_regdump_header(cdev, GRC_DUMP,
> cur_engine,
Hi Luis,
qed_dbg_all_data is being used to gather debug dump from device. Failures
inside it may happen due to various reasons, but they normally do not indicate
FW failure.
So I think its not a good place to insert this call.
Its hard to find exact good place to insert it in qed.
One more thing is that AFAIU taint flag gets permanent on kernel, but for
example our device can recover itself from some FW crashes, thus it'd be
transparent for user.
Whats the logical purpose of module_firmware_crashed? Does it mean fatal
unrecoverable error on device?
Thanks,
Igor