RE: [PATCH 3/3] EDAC/igen6: Add polling support

From: Zhuo, Qiuxu
Date: Mon Nov 04 2024 - 21:35:40 EST


> From: Borislav Petkov <bp@xxxxxxxxx>
> [...]
> On Mon, Nov 04, 2024 at 12:40:54PM +0000, Orange Kao wrote:
> > +module_param(edac_op_state, int, 0444);
> > +MODULE_PARM_DESC(edac_op_state, "EDAC Error Reporting state: 0=Poll,
> > +Others or default=Auto detect");
>
> Why is this module parameter here instead of detecting those broken
> machines and enabling polling on them by default and automatically?

Good suggestion. Thanks, Boris.

@Orange Kao,
As per Boris' suggestion, please make polling mode the default for those broken
machines so that userspace doesn't have to configure it.

1) A small update to your current patch, as shown below for your reference.

static void opstate_set(struct res_config *cfg, const struct pci_device_id *ent)
{
	/*
	 * Quirk: Certain SoCs' error reporting interrupts don't work.
	 *        Force polling mode for them to ensure that memory error
	 *        events can be handled.
	 */
	if (ent->device == DID_ADL_N_SKU4) {
		edac_op_state = EDAC_OPSTATE_POLL;
		return;
	}

	/* Set the mode according to the configuration data. */
	if (cfg->machine_check)
		edac_op_state = EDAC_OPSTATE_INT;
	else
		edac_op_state = EDAC_OPSTATE_NMI;
}

2) The call site is updated accordingly:
...
opstate_set(res_cfg, ent);
...

3) Also, the following 2 lines are no longer needed in this patch.

module_param(edac_op_state, int, 0444);
MODULE_PARM_DESC(edac_op_state, "EDAC Error Reporting state: 0=Poll, Others or default=Auto detect");
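
In case it helps with review, the selection logic from 1) can be sanity-checked
outside the kernel with a rough userspace sketch like the one below. Everything
here is a stand-in rather than the real kernel definitions: the enum values,
the two struct layouts, the DID_ADL_N_SKU4 value (a placeholder, not the actual
device ID), and the DID_OTHER and opstate_for() names, which are hypothetical
and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the kernel's EDAC op states; values are illustrative. */
enum { EDAC_OPSTATE_POLL = 0, EDAC_OPSTATE_NMI, EDAC_OPSTATE_INT };

#define DID_ADL_N_SKU4	0x0001	/* placeholder, NOT the real device ID */
#define DID_OTHER	0x0002	/* hypothetical non-quirked device */

/* Minimal mock types carrying only the fields the logic reads. */
struct res_config { bool machine_check; };
struct pci_device_id { unsigned int device; };

static int edac_op_state = -1;

static void opstate_set(const struct res_config *cfg,
			const struct pci_device_id *ent)
{
	assert(cfg != NULL && ent != NULL);

	/* Quirk: force polling where the error interrupts don't work. */
	if (ent->device == DID_ADL_N_SKU4) {
		edac_op_state = EDAC_OPSTATE_POLL;
		return;
	}

	/* Otherwise follow the configuration data. */
	if (cfg->machine_check)
		edac_op_state = EDAC_OPSTATE_INT;
	else
		edac_op_state = EDAC_OPSTATE_NMI;
}

/* Hypothetical helper: run opstate_set() and return the chosen mode. */
static int opstate_for(bool machine_check, unsigned int device)
{
	struct res_config cfg = { .machine_check = machine_check };
	struct pci_device_id ent = { .device = device };

	opstate_set(&cfg, &ent);
	return edac_op_state;
}
```

The point of the sketch is only that the quirk check comes first, so the
quirked device ends up in polling mode regardless of cfg->machine_check, while
every other device keeps the existing INT/NMI selection.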

Could you give it a try and send a new version of this patch?
If you have any questions, please feel free to let me know.
Thanks!

-Qiuxu