Re: [PATCH v4 1/2] ACPI: APEI: Add support to notify the vendor specific HW errors
From: James Morse
Date: Wed Mar 11 2020 - 13:29:38 EST
Hi Shiju,
On 07/02/2020 10:31, Shiju Jose wrote:
> Presently APEI does not support reporting the vendor specific
> HW errors, received in the vendor defined table entries, to the
> vendor drivers for any recovery.
>
> This patch adds the support to register and unregister the
> error handling function for the vendor specific HW errors and
> notify the registered kernel driver.
Is it possible to use the kernel's existing atomic_notifier_chain_register() API for this?

The one thing that can't be done in the same way is the GUID filtering in ghes.c. Each
driver would need to check if the call matched a GUID they knew about, and return
NOTIFY_DONE if they "don't care".

I think this patch would be a lot smaller if it was tweaked to be able to use the existing
API. If there is a reason not to use it, it would be good to know what it is.
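
A minimal sketch of the per-driver GUID check described above, written as a
userspace model so it compiles standalone. It is not kernel code: struct
notifier_block and the NOTIFY_* values only imitate the kernel's, and every
name prefixed demo_ (including the event structure passed as 'data' and the
GUID value) is hypothetical. A real driver would use
atomic_notifier_chain_register() and guid_equal() instead of the demo helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same meaning as in the kernel: "don't care" vs. "handled". */
#define NOTIFY_DONE 0x0000
#define NOTIFY_OK   0x0001

/* Hypothetical stand-in for guid_t. */
struct demo_guid { unsigned char b[16]; };

/* Imitates the shape of the kernel's notifier_block (no priority field). */
struct notifier_block {
	int (*notifier_call)(struct notifier_block *nb,
			     unsigned long action, void *data);
	struct notifier_block *next;
};

/* Assumption: what the chain would hand to each handler as 'data'. */
struct demo_vendor_event {
	struct demo_guid sec_type;
	const void *payload;
	size_t len;
};

static struct notifier_block *demo_chain;

/* Model of chain registration: push onto the head of a singly linked list. */
void demo_chain_register(struct notifier_block *nb)
{
	assert(nb && nb->notifier_call);
	nb->next = demo_chain;
	demo_chain = nb;
}

/* Call every handler; return NOTIFY_OK if any of them claimed the event. */
int demo_chain_call(unsigned long action, void *data)
{
	struct notifier_block *nb;
	int ret = NOTIFY_DONE;

	for (nb = demo_chain; nb; nb = nb->next)
		if (nb->notifier_call(nb, action, data) == NOTIFY_OK)
			ret = NOTIFY_OK;
	return ret;
}

/* Hypothetical vendor driver: made-up GUID and a counter of handled events. */
const struct demo_guid demo_drv_guid = { { 0xde, 0xad, 0xbe, 0xef } };
int demo_drv_handled;

static int demo_drv_notify(struct notifier_block *nb,
			   unsigned long action, void *data)
{
	struct demo_vendor_event *ev = data;

	(void)nb;
	(void)action;

	/* The filtering moves here: ignore sections with an unknown GUID. */
	if (memcmp(&ev->sec_type, &demo_drv_guid, sizeof(demo_drv_guid)))
		return NOTIFY_DONE;

	demo_drv_handled++;
	return NOTIFY_OK;
}

struct notifier_block demo_drv_nb = {
	.notifier_call = demo_drv_notify,
};
```

With this shape the GUID comparison lives in each handler rather than in
ghes.c, and the caller only learns whether somebody returned NOTIFY_OK.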
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index 103acbb..69e18d7 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -490,6 +490,109 @@ static void ghes_handle_aer(struct acpi_hest_generic_data *gdata)
> +/**
> + * ghes_unregister_event_handler - unregister the previously
> + * registered event handling function.
> + * @sec_type: sec_type of the corresponding CPER.
> + * @data: driver specific data to distinguish devices.
> + */
> +void ghes_unregister_event_handler(guid_t sec_type, void *data)
> +{
> +	struct ghes_event_notify *event_notify;
> +	bool found = false;
> +
> +	mutex_lock(&ghes_event_notify_mutex);
> +	rcu_read_lock();
> +	list_for_each_entry_rcu(event_notify,
> +				&ghes_event_handler_list, list) {
> +		if (guid_equal(&event_notify->sec_type, &sec_type)) {
> +			if (data != event_notify->data)
It looks like you need multiple drivers to handle the same GUID because of multiple root
ports. Can't the handler look up the right device?
> +				continue;
> +			list_del_rcu(&event_notify->list);
> +			found = true;
> +			break;
> +		}
> +	}
> +	rcu_read_unlock();
> +	mutex_unlock(&ghes_event_notify_mutex);
> +
> +	if (!found) {
> +		pr_err("Tried to unregister a GHES event handler that has not been registered\n");
> +		return;
> +	}
> +
> +	synchronize_rcu();
> +	kfree(event_notify);
> +}
> +EXPORT_SYMBOL_GPL(ghes_unregister_event_handler);
> @@ -525,11 +628,14 @@ static void ghes_do_proc(struct ghes *ghes,
>
>  			log_arm_hw_error(err);
>  		} else {
> -			void *err = acpi_hest_get_payload(gdata);
> -
> -			log_non_standard_event(sec_type, fru_id, fru_text,
> -					       sec_sev, err,
> -					       gdata->error_data_length);
> +			if (!ghes_handle_non_standard_event(sec_type, gdata,
> +							    sev)) {
> +				void *err = acpi_hest_get_payload(gdata);
> +
> +				log_non_standard_event(sec_type, fru_id,
> +						       fru_text, sec_sev, err,
> +						       gdata->error_data_length);
> +			}
So, a side effect of the kernel handling these is that they no longer get logged via the
tracepoints?

I guess the driver that claims this logs some more accurate information. Are any
user-space programs expected to be doing something useful with B2889FC9... today?
Thanks,
James