[PATCH v3] ACPI: APEI: Skip initialization of GHES_ASSIST structures for Machine Check Architecture

From: Naik, Avadhut
Date: Thu Feb 01 2024 - 15:45:52 EST


Thank you, Tony!

Hi Rafael,

Can this patch be merged? Or would you prefer that I resend it
with Tony's "Reviewed-by:" tag?

Thanks,
Avadhut Naik

On 1/23/2024 15:53, Tony Luck wrote:
> On Tue, Jan 23, 2024 at 03:39:49PM -0600, Naik, Avadhut wrote:
>> Hi,
>>
>> Any further comments on this patch?
>
> No. I like the comments you added to address my earlier
> confusion/concerns.
>
> Reviewed-by: Tony Luck <tony.luck@xxxxxxxxx>
>
> -Tony
>
>>
>> On 12/18/2023 11:13 AM, Avadhut Naik wrote:
>>> Hi,
>>>
>>> Any further feedback on this patch?
>>>
>>> On 12/4/2023 13:25, Avadhut Naik wrote:
>>>> To support GHES_ASSIST on Machine Check Architecture (MCA) error sources,
>>>> a set of GHES structures is provided by the system firmware for each MCA
>>>> error source. Each of these sets consists of a GHES structure for each MCA
>>>> bank on each logical CPU, with all structures of a set sharing a common
>>>> Related Source ID, equal to the Source ID of one of the MCA error source
>>>> structures.[1] On SoCs with large core counts, this typically amounts to
>>>> tens of thousands of GHES_ASSIST structures for MCA under
>>>> "/sys/bus/platform/drivers/GHES".
>>>>
>>>> Support for GHES_ASSIST, however, has not been implemented in the kernel.
>>>> As such, the information provided through these structures is not
>>>> consumed by Linux. Moreover, these GHES_ASSIST structures for MCA, which
>>>> are supposed to provide supplemental information in the context of an
>>>> error reported by hardware, are set up as independent error sources by
>>>> the kernel during HEST initialization.
>>>>
>>>> Additionally, if the Type field of the Notification structure associated
>>>> with these GHES_ASSIST structures for MCA is set to Polled, the kernel
>>>> sets up a timer for each individual structure. The duration of the timer
>>>> is derived from the Poll Interval field of the Notification structure. On
>>>> SoCs with high core counts, this results in tens of thousands of timers
>>>> expiring periodically, causing unnecessary preemptions and wasted CPU
>>>> cycles. The problem is particularly severe when the Poll Interval is
>>>> short.
>>>>
>>>> Since GHES_ASSIST support is not present in the kernel, skip
>>>> initialization of GHES_ASSIST structures for MCA to eliminate their
>>>> performance impact.
>>>>
>>>> [1] ACPI specification 6.5, section 18.7
>>>>
>>>> Signed-off-by: Avadhut Naik <avadhut.naik@xxxxxxx>
>>>> Reviewed-by: Yazen Ghannam <yazen.ghannam@xxxxxxx>
>>>> ---
>>>> Changes in v2:
>>>> 1. Since is_ghes_assist_struct() returns as soon as any of the conditions
>>>> is hit, the if-else-if chain is redundant. Replace it with plain if
>>>> statements.
>>>> 2. Fix formatting errors.
>>>> 3. Add Reviewed-by: Yazen Ghannam <yazen.ghannam@xxxxxxx>
>>>>
>>>> Changes in v3:
>>>> 1. Reword the comment above the mces structure, per Tony's recommendation,
>>>> to better reflect its usage.
>>>> ---
>>>> drivers/acpi/apei/hest.c | 51 ++++++++++++++++++++++++++++++++++++++++
>>>> 1 file changed, 51 insertions(+)
>>>>
>>>> diff --git a/drivers/acpi/apei/hest.c b/drivers/acpi/apei/hest.c
>>>> index 6aef1ee5e1bd..20d757687e3d 100644
>>>> --- a/drivers/acpi/apei/hest.c
>>>> +++ b/drivers/acpi/apei/hest.c
>>>> @@ -37,6 +37,20 @@ EXPORT_SYMBOL_GPL(hest_disable);
>>>>
>>>> static struct acpi_table_hest *__read_mostly hest_tab;
>>>>
>>>> +/*
>>>> + * Since GHES_ASSIST is not supported, skip initialization of GHES_ASSIST
>>>> + * structures for MCA.
>>>> + * During HEST parsing, detected MCA error sources are cached from early
>>>> + * table entries so that the Flags and Source Id fields from these cached
>>>> + * values are then referred to in later table entries to determine if the
>>>> + * encountered GHES_ASSIST structure should be initialized.
>>>> + */
>>>> +static struct {
>>>> +	struct acpi_hest_ia_corrected *cmc;
>>>> +	struct acpi_hest_ia_machine_check *mc;
>>>> +	struct acpi_hest_ia_deferred_check *dmc;
>>>> +} mces;
>>>> +
>>>>  static const int hest_esrc_len_tab[ACPI_HEST_TYPE_RESERVED] = {
>>>>  	[ACPI_HEST_TYPE_IA32_CHECK] = -1,	/* need further calculation */
>>>>  	[ACPI_HEST_TYPE_IA32_CORRECTED_CHECK] = -1,
>>>> @@ -70,22 +84,54 @@ static int hest_esrc_len(struct acpi_hest_header *hest_hdr)
>>>>  		cmc = (struct acpi_hest_ia_corrected *)hest_hdr;
>>>>  		len = sizeof(*cmc) + cmc->num_hardware_banks *
>>>>  			sizeof(struct acpi_hest_ia_error_bank);
>>>> +		mces.cmc = cmc;
>>>>  	} else if (hest_type == ACPI_HEST_TYPE_IA32_CHECK) {
>>>>  		struct acpi_hest_ia_machine_check *mc;
>>>>  		mc = (struct acpi_hest_ia_machine_check *)hest_hdr;
>>>>  		len = sizeof(*mc) + mc->num_hardware_banks *
>>>>  			sizeof(struct acpi_hest_ia_error_bank);
>>>> +		mces.mc = mc;
>>>>  	} else if (hest_type == ACPI_HEST_TYPE_IA32_DEFERRED_CHECK) {
>>>>  		struct acpi_hest_ia_deferred_check *mc;
>>>>  		mc = (struct acpi_hest_ia_deferred_check *)hest_hdr;
>>>>  		len = sizeof(*mc) + mc->num_hardware_banks *
>>>>  			sizeof(struct acpi_hest_ia_error_bank);
>>>> +		mces.dmc = mc;
>>>>  	}
>>>>  	BUG_ON(len == -1);
>>>>
>>>>  	return len;
>>>>  };
>>>>
>>>> +/*
>>>> + * GHES and GHESv2 structures share the same format, starting from
>>>> + * Source Id and ending in Error Status Block Length (inclusive).
>>>> + */
>>>> +static bool is_ghes_assist_struct(struct acpi_hest_header *hest_hdr)
>>>> +{
>>>> +	struct acpi_hest_generic *ghes;
>>>> +	u16 related_source_id;
>>>> +
>>>> +	if (hest_hdr->type != ACPI_HEST_TYPE_GENERIC_ERROR &&
>>>> +	    hest_hdr->type != ACPI_HEST_TYPE_GENERIC_ERROR_V2)
>>>> +		return false;
>>>> +
>>>> +	ghes = (struct acpi_hest_generic *)hest_hdr;
>>>> +	related_source_id = ghes->related_source_id;
>>>> +
>>>> +	if (mces.cmc && mces.cmc->flags & ACPI_HEST_GHES_ASSIST &&
>>>> +	    related_source_id == mces.cmc->header.source_id)
>>>> +		return true;
>>>> +	if (mces.mc && mces.mc->flags & ACPI_HEST_GHES_ASSIST &&
>>>> +	    related_source_id == mces.mc->header.source_id)
>>>> +		return true;
>>>> +	if (mces.dmc && mces.dmc->flags & ACPI_HEST_GHES_ASSIST &&
>>>> +	    related_source_id == mces.dmc->header.source_id)
>>>> +		return true;
>>>> +
>>>> +	return false;
>>>> +}
>>>> +
>>>> typedef int (*apei_hest_func_t)(struct acpi_hest_header *hest_hdr, void *data);
>>>>
>>>> static int apei_hest_parse(apei_hest_func_t func, void *data)
>>>> @@ -114,6 +160,11 @@ static int apei_hest_parse(apei_hest_func_t func, void *data)
>>>>  			return -EINVAL;
>>>>  		}
>>>>
>>>> +		if (is_ghes_assist_struct(hest_hdr)) {
>>>> +			hest_hdr = (void *)hest_hdr + len;
>>>> +			continue;
>>>> +		}
>>>> +
>>>>  		rc = func(hest_hdr, data);
>>>>  		if (rc)
>>>>  			return rc;
>>>>
>>>> base-commit: 629a3b49f3f957e975253c54846090b8d5ed2e9b
>>>
>>
>> --
>> Thanks,
>> Avadhut Naik
