RE: [PATCH 1/1] EDAC/ghes: Fix for NULL pointer dereference in ghes_edac_register()

From: Shiju Jose
Date: Thu Aug 27 2020 - 10:41:56 EST


Hello Boris,

Thanks for reviewing.

>-----Original Message-----
>From: linux-edac-owner@xxxxxxxxxxxxxxx [mailto:linux-edac-
>owner@xxxxxxxxxxxxxxx] On Behalf Of Borislav Petkov
>Sent: 26 August 2020 09:52
>To: Shiju Jose <shiju.jose@xxxxxxxxxx>
>Cc: linux-edac@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
>mchehab@xxxxxxxxxx; tony.luck@xxxxxxxxx; james.morse@xxxxxxx;
>rrichter@xxxxxxxxxxx; Linuxarm <linuxarm@xxxxxxxxxx>
>Subject: Re: [PATCH 1/1] EDAC/ghes: Fix for NULL pointer dereference in
>ghes_edac_register()
>
>On Tue, Aug 25, 2020 at 02:01:08PM +0100, Shiju Jose wrote:
>> After the 'commit b9cae27728d1 ("EDAC/ghes: Scan the system once on
>> driver init")' applied, following error has occurred in
>> ghes_edac_register() when CONFIG_DEBUG_TEST_DRIVER_REMOVE is enabled.
>> The null ghes_hw.dimms pointer in the mci_for_each_dimm() of
>> ghes_edac_register() caused the error.
>>
>> The error occurs when all the previously initialized ghes instances
>> are removed and then probe a new ghes instance. In this case, the
>> ghes_refcount would be 0, ghes_hw.dimms and mci already freed. The
>> ghes_hw.dimms would be null because ghes_scan_system() would not call
>> enumerate_dimms() again.
>
>Try the below instead and see if it fixes the issue for you too.
>
>If it does, pls send it as v2 but do not add the splat to the commit message -
>that's a lot of noise for something which is clear why it happens and you
>explain it properly in text anyway.

I tested with your changes and it fixes the issue. I will send v2.

>
>Thx.
>
>---
>diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
>index da60c29468a7..54ebc8afc6b1 100644
>--- a/drivers/edac/ghes_edac.c
>+++ b/drivers/edac/ghes_edac.c
>@@ -55,6 +55,8 @@ static DEFINE_SPINLOCK(ghes_lock);
> static bool __read_mostly force_load;
> module_param(force_load, bool, 0);
>
>+static bool system_scanned;
>+
> /* Memory Device - Type 17 of SMBIOS spec */
> struct memdev_dmi_entry {
> 	u8 type;
>@@ -225,14 +227,12 @@ static void enumerate_dimms(const struct dmi_header *dh, void *arg)
>
> static void ghes_scan_system(void)
> {
>-	static bool scanned;
>-
>-	if (scanned)
>+	if (system_scanned)
> 		return;
>
> 	dmi_walk(enumerate_dimms, &ghes_hw);
>
>-	scanned = true;
>+	system_scanned = true;
> }
>
> void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
>@@ -631,6 +631,8 @@ void ghes_edac_unregister(struct ghes *ghes)
>
> 	mutex_lock(&ghes_reg_mutex);
>
>+	system_scanned = false;
>+
> 	if (!refcount_dec_and_test(&ghes_refcount))
> 		goto unlock;
>
>
>--
>Regards/Gruss,
> Boris.
>
>https://people.kernel.org/tglx/notes-about-netiquette
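
For reference, the flag lifetime issue that the patch above addresses can be
modelled in user space roughly as below. This is only a minimal sketch and
not the driver code: struct hw_info, scan_system(), model_register() and
model_unregister() are hypothetical stand-ins, the DIMM count is arbitrary,
and locking is left out entirely. It only shows why the "scanned" flag has
to be visible to (and reset by) the unregister path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the driver state, not the real ghes_hw. */
struct hw_info {
	int *dimms;		/* models ghes_hw.dimms */
	size_t num_dimms;
};

static struct hw_info hw;
static int refcount;		/* models ghes_refcount */
static bool system_scanned;	/* file scope so unregister can reset it */

/* Models ghes_scan_system(): enumerate only if not already done. */
static void scan_system(void)
{
	if (system_scanned)
		return;

	hw.num_dimms = 4;	/* arbitrary value for this sketch */
	hw.dimms = calloc(hw.num_dimms, sizeof(*hw.dimms));
	system_scanned = true;
}

/* Models registration; returns false where the real code would oops. */
static bool model_register(void)
{
	assert(refcount >= 0);

	if (refcount > 0) {
		/* Already set up by an earlier instance. */
		refcount++;
		return true;
	}

	scan_system();
	if (hw.dimms == NULL)
		return false;	/* NULL dimms array, nothing to iterate */

	refcount = 1;
	return true;
}

/* Models unregistration; the last instance frees the DIMM array. */
static void model_unregister(void)
{
	system_scanned = false;	/* force a rescan on the next first probe */

	if (--refcount > 0)
		return;

	free(hw.dimms);
	hw.dimms = NULL;
	hw.num_dimms = 0;
}
```

In this model, model_register(), model_unregister(), model_register() works
because the second registration rescans. If the flag were a function-local
static in scan_system() that is never cleared, the second model_register()
would find hw.dimms still NULL and return false.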

Thanks,
Shiju