Re: [PATCH v2 02/24] EDAC, ghes: Fix grain calculation

From: Borislav Petkov
Date: Fri Aug 09 2019 - 09:15:20 EST


On Mon, Jun 24, 2019 at 03:08:57PM +0000, Robert Richter wrote:
> The conversion from the physical address mask to a grain (defined as
> granularity in bytes) is broken:
>
> e->grain = ~(mem_err->physical_addr_mask & ~PAGE_MASK);
>
> E.g., a physical address mask of ~0xfff should give a grain of 0x1000,
> instead the grain is wrong with the upper bits always set. We also
> remove the limitation to the page size as the granularity is unrelated
> to the page size used in the system. We fix this with:
>
> e->grain = ~mem_err->physical_addr_mask + 1;
>
> Note: We need to adopt the grain_bits calculation as e->grain is now a
> power of 2 and no longer a bit mask. The formula is now the same as in
> edac_mc and can later be unified.
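
For reference, a minimal userspace sketch of the two formulas quoted above, not part of the patch. It assumes 4 KiB pages; SKETCH_PAGE_MASK, grain_old() and grain_new() are hypothetical names chosen only for illustration, with SKETCH_PAGE_MASK standing in for the kernel's PAGE_MASK.

```c
#include <stdint.h>

/* Assumption: 4 KiB pages, standing in for the kernel's PAGE_MASK. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_MASK  (~((UINT64_C(1) << SKETCH_PAGE_SHIFT) - 1))

/* Old formula: the outer negation sets the upper bits, so the result
 * is not a byte count. */
static uint64_t grain_old(uint64_t physical_addr_mask)
{
	return ~(physical_addr_mask & ~SKETCH_PAGE_MASK);
}

/* New formula: for a contiguous mask, the two's complement of the mask
 * is the granularity in bytes. */
static uint64_t grain_new(uint64_t physical_addr_mask)
{
	return ~physical_addr_mask + 1;
}
```

With a mask of ~0xfff, grain_new() yields 0x1000, while grain_old() yields a value with all bits set.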

Please refrain from using "We", "I", or other personal pronouns in a
commit message and in the code comments below.

From Documentation/process/submitting-patches.rst:

"Describe your changes in imperative mood, e.g. "make xyzzy do frotz"
instead of "[This patch] makes xyzzy do frotz" or "[I] changed xyzzy
to do frotz", as if you are giving orders to the codebase to change
its behaviour."

Please fix all your other commit messages for the next submission.

> Signed-off-by: Robert Richter <rrichter@xxxxxxxxxxx>
> ---
> drivers/edac/ghes_edac.c | 12 ++++++++++--
> 1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
> index 7f19f1c672c3..d095d98d6a8d 100644
> --- a/drivers/edac/ghes_edac.c
> +++ b/drivers/edac/ghes_edac.c
> @@ -222,6 +222,7 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
> /* Cleans the error report buffer */
> memset(e, 0, sizeof (*e));
> e->error_count = 1;
> + e->grain = 1;
> strcpy(e->label, "unknown label");
> e->msg = pvt->msg;
> e->other_detail = pvt->other_detail;
> @@ -317,7 +318,7 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
>
> /* Error grain */
> if (mem_err->validation_bits & CPER_MEM_VALID_PA_MASK)
> - e->grain = ~(mem_err->physical_addr_mask & ~PAGE_MASK);
> + e->grain = ~mem_err->physical_addr_mask + 1;

This is assuming that ->physical_addr_mask is contiguous, but I
don't trust any firmware. I guess we can leave it like that for now
until some "inventive" firmware actually reports a non-contiguous mask.
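
If such a check were ever wanted, one possible shape is sketched below as plain userspace C. The helper name mask_is_contiguous() is hypothetical and nothing like it is in the patch. It relies on the fact that a contiguous mask (ones in the upper bits, zeros below) inverts to a value of the form 2^n - 1.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if the mask is a run of ones in the upper
 * bits followed only by zeros, i.e. ~mask has the form 2^n - 1.
 * Note that an all-zero mask also passes, although ~0 + 1 wraps to a
 * grain of 0; that case needs to be handled separately. */
static bool mask_is_contiguous(uint64_t physical_addr_mask)
{
	uint64_t inv = ~physical_addr_mask;

	/* For inv == 2^n - 1, inv + 1 shares no set bit with inv. */
	return (inv & (inv + 1)) == 0;
}
```

For a mask such as ~0xf0f the helper returns false, and ~mask + 1 would be 0xf10, which is not a power of 2.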

>
> /* Memory error location, mapped on e->location */
> p = e->location;
> @@ -433,8 +434,15 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
> if (p > pvt->other_detail)
> *(p - 1) = '\0';
>
> + /*
> + * We expect the hw to report a reasonable grain, fallback to
> + * 1 byte granularity otherwise.
> + */
> + if (WARN_ON_ONCE(!e->grain))

Please move that WARN_ON_ONCE into the

if (mem_err->validation_bits & CPER_MEM_VALID_PA_MASK)

branch above because you're presetting grain to 1 so the warn should be
close to where it could happen, i.e., when coming from the firmware.
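
Roughly, the control flow would then look like the userspace sketch below. This is an illustration under stated assumptions, not the actual driver code: struct sketch_mem_err, SKETCH_VALID_PA_MASK and sketch_grain() are hypothetical stand-ins, and the fprintf() marks the spot where the kernel code would use WARN_ON_ONCE(!e->grain).

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for CPER_MEM_VALID_PA_MASK. */
#define SKETCH_VALID_PA_MASK 0x4

/* Hypothetical stand-in for the relevant fields of struct cper_sec_mem_err. */
struct sketch_mem_err {
	uint64_t validation_bits;
	uint64_t physical_addr_mask;
};

static uint64_t sketch_grain(const struct sketch_mem_err *mem_err)
{
	uint64_t grain = 1;	/* preset to 1 byte, as the patch does */

	if (mem_err->validation_bits & SKETCH_VALID_PA_MASK) {
		grain = ~mem_err->physical_addr_mask + 1;

		/* A zero grain can only come from the firmware-provided
		 * mask, so the check sits right next to the conversion. */
		if (!grain) {
			fprintf(stderr, "unexpected zero grain from firmware\n");
			grain = 1;	/* fall back to 1 byte granularity */
		}
	}

	return grain;
}
```

When the validation bit is clear, the preset of 1 is returned untouched and the check is never reached.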

Thx.

--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.