Re: [PATCH] x86/mce: Always save severity in machine_check_poll

From: Borislav Petkov
Date: Mon Jun 19 2017 - 12:49:02 EST

On Fri, Jun 16, 2017 at 02:49:58PM +0000, Ghannam, Yazen wrote:
> The code block being removed here was added in the following commit to decide
> whether or not to schedule work.
> fa92c58 x86, mce: Support memory error recovery for both UCNA and Deferred error in machine_check_poll

Are you sure?

That commit makes the code log poison memory errors which have a valid
address. The work being scheduled is only an implementation detail for
how we're doing the logging and calling to the userspace handlers to
start consuming.

Btw, do

$ git config --replace-all core.abbrev 12

to set your commit abbreviation length to 12 so that there's no
ambiguity for the recent years. :)

> The following commit based the decision to schedule work on if we have a usable
> address and made this decision later in machine_check_poll().

> 8b38937b x86/mce: Do not enter deferred errors into the generic pool twice

Yes, because we were logging them twice.

> Then the following commit removed m.usable_addr from the code block.
> c0ec382 x86/RAS: Remove mce.usable_addr

That's just a cleanup.

> So now this code block just decides whether or not to save the severity. We can
> remove this block, since the original purpose of this code (to schedule work) is no
> longer happening.

Not really. The original purpose of this code was to log deferred errors
with AddrV set - the work scheduling is just an implementation detail,
as I mentioned above.

> Tony has a concern that some notifiers may assume that the severity being
> set means that the error is a memory error. As far as I can tell, the only notifier
> that uses severity is the SRAO notifier and it doesn't make an assumption.
> We schedule work if we want to log the error or if we have a usable address.
> So there's no reason not to save the severity anymore.

Just forget the work scheduling - it is only a marginal implementation
thing - you only want to say that we want to log the severity since we
already have it. And the most important aspect in your v2 commit message
is explaining *why* we want the severity. And yes, it is possible to
simply say, this is a cleanup and we can just as well always save it
because it is a valid reason. And it makes sense too.

All I'm trying to remind you of, is, you should explain in your commit
message what the issue is and *why* you're doing the change. When the
first sentence of your commit message is "Remove code that was used
to decide whether to schedule work." I'm starting to scratch my head,
wondering, what the actual problem is and what you're trying to fix.



Good mailing practices for 400: avoid top-posting and trim the reply.