Re: [PATCH 5/8] x86/mce: Avoid tail copy when machine check terminated a copy from user

From: Borislav Petkov
Date: Thu Sep 17 2020 - 13:05:02 EST


Replying a bit out of order:

> Both the code and the commit comment need much more of this
> description.

Hell yeah!

On Wed, Sep 16, 2020 at 12:26:59PM -0700, Luck, Tony wrote:
> So we take a another machine check on the same address when
> fault_in_pages_readable() does __get_user(). That ought to break
> us out ... but for some reason I still don't understand didn't.
> But even if it did ... the second machine check is not at all
> a good idea.

And I think this is the important point: for MCEs you absolutely don't
want to take another MCE and even walk into those fields. So what
fault_in_pages_readable() does normally, would be totally wrong. Imagine
you're playing minesweeper - you can't just pre-fault blocks without
counting the mines. :-P

So actually, I'm thinking:

.LMCE_during_user_access:
mov $-ENODEV, %eax
ASM_CLAC
ret

I have no clue which error code we should put there but it should be an
error code which tells you not to retry and to back off immediately.

> Returning zero bytes left to say we completed avoids that. The
> user is guaranteed a SIGBUS when the task_work does fire. So whatever
> system call was in progress is not going to see the apparent
> successful return.

Yes, my only proposition with the error code is in case you're looking
at traces, to recognize that the copying encountered an MCE. In addition
to the "back off immediately" semantics, if there even is such defined
for users of copy_*_user().

> Unless you have some better way out of the dilemmma that the
> real fixup is only scheduled at the point that the extable
> fixup just arranges for a simple local return from the copy.

Right, see above: I think it is imperative *not* to walk into that area
again and not do any retrying.

> When the return to user happens the task_work that was scheduled
> in the machine check handler takes care of the error return to the
> user.

Yeah, let's write that whole flow down somewhere - not in a commit
message - so that it is clear what happens.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette