Re: [PATCH 2/3] x86/mce: Avoid infinite loop for copy from user recovery

From: Luck, Tony
Date: Thu Jul 22 2021 - 20:14:46 EST


On Thu, Jul 22, 2021 at 04:30:44PM -0700, Jue Wang wrote:
> I think the challenge being the uncorrectable errors are essentially
> random. It's
> just a matter of time for >1 UC errors to show up in sequential kernel accesses.
>
> It's easy to create such cases with artificial error injections.
>
> I suspect we want to design this part of the kernel to be able to handle generic
> cases?

Remember that:
1) These errors are all in application memory
2) We reset the count every time we get into the task_work function that
will return to user

So the multiple error scenario here is one where we hit errors
on different user pages on a single trip into the kernel.

Hitting the same page is easy. The kernel has places where it
can hit poison with page faults disabled, and it then enables
page faults and retries the same access, and hits poison again.

I'm not aware of, nor expecting to find, places where the kernel
tries to access user address A and hits poison, and then tries to
access user address B (without returrning to user between access
A and access B).

-Tony