Re: [PATCH v2] mm, hwpoison: Try to recover from copy-on write faults

From: Shuai Xue
Date: Fri Oct 21 2022 - 05:30:17 EST




On 2022/10/21 12:41 PM, Luck, Tony wrote:
>>> When we do return to user mode the task is going to be busy servicing
>>> a SIGBUS ... so shouldn't try to touch the poison page before the
>>> memory_failure() called by the worker thread cleans things up.
>>
>> What about an RT process on a busy system?
>> The worker threads are pretty low priority.
>
> Most tasks don't have a SIGBUS handler ... so they just die without any possibility of accessing the poison.
>
> If this task DOES have a SIGBUS handler, and that for some bizarre reason just does a "return"
> so the task jumps back to the instruction that caused the COW, then there is a 63/64
> likelihood that it is touching a different cache line from the poisoned one.
>
> In the 1/64 case ... it's probably a simple store (since there was a COW, we know it was trying to
> modify the page) ... so it won't generate another machine check (those only happen for reads).
>
> But maybe it is some RMW instruction ... then, if all the above options didn't happen ... we
> could get another machine check from the same address. But then we just follow the usual
> recovery path.
>
> -Tony


Let's assume the instruction that caused the COW falls in the 63/64 case, i.e.,
it is writing to a different cache line from the poisoned one. But the new_page
allocated for the COW is dropped, right? So might it page fault again?
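
For reference, a rough sketch of where the 63/64 figure comes from. It
assumes 4 KiB pages and 64-byte cache lines, which are the usual x86-64
values but are not stated in this thread. The macro and helper names are
hypothetical and are not kernel symbols:

```c
#include <assert.h>

#define PAGE_SIZE_BYTES  4096UL  /* assumption: 4 KiB base page */
#define CACHE_LINE_BYTES 64UL    /* assumption: 64-byte cache line */

/* Number of cache lines that make up one page. */
static unsigned long lines_per_page(void)
{
	/* The page size must be a whole multiple of the line size. */
	assert(PAGE_SIZE_BYTES % CACHE_LINE_BYTES == 0);
	return PAGE_SIZE_BYTES / CACHE_LINE_BYTES;
}

/* Index, within its page, of the cache line that holds addr. */
static unsigned long line_index(unsigned long addr)
{
	return (addr % PAGE_SIZE_BYTES) / CACHE_LINE_BYTES;
}

/*
 * Nonzero if a retried access at fault_addr lands on the same cache line
 * as poison_addr. Both addresses are taken to be in the same page.
 */
static int hits_poisoned_line(unsigned long fault_addr,
			      unsigned long poison_addr)
{
	return line_index(fault_addr) == line_index(poison_addr);
}
```

With these assumptions a page holds 64 lines, so a retried access to an
unrelated offset in the page avoids the poisoned line in 63 of 64 cases.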

Best Regards,
Shuai