Re: [PATCH v1 3/3] mm,hwpoison: add kill_accessing_process() to find error virtual address

From: Jue Wang
Date: Mon Apr 19 2021 - 22:03:16 EST


On Tue, 13 Apr 2021 07:43:20 +0900, Naoya Horiguchi wrote:

> This patch suggests to do page table walk to find the error virtual
> address. If we find multiple virtual addresses in walking, we now can't
> determine which one is correct, so we fall back to sending SIGBUS in
> kill_me_maybe() without error info as we do now. This corner case needs
> to be solved in the future.

Instead of walking the page tables, I wonder what about the following idea:

When failing to get vaddr, memory_failure just ensures the mapping is removed
and an hwpoisoned swap pte is put in place; or the original page is flagged with
PG_HWPOISONED and kept in the radix tree (e.g., for SHMEM THP).

NOTE: no SIGBUS is sent to user space.

Then do_machine_check just returns to user space to resume execution, the
re-execution will result in a #PF and should land to the exact page fault
handling code that generates a SIGBUS with the precise vaddr info:

https://github.com/torvalds/linux/blob/7af08140979a6e7e12b78c93b8625c8d25b084e2/mm/memory.c#L3290
https://github.com/torvalds/linux/blob/7af08140979a6e7e12b78c93b8625c8d25b084e2/mm/memory.c#L3647

Thanks,
-Jue