Re: [PATCH -next v2] mm: hwposion: support recovery from ksm_might_need_to_copy()

From: Kefeng Wang
Date: Fri Dec 09 2022 - 22:38:04 EST



On 2022/12/10 8:50, Andrew Morton wrote:
On Fri, 9 Dec 2022 15:28:01 +0800 Kefeng Wang <wangkefeng.wang@xxxxxxxxxx> wrote:

When the kernel copies a page in ksm_might_need_to_copy() and runs
into an uncorrectable error, it crashes because the poisoned page is
consumed by the kernel. This is similar to Copy-on-write poison recovery:
when an error is detected during the page copy, return VM_FAULT_HWPOISON,
which helps us avoid a system crash. Note that memory failure on a KSM
page will be skipped, but we still call memory_failure_queue() to be
consistent with the general memory failure process.

Thanks. Sorry, lots of paperwork and bureaucracy:


Is a copy of the oops(?) output available?

Did someone else report this? If so, is a Reported-by available for
that? And a Link: for the Reported-by:, which is a coming thing.

Can we identify a Fixes: target?

Is a cc:stable appropriate?

We are trying to support ARCH_HAS_COPY_MC on arm64[1] and to recover from CoW faults[2];
Tony is doing the same thing (recovering from CoW) on x86[3]. The kernel copy in ksm_might_need_to_copy()
can recover in the same way. This is an enhancement of COPY_MC, so I think there is no need to add Fixes or cc:stable.
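
To illustrate the recovery idea, here is a minimal userspace sketch, not the patch itself. It assumes a machine-check-safe copy that reports how many bytes it could not copy, and a caller that turns a short copy into a poison fault code instead of crashing. Every name below (sketch_copy_mc, sketch_copy_page, SKETCH_FAULT_HWPOISON, the poison_off field) is hypothetical and only stands in for the kernel's copy and VM_FAULT_HWPOISON handling.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096

/* Hypothetical fault codes; the values are illustrative only. */
enum sketch_fault { SKETCH_FAULT_OK = 0, SKETCH_FAULT_HWPOISON = 1 };

/* A simulated page. poison_off is the offset of the first "bad" byte,
 * or -1 when the page is healthy. Real hardware reports this through a
 * machine check, not through a field like this. */
struct sketch_page {
	unsigned char data[SKETCH_PAGE_SIZE];
	long poison_off;
};

/* Stand-in for a machine-check-safe copy: copies up to the poisoned
 * byte and returns the number of bytes NOT copied (0 means success). */
static size_t sketch_copy_mc(unsigned char *dst,
			     const struct sketch_page *src, size_t len)
{
	size_t good = len;

	assert(len <= SKETCH_PAGE_SIZE);
	if (src->poison_off >= 0 && (size_t)src->poison_off < len)
		good = (size_t)src->poison_off;
	memcpy(dst, src->data, good);
	return len - good;
}

/* Caller side: a short copy becomes a poison fault code that the
 * fault path can hand back, rather than consuming the bad data. */
static enum sketch_fault sketch_copy_page(struct sketch_page *dst,
					  const struct sketch_page *src)
{
	if (sketch_copy_mc(dst->data, src, SKETCH_PAGE_SIZE) != 0)
		return SKETCH_FAULT_HWPOISON;
	dst->poison_off = -1;
	return SKETCH_FAULT_OK;
}
```

The point of the sketch is only the control flow: the copy helper reports failure, and the caller propagates an error instead of touching the poisoned data.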

Thanks.

[1] https://lore.kernel.org/linux-arm-kernel/20220812070557.1028499-1-tongtiangen@xxxxxxxxxx/
[2] https://lore.kernel.org/linux-arm-kernel/20220812070557.1028499-5-tongtiangen@xxxxxxxxxx/
[3] https://lore.kernel.org/lkml/20221031201029.102123-2-tony.luck@xxxxxxxxx/