Re: [PATCH 2/2] mm: ksm: Support hwpoison for ksm page

From: HORIGUCHI NAOYA(堀口 直也)
Date: Fri Mar 31 2023 - 01:42:51 EST


On Thu, Mar 30, 2023 at 03:45:01PM +0800, Longlong Xia wrote:
> hwpoison_user_mappings() is updated to support ksm pages, and add
> collect_procs_ksm() to collect processes when the error hit an ksm
> page. The difference from collect_procs_anon() is that it also needs
> to traverse the rmap-item list on the stable node of the ksm page.
> At the same time, add_to_kill_ksm() is added to handle ksm pages. And
> task_in_to_kill_list() is added to avoid duplicate addition of tsk to
> the to_kill list. This is because when scanning the list, if the pages
> that make up the ksm page all come from the same process, they may be
> added repeatedly.
>
> Signed-off-by: Longlong Xia <xialonglong1@xxxxxxxxxx>

I haven't found any specific issues in code review so far, so I'll try
testing your patches.
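
For anyone following along, the duplicate check described in the changelog
can be modeled in userspace roughly as below. This is only a sketch of the
idea, not the code from the patch: struct task, struct to_kill and
add_to_kill_once() here are simplified, hypothetical stand-ins, and only the
name task_in_to_kill_list() is taken from the changelog.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for a task. */
struct task {
	int pid;
};

/* Hypothetical, simplified entry of the list of tasks to be signalled. */
struct to_kill {
	struct to_kill *next;
	struct task *tsk;
	unsigned long addr;
};

/* Return true if tsk already has an entry on the list. */
static bool task_in_to_kill_list(const struct to_kill *head,
				 const struct task *tsk)
{
	const struct to_kill *tk;

	for (tk = head; tk; tk = tk->next)
		if (tk->tsk == tsk)
			return true;
	return false;
}

/*
 * Add tsk to the list unless it is already there. Returns true only when
 * a new entry was added, so a task that maps the same merged page through
 * several rmap items ends up on the list once.
 */
static bool add_to_kill_once(struct to_kill **head, struct task *tsk,
			     unsigned long addr)
{
	struct to_kill *tk;

	assert(head && tsk);
	if (task_in_to_kill_list(*head, tsk))
		return false;

	tk = malloc(sizeof(*tk));
	if (!tk)
		return false;
	tk->tsk = tsk;
	tk->addr = addr;
	tk->next = *head;
	*head = tk;
	return true;
}

/* Count the entries on the list. */
static size_t to_kill_count(const struct to_kill *head)
{
	size_t n = 0;

	for (; head; head = head->next)
		n++;
	return n;
}
```

In this model, calling add_to_kill_once() twice for the same task with two
different addresses leaves a single entry on the list, which is the behavior
the changelog asks for when all the merged pages come from one process.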

I have one comment about duplicated KSM pages. It seems that KSM controls
page duplication by limiting the deduplication factor with max_page_sharing,
primarily for performance reasons. But I think it's important from a memory
RAS viewpoint too, because it means we could allow recovery from memory
errors on a KSM page by making the affected processes switch to one of the
duplicated pages (without killing the processes!). This may be beyond the
scope of this patchset and I'm not sure how hard it is, but if you are
interested in this issue, that would be really nice.

Thanks,
Naoya Horiguchi