Re: [PATCH -next] mm: hwpoison: support recovery from HugePage copy-on-write faults

From: Mike Kravetz
Date: Wed Apr 12 2023 - 18:22:09 EST


On 04/12/23 14:57, Andrew Morton wrote:
> On Wed, 12 Apr 2023 11:13:50 -0700 Mike Kravetz <mike.kravetz@xxxxxxxxxx> wrote:
>
> > On 04/11/23 17:27, Liu Shixin wrote:
> > > Patch a873dfe1032a ("mm, hwpoison: try to recover from copy-on write faults")
> > > introduced a new copy_mc_user_highpage() function, and fixed the kernel crash
> > > when the kernel is copying a normal page as the result of a copy-on-write
> > > fault and runs into an uncorrectable error. But it doesn't work for HugeTLB.
> >
> > Andrew asked about user-visible effects. Perhaps, a better way of
> > stating this in the commit message might be:
> >
> > Commit a873dfe1032a ("mm, hwpoison: try to recover from copy-on write
> > faults") introduced the routine copy_mc_user_highpage() to gracefully
> > handle copying of user pages with uncorrectable errors. Previously,
> > such copies would result in a kernel crash. hugetlb has separate code
> > paths for copy-on-write and does not benefit from the changes made in
> > commit a873dfe1032a.

I was just going to suggest adding the line,

Hence, copy-on-write of hugetlb user pages with uncorrectable errors
will result in a kernel crash as was the case with 'normal' pages before
commit a873dfe1032a.

However, I'm guessing it might be clearer if we start with the
runtime effects. Something like:

Copy-on-write of hugetlb user pages with uncorrectable errors will result
in a kernel crash. This is because the copy is performed in kernel mode
and in general we cannot handle accessing memory with such errors while
in kernel mode. Commit a873dfe1032a ("mm, hwpoison: try to recover from
copy-on write faults") introduced the routine copy_mc_user_highpage() to
gracefully handle copying of user pages with uncorrectable errors. However,
the separate hugetlb copy-on-write code paths were not modified as part
of commit a873dfe1032a.

> >
> > Modify hugetlb copy-on-write code paths to use copy_mc_user_highpage()
> > so that they can also gracefully handle uncorrectable errors in user
> > pages. This involves changing the hugetlb specific routine
> > copy_user_folio() from type void to int so that it can return an error.
> > Modify the hugetlb userfaultfd code in the same way so that it can return
> > -EHWPOISON if it encounters an uncorrectable error.
>
> Thanks, but... what are the runtime effects? What does hugetlb
> presently do when encountering these uncorrectable errors?
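
To make the void-to-int change described in the commit message concrete,
here is a rough userspace model of the pattern. It is only a sketch, not
the kernel code: struct sketch_page, sketch_copy_mc_page() and
sketch_copy_large() are hypothetical names, the 'poisoned' flag stands in
for an uncorrectable memory error, and returning -EHWPOISON from the copy
routine itself is an assumption made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#ifndef EHWPOISON
#define EHWPOISON 133	/* Linux value; defined here for other libcs */
#endif

#define SKETCH_PAGE_SIZE 64	/* toy size, not the real PAGE_SIZE */

/* Hypothetical stand-in for a page; 'poisoned' models an uncorrectable error. */
struct sketch_page {
	unsigned char data[SKETCH_PAGE_SIZE];
	bool poisoned;
};

/*
 * Models a machine-check-safe copy of one page: returns 0 on success and
 * nonzero if the source could not be read, instead of crashing.
 */
static int sketch_copy_mc_page(struct sketch_page *dst,
			       const struct sketch_page *src)
{
	if (src->poisoned)
		return -1;
	memcpy(dst->data, src->data, SKETCH_PAGE_SIZE);
	return 0;
}

/*
 * Models the void -> int conversion: copy subpage by subpage, stop at the
 * first bad one, and report the failure to the caller.
 */
static int sketch_copy_large(struct sketch_page *dst,
			     const struct sketch_page *src, size_t nr_pages)
{
	size_t i;

	assert(nr_pages > 0);
	for (i = 0; i < nr_pages; i++) {
		if (sketch_copy_mc_page(&dst[i], &src[i]))
			return -EHWPOISON;
	}
	return 0;
}
```

With a void routine the caller has no way to learn that the copy failed.
With the int return, the fault path can check the result and back out
rather than continuing with a partially copied page.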

--
Mike Kravetz