Re: [PATCH] hugetlbfs: don't delete error page from pagecache
From: Mike Kravetz
Date: Wed Oct 19 2022 - 14:56:01 EST
On 10/19/22 11:31, Yang Shi wrote:
> On Tue, Oct 18, 2022 at 1:01 PM James Houghton <jthoughton@xxxxxxxxxx> wrote:
> >
> > This change is very similar to the change that was made for shmem [1],
> > and it solves the same problem but for HugeTLBFS instead.
> >
> > Currently, when poison is found in a HugeTLB page, the page is removed
> > from the page cache. That means that attempting to map or read that
> > hugepage in the future will result in a new hugepage being allocated
> > instead of notifying the user that the page was poisoned. As [1] states,
> > this is effectively memory corruption.
> >
> > The fix is to leave the page in the page cache. If the user attempts to
> > use a poisoned HugeTLB page with a syscall, the syscall will fail with
> > EIO, the same error code that shmem uses. For attempts to map the page,
> > the thread will get a BUS_MCEERR_AR SIGBUS.
> >
> > [1]: commit a76054266661 ("mm: shmem: don't truncate page if memory failure happens")
> >
> > Signed-off-by: James Houghton <jthoughton@xxxxxxxxxx>
>
> Thanks for the patch. Yes, we should do the same thing for hugetlbfs.
> When I was working on shmem I did look into hugetlbfs too. But the
> problem is we actually make the whole hugetlb page unavailable even
> though just one 4K sub-page is hwpoisoned. That may be acceptable for
> a 2M hugetlb page, but losing most of a 1G hugetlb page is a huge
> waste of memory, particularly on the page fault path.
One thing that complicates this a bit is the vmemmap optimization for
hugetlb. However, I believe Naoya may have addressed this recently.
> So I discussed this with Mike offline last year, and I was told Google
> was working on PTE-mapped hugetlb pages. That should be able to solve
> the problem, and we'd like to have the high-granularity hugetlb
> mapping (HGM) support as a prerequisite.
Yes, I went back in my notes and noticed it had been one year. No offense
intended to James and his great work on HGM. However, in hindsight we should
have fixed this in some way without waiting for an HGM-based solution.
> There were some other details, but I can't remember all of them; I'll
> have to refresh my memory by rereading the email discussions...
I think the complicating factor was vmemmap optimization. As mentioned
above, this may have already been addressed by Naoya in patches to
indicate which sub-page(s) had the actual error.
As Yang Shi notes, this patch makes the entire hugetlb page inaccessible.
With some work, we could allow reads to everything but the sub-page with
error. However, this should be much easier with HGM. And, we could
potentially even do page faults everywhere but the sub-page with error.
I still think it may be better to wait for HGM instead of trying to
enable read access to everything but the sub-page with the error now.
But I am entirely open to other opinions.
I plan to do a review of this patch a little later.
--
Mike Kravetz