Re: [PATCH] mm: hwpoison: deal with page cache THP

From: Yang Shi
Date: Thu Sep 09 2021 - 19:07:59 EST


On Tue, Sep 7, 2021 at 9:25 PM HORIGUCHI NAOYA(堀口 直也)
<naoya.horiguchi@xxxxxxx> wrote:
>
> On Tue, Sep 07, 2021 at 08:14:26PM -0700, Yang Shi wrote:
> ...
> > > > > > > > 5. We also could define a new FGP flag to return poisoned page, NULL
> > > > > > > > or error pointer. This also should need significant code change since
> > > > > > > > a lt callsites need to be contemplated.
> > > > > > >
> > > > > > > Could you explain a little more about which callers should use the flag?
> > > > > >
> > > > > > Just to solve the above invalidate/truncate problem and page fault
> > > > > > doesn't expect an error pointer. But it seems the above
> > > > > > invalidate/truncate paths don't matter. Page fault should be the only
> > > > > > user since page fault may need unlock the page if poisoned page is
> > > > > > returned.
> > >
> > > Orignally PG_hwpoison is detected in page fault handler for file-backed pages
> > > like below,
> > >
> > > static vm_fault_t __do_fault(struct vm_fault *vmf)
> > > ...
> > > ret = vma->vm_ops->fault(vmf);
> > > if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY |
> > > VM_FAULT_DONE_COW)))
> > > return ret;
> > >
> > > if (unlikely(PageHWPoison(vmf->page))) {
> > > if (ret & VM_FAULT_LOCKED)
> > > unlock_page(vmf->page);
> > > put_page(vmf->page);
> > > vmf->page = NULL;
> > > return VM_FAULT_HWPOISON;
> > > }
> > >
> > > so it seems to me that if a filesystem switches to the new scheme of keeping
> > > error pages in page cache, ->fault() callback for the filesystem needs to be
> > > aware of hwpoisoned pages inside it and returns VM_FAULT_HWPOISON when it
> > > detects an error page (maybe by detecting pagecache_get_page() to return
> > > PTR_ERR(-EIO or -EHWPOISON) or some other ways). In the other filesystems
> > > with the old scheme, fault() callback does not return VM_FAULT_HWPOISON so
> > > that page fault handler falls back to the generic PageHWPoison check above.
> >
> > Yes, exactly. If we return ERR_PTR we need modify the above code
> > accordingly. But returning the page, no change is required.
> >
> > >
> > > > >
> > > > > It seems page fault check IS_ERR(page) then just return
> > > > > VM_FAULT_HWPOISON. But I found a couple of places in shmem which want
> > > > > to return head page then handle subpage or just return the page but
> > > > > don't care the content of the page. They should ignore hwpoison. So I
> > > > > guess we'd better to have a FGP flag for such cases.
> > >
> > > At least currently thp are supposed to be split before error handling.
> >
> > Not for file THP.
> >
> > > We could loosen this assumption to support hwpoison on a subpage of thp,
> > > but I'm not sure whether that has some benefit.
> >
> > I don't quite get your point. Currently the hwpoison flag is set on
> > specific subpage instead of head page.
>
> Sorry, I miss the case when thp split fails, then the thp has hwpoison
> flag set on any subpage. I only thought of the successful case, where the
> resulting error page is no longer a subpage.
>
> >
> > > And also this discussion makes me aware that we need to have a way to
> > > prevent hwpoisoned pages from being used to thp collapse operation.
> >
> > Actually not. The refcount from hwpoison could prevent it from
> > collapsing. But if we return ERR_PTR (#3) we need extra code in
> > khugepaged to handle it.
>
> OK, so we already prevent it.

I just realized I'm wrong for the refcount. The poisoned page has been
isolated from LRU before khugepaged finds it. So the refcount from
isolation is not incremented at all. The refcount seen by khugepaged
is 3: hwpoison, page cache and khugepaged. It actually meets the
expectation.

The khugepaged does check if the page is a LRU page or not but without
holding page lock, so it may race with hwpoison. After holding page
lock, it doesn't check LRU flag anymore. So I think we need add the
check.

>
> Thanks,
> Naoya Horiguchi
>
> >
> > So I realized #1 would require fewer changes. And leaving the page
> > state check to callers seem reasonable since the callers usually check
> > other states, e.g. uptodate.