Re: [PATCH] mm: hugetlbfs: add hwcrp_hugepages to record memory failure on hugetlbfs

From: HORIGUCHI NAOYA(堀口 直也)
Date: Tue Jun 08 2021 - 04:01:44 EST


Thanks for forwarding the message, Mike.

On Mon, Jun 07, 2021 at 12:13:03PM -0700, Mike Kravetz wrote:
> Resend with new e-mail for Naoya
>
> On 6/7/21 7:16 AM, wangbin wrote:
> > From: Bin Wang <wangbin224@xxxxxxxxxx>
> >
> > In the current hugetlbfs memory failure handler, reserved huge page
> > counts are used to record the number of huge pages with hwpoison.
>
> I do not believe this is an accurate statement. Naoya is the memory
> error expert and may disagree, but I do not see anywhere where reserve
> counts are being used to track huge pages with memory errors.

Mike is right: hugetlb's reservation count is not linked to the
accounting of hwpoisoned pages.

>
> IIUC, the routine hugetlbfs_error_remove_page is called after
> unmapping the page from all user mappings. The routine will simply
> remove the page from the cache. This effectively removes the page
> from the file as hugetlbfs is a memory only filesystem. The subsequent
> call to hugetlb_unreserve_pages cleans up any reserve map entries
> associated with the page and adjusts the reserve count if necessary.
> The reserve count adjustment is based on removing the page from the
> file, rather than the memory error. The same adjustment would be made
> if the page was hole punched from the file.

This logic totally makes sense to me.

The unmapping done in memory_failure() might increment the reserve count,
but that is just the unmapping giving back a reservation that had been
consumed; it is not hwpoison accounting.
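
To make the point concrete, here is a minimal user-space sketch. It is a toy
model, not kernel code: the struct, the toy_* function names, the separate
hwpoisoned counter and the adjustment rule are all hypothetical and only
stand in for the behaviour described above. It shows that the reserve count
adjustment depends on the page leaving the file, not on why it leaves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_FILE_PAGES 8 /* hypothetical: toy file size in huge pages */

/* Toy per-file state; NOT the kernel's data structures. */
struct toy_hugetlb_file {
    bool in_cache[TOY_FILE_PAGES];    /* page is present in the file */
    bool in_resv_map[TOY_FILE_PAGES]; /* reserve map entry exists */
    long resv_count;                  /* toy reserve count */
    long hwpoisoned;                  /* toy poison counter, kept separate */
};

/*
 * Shared helper: remove one page from the toy file and clean up its
 * reserve map entry. Returns the change applied to resv_count.
 * Placeholder rule (an assumption): an entry whose reservation was never
 * consumed by a page gives one reservation back; otherwise no change.
 * The helper never looks at the reason for the removal.
 */
static long toy_remove_from_file(struct toy_hugetlb_file *f, size_t index)
{
    long delta = 0;

    assert(index < TOY_FILE_PAGES);

    if (f->in_resv_map[index] && !f->in_cache[index])
        delta = -1;

    f->in_cache[index] = false;
    f->in_resv_map[index] = false;
    f->resv_count += delta;
    return delta;
}

/* Memory error path: poison is counted on its own, then the page is removed. */
static long toy_error_remove_page(struct toy_hugetlb_file *f, size_t index)
{
    f->hwpoisoned++;
    return toy_remove_from_file(f, index);
}

/* Hole punch path: same removal, no poison accounting. */
static long toy_hole_punch_page(struct toy_hugetlb_file *f, size_t index)
{
    return toy_remove_from_file(f, index);
}
```

Starting from two identical toy files and removing the same index through
each path leaves both with the same resv_count; only the separate
hwpoisoned counter differs.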

Thanks,
Naoya Horiguchi