Re: [PATCH] mm: hugetlbfs: add hwcrp_hugepages to record memory failure on hugetlbfs

From: Mike Kravetz
Date: Mon Jun 07 2021 - 15:13:19 EST


Resend with new e-mail for Naoya

On 6/7/21 7:16 AM, wangbin wrote:
> From: Bin Wang <wangbin224@xxxxxxxxxx>
>
> In the current hugetlbfs memory failure handler, reserved huge page
> counts are used to record the number of huge pages with hwpoison.

I do not believe this is an accurate statement. Naoya is the memory
error expert and may disagree, but I do not see anywhere that reserve
counts are being used to track huge pages with memory errors.

IIUC, the routine hugetlbfs_error_remove_page is called after
unmapping the page from all user mappings. The routine will simply
remove the page from the cache. This effectively removes the page
from the file, as hugetlbfs is a memory-only filesystem. The subsequent
call to hugetlb_unreserve_pages cleans up any reserve map entries
associated with the page and adjusts the reserve count if necessary.
The reserve count adjustment is based on removing the page from the
file, rather than the memory error. The same adjustment would be made
if the page was hole punched from the file.
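
To illustrate the point, here is a toy user-space model in C. It is not
kernel code: struct toy_file, toy_remove_page and the other toy_* names
are made up for this sketch, and the accounting rule (drop the count by
one when a reserve map entry existed) is a deliberate simplification of
what hugetlb_unreserve_pages really computes. The only thing it tries to
show is that the reason for the removal is not an input to the reserve
adjustment.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_NPAGES 4

/* Hypothetical stand-in for a hugetlbfs file; not a kernel structure. */
struct toy_file {
    bool in_cache[TOY_NPAGES];   /* page is present in the page cache */
    bool resv_entry[TOY_NPAGES]; /* a reserve map entry exists for this index */
    long resv_count;             /* simplified stand-in for the reserve count */
};

/* Why the page is being removed from the file. */
enum toy_reason { TOY_MEMORY_ERROR, TOY_HOLE_PUNCH };

/*
 * Remove one page from the toy file. The reason is accepted but
 * deliberately ignored: the accounting depends only on whether a
 * reserve map entry existed for the index being removed.
 */
static void toy_remove_page(struct toy_file *f, int idx, enum toy_reason why)
{
    assert(idx >= 0 && idx < TOY_NPAGES);
    (void)why;

    f->in_cache[idx] = false;        /* page leaves the cache, hence the file */
    if (f->resv_entry[idx]) {
        f->resv_entry[idx] = false;  /* clean up the reserve map entry */
        f->resv_count--;             /* simplified "adjust if necessary" */
    }
}

/* Build a toy file with every page cached and every index reserved. */
static struct toy_file toy_file_init(void)
{
    struct toy_file f;

    memset(&f, 0, sizeof(f));
    for (int i = 0; i < TOY_NPAGES; i++) {
        f.in_cache[i] = true;
        f.resv_entry[i] = true;
    }
    f.resv_count = TOY_NPAGES;
    return f;
}

/* True if removal by memory error and by hole punch leave identical state. */
static bool toy_same_accounting(int idx)
{
    struct toy_file a = toy_file_init();
    struct toy_file b = toy_file_init();

    toy_remove_page(&a, idx, TOY_MEMORY_ERROR);
    toy_remove_page(&b, idx, TOY_HOLE_PUNCH);

    return a.resv_count == b.resv_count &&
           memcmp(a.in_cache, b.in_cache, sizeof(a.in_cache)) == 0 &&
           memcmp(a.resv_entry, b.resv_entry, sizeof(a.resv_entry)) == 0;
}
```

In this model, toy_same_accounting() holds for every valid index, so the
resulting count cannot tell you whether a page was lost to a memory error
or to a hole punch.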

What specific problem are you trying to solve? Are you trying to see
how many huge pages were hit by memory errors?
--
Mike Kravetz