Re: [v3 PATCH 2/5] mm: filemap: check if THP has hwpoisoned subpage for PMD page fault

From: Peter Xu
Date: Wed Oct 06 2021 - 16:15:18 EST


On Thu, Sep 30, 2021 at 02:53:08PM -0700, Yang Shi wrote:
> @@ -1148,8 +1148,12 @@ static int __get_hwpoison_page(struct page *page)
>  		return -EBUSY;
>
>  	if (get_page_unless_zero(head)) {
> -		if (head == compound_head(page))
> +		if (head == compound_head(page)) {
> +			if (PageTransHuge(head))
> +				SetPageHasHWPoisoned(head);
> +
>  			return 1;
> +		}
>
>  		pr_info("Memory failure: %#lx cannot catch tail\n",
>  			page_to_pfn(page));

Sorry for the late comments.

I'm wondering whether it's ideal to set this bit here, as get_hwpoison_page()
sounds like a pure helper to get a refcount out of a sane hwpoisoned page. I'm
afraid there can be a side effect where we set this without noticing, so I'm
also wondering whether we should keep it in memory_failure().

Quoting the comments for get_hwpoison_page():

* get_hwpoison_page() takes a page refcount of an error page to handle memory
* error on it, after checking that the error page is in a well-defined state
* (defined as a page-type we can successfully handle the memory error on it,
* such as LRU page and hugetlb page).

For example, I see that both unpoison_memory() and soft_offline_page() will
call it too. Does that mean we'll also set the bit even when, e.g., we want
to inject an unpoison event?
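
To make the concern concrete, here is a rough userspace model of the
alternative, with the helper left free of side effects and the flag set only
in the real error path. This is not kernel code: struct page_model and every
*_model function are made up for illustration, and the refcount handling is
heavily simplified.

```c
/* Userspace model only; none of these names exist in the kernel. */
#include <stdbool.h>

struct page_model {
	int refcount;
	bool trans_huge;	/* stands in for PageTransHuge() */
	bool has_hwpoisoned;	/* stands in for PageHasHWPoisoned() */
};

/* Pure helper: takes a reference and touches no flags. */
static int get_hwpoison_page_model(struct page_model *head)
{
	if (head->refcount == 0)
		return 0;	/* page is being freed, nothing to grab */
	head->refcount++;
	return 1;
}

/* Only the real error path marks the THP as having a poisoned subpage. */
static int memory_failure_model(struct page_model *head)
{
	if (!get_hwpoison_page_model(head))
		return -1;
	if (head->trans_huge)
		head->has_hwpoisoned = true;
	return 0;
}

/* Unpoison path: takes the same reference but leaves the flag alone. */
static int unpoison_memory_model(struct page_model *head)
{
	if (!get_hwpoison_page_model(head))
		return -1;
	head->refcount--;	/* drop the reference again */
	return 0;
}
```

With the flag set inside the helper instead, unpoison_memory_model() would
end up marking the page as well, which is the side effect I'm worried about.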

Thanks,

--
Peter Xu