Re: [PATCH v2] mm/hwpoison: use pr_err() instead of dump_page() in get_any_page()

From: Matthew Wilcox
Date: Mon May 09 2022 - 22:28:35 EST


On Wed, Apr 27, 2022 at 02:32:20PM +0900, Naoya Horiguchi wrote:
> From: Naoya Horiguchi <naoya.horiguchi@xxxxxxx>
>
> The following VM_BUG_ON_FOLIO() is triggered when memory error event
> happens on the (thp/folio) pages which are about to be freed:

So the real problem is that we're calling dump_page() when we don't
have a reference to the page, right? Otherwise it wouldn't be freed.

>  out:
>  	if (ret == -EIO)
> -		dump_page(p, "hwpoison: unhandlable page");
> +		pr_err("Memory failure: %#lx: unhandlable page.\n", page_to_pfn(p));

It would be nice to get some more information out of the page than that,
but taking a refcount inside dump_page() conflicts with the other
"would be nice", which is for dump_page() to take a const struct page *
so we can (eg) make folio_test_uptodate() take a const struct folio *.
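
To illustrate the conflict, here is a minimal userspace sketch. It is not
kernel code: struct mock_page and every mock_* helper are hypothetical
stand-ins, and the refcount is modelled with C11 atomics rather than the
kernel's page_ref API. The point is only that pinning an object writes to
it, so a dumper that takes a const pointer cannot pin what it prints.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for struct page; not the kernel layout. */
struct mock_page {
	atomic_int refcount;
	unsigned long flags;
};

/* Taking a reference writes to the object, so the pointer cannot be const. */
static bool mock_get_page_unless_zero(struct mock_page *page)
{
	int old = atomic_load(&page->refcount);

	while (old != 0) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&page->refcount, &old, old + 1))
			return true;
	}
	return false;	/* refcount already zero: the page is being freed */
}

static void mock_put_page(struct mock_page *page)
{
	atomic_fetch_sub(&page->refcount, 1);
}

/*
 * A read-only dumper can accept a const pointer, but then it cannot pin
 * the page: passing 'page' to mock_get_page_unless_zero() here would
 * discard the const qualifier.
 */
static void mock_dump_page(const struct mock_page *page, const char *reason)
{
	printf("page flags=%#lx: %s\n", page->flags, reason);
}

/* The caller has to pin the page itself if it wants a stable dump. */
static bool mock_dump_page_pinned(struct mock_page *page, const char *reason)
{
	if (!mock_get_page_unless_zero(page))
		return false;
	mock_dump_page(page, reason);
	mock_put_page(page);
	return true;
}
```

In this sketch the pinning is pushed out to the caller, which keeps the
dumper const-clean but means a caller without a reference (as in
get_any_page() above) gets no dump at all for a page whose count has
already dropped to zero.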

We've had some other problems with inconsistent pages being printed in
dump_page(). It can be quite confusing when debugging. I still don't
have a good solution to that either.

I do have a proposal for reforming mapcount which will solve this
particular problem, but I'm not quite sure when I'll get to it.
This patch is probably the best thing to do for now.