Re: [PATCH] mm/memory-failure: fix deadlock when hugetlb_optimize_vmemmap is enabled

From: Miaohe Lin
Date: Wed Apr 10 2024 - 22:26:55 EST


On 2024/4/10 16:52, Oscar Salvador wrote:
> On Wed, Apr 10, 2024 at 03:52:14PM +0800, Miaohe Lin wrote:
>> AFAICS, iff the check_pages_enabled static key is enabled and we are in hard
>> offline mode, check_new_pages() will prevent those pages from ending up in a
>> PCP queue again when refilling the PCP list, because PageHWPoison pages will
>> be treated as 'bad' pages and skipped during the refill.
>
> Yes, but the check_pages_enabled static key is only enabled when
> either CONFIG_DEBUG_PAGEALLOC or CONFIG_DEBUG_VM is set, which means
> that on most systems that protection will not be in effect.
>
> Which takes me to a problem we had in the past where we were handing
> over hwpoisoned pages from PCP lists happily.
> Now, for soft-offline mode, we worked hard to stop doing that
> because soft-offline is a non-disruptive operation and no one should get
> killed.
> Hard-offline is another story, but still I think that extending the
> comment to include the following would be a good idea:
>
> "Disabling pcp before dissolving the page was a deterministic approach
> because we made sure that those pages cannot end up in any PCP list.
> Draining PCP lists expels those pages to the buddy system, but nothing
> guarantees that those pages do not get back to a PCP queue if we need
> to refill those."

This really helps. Will add it in v2.
Thanks Oscar.

>
> Just to remind ourselves of the dangers of a non-deterministic
> approach.
>
>
> Thanks
>
>
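
To make the non-determinism concrete, here is a rough toy model of the
free -> drain -> refill window discussed above. It is only a sketch and not
kernel code: every toy_* name is hypothetical, the "zone" is a four-entry
array, and the two flags merely stand in for the effect of disabling PCP and
of the check_pages_enabled static key.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NPAGES 4

/* Hypothetical stand-in for struct page; only the state this model needs. */
struct toy_page {
	bool hwpoison;	/* models PageHWPoison() */
	bool on_buddy;	/* page sits on the buddy free list */
	bool on_pcp;	/* page sits on a per-CPU (PCP) list */
};

/* Hypothetical stand-in for a zone with a single PCP list. */
struct toy_zone {
	struct toy_page pages[TOY_NPAGES];
	bool pcp_disabled;	/* models the effect of disabling PCP */
	bool check_enabled;	/* models the check_pages_enabled static key */
};

/* Free a page: onto the PCP list normally, straight to buddy if PCP is disabled. */
static void toy_free_page(struct toy_zone *z, size_t i)
{
	assert(i < TOY_NPAGES);
	z->pages[i].on_pcp = !z->pcp_disabled;
	z->pages[i].on_buddy = z->pcp_disabled;
}

/* Drain: expel every PCP page to the buddy free list. */
static void toy_drain_pcp(struct toy_zone *z)
{
	for (size_t i = 0; i < TOY_NPAGES; i++) {
		if (z->pages[i].on_pcp) {
			z->pages[i].on_pcp = false;
			z->pages[i].on_buddy = true;
		}
	}
}

/* Refill: pull buddy pages back onto the PCP list. 'Bad' pages are
 * skipped only when the check flag is set. */
static void toy_refill_pcp(struct toy_zone *z)
{
	if (z->pcp_disabled)
		return;	/* nothing is ever queued on a disabled PCP list */

	for (size_t i = 0; i < TOY_NPAGES; i++) {
		struct toy_page *p = &z->pages[i];

		if (!p->on_buddy)
			continue;
		if (z->check_enabled && p->hwpoison)
			continue;	/* treated as 'bad', stays off the PCP list */
		p->on_buddy = false;
		p->on_pcp = true;
	}
}

/* Returns true if a poisoned page is back on the PCP list after a
 * free -> drain -> refill sequence. */
bool toy_poison_reaches_pcp(bool check_enabled, bool pcp_disabled)
{
	struct toy_zone z = {
		.pcp_disabled = pcp_disabled,
		.check_enabled = check_enabled,
	};

	z.pages[0].hwpoison = true;
	toy_free_page(&z, 0);
	toy_drain_pcp(&z);
	toy_refill_pcp(&z);	/* the refill that may race in after the drain */
	return z.pages[0].on_pcp;
}
```

In this model the poisoned page only gets back onto the PCP list when both
flags are off, i.e. drain-only without the debug checks. With PCP disabled the
refill path is never taken, whatever the check flag says, which is the
deterministic behaviour the extended comment describes.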