Re: [RFC PATCH v1 0/4] mm, hwpoison: improve handling workload related to hugetlb and memory_hotplug

From: David Hildenbrand
Date: Thu May 12 2022 - 03:29:04 EST


>>>>
>>>> Once the problematic DIMM would actually get unplugged, the memory block devices
>>>> would get removed as well. So when hotplugging a new DIMM in the same
>>>> location, we could online that memory again.
>>>
>>> What about PG_hwpoison flags? Are struct pages also freed and
>>> reallocated on actual DIMM replacement?
>>
>> Once memory is offline, the memmap is stale and no longer
>> trustworthy. It gets reinitialized during memory onlining, so any
>> previous PG_hwpoison is overridden at least there. In some setups, we
>> even poison the whole memmap via page_init_poison() during memory offlining.
>>
>> Apart from that, we should be freeing the memmap in all relevant cases
>> when removing memory. I remember there are a couple of corner cases, but
>> we don't really have to care about those.
>
> OK, so there seems to be no need to manipulate struct pages for
> hwpoison in all relevant cases.

Right. When offlining a memory block, all we have to do is check whether
we stumbled over a hwpoisoned page and record that inside the memory
block. Refusing to online the block is then easy.
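A minimal userspace sketch of that idea follows. It is a model, not kernel code: the types (struct memory_block_sketch, struct page_sketch), the had_hwpoison field, and the helpers block_offline()/block_online() are hypothetical stand-ins, not the kernel's struct memory_block or its API. Offlining records whether any page in the block carried the hwpoison mark; onlining refuses with -EHWPOISON if so, and otherwise reinitializes the per-page state, mirroring how onlining overrides a stale memmap.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#ifndef EHWPOISON
#define EHWPOISON 133 /* Linux value; fallback for libcs that lack it */
#endif

/* Hypothetical block size for the model; not a kernel constant. */
#define PAGES_PER_BLOCK 8

/* Hypothetical stand-in for struct page; only models PG_hwpoison. */
struct page_sketch {
	bool hwpoison;
};

/* Hypothetical stand-in for a memory block device. */
struct memory_block_sketch {
	bool online;
	/* Remembered across offline/online, unlike the stale memmap. */
	bool had_hwpoison;
	struct page_sketch pages[PAGES_PER_BLOCK];
};

/* Offlining: note any hwpoisoned page before the memmap goes stale. */
static int block_offline(struct memory_block_sketch *mb)
{
	assert(mb->online); /* only online blocks can be offlined */
	for (size_t i = 0; i < PAGES_PER_BLOCK; i++) {
		if (mb->pages[i].hwpoison)
			mb->had_hwpoison = true;
	}
	mb->online = false;
	return 0;
}

/*
 * Onlining: refuse if the block was marked; otherwise reinitialize
 * the per-page state, which drops any stale hwpoison flags.
 */
static int block_online(struct memory_block_sketch *mb)
{
	assert(!mb->online); /* only offline blocks can be onlined */
	if (mb->had_hwpoison)
		return -EHWPOISON;
	for (size_t i = 0; i < PAGES_PER_BLOCK; i++)
		mb->pages[i].hwpoison = false;
	mb->online = true;
	return 0;
}
```

Because the flag lives in the memory block rather than in the memmap, reinitializing the memmap on onlining cannot lose it. And since the memory block device goes away when the DIMM is unplugged, a replacement DIMM in the same location would start with a fresh, unmarked block and could be onlined again.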

--
Thanks,

David / dhildenb