Re: [RFC PATCH v1 0/4] mm, hwpoison: improve handling workload related to hugetlb and memory_hotplug

From: HORIGUCHI NAOYA(堀口 直也)
Date: Mon May 09 2022 - 03:33:57 EST


On Thu, Apr 28, 2022 at 10:44:15AM +0200, David Hildenbrand wrote:
> >> 2) It happens rarely (ever?), so do we even care?
> >
> > I'm not certain of the rarity. Some cloud service providers who maintain
> > lots of servers may care?
>
> About replacing broken DIMMs? I'm not so sure, especially because it
> requires a special setup with ZONE_MOVABLE (i.e., movablecore) to be
> somewhat reliable, and individual DIMMs usually cannot be replaced at all.
>
> >
> >> 3) Once the memory is offline, we can re-online it and lose HWPoison.
> >> The memory can be happily used.
> >>
> >> 3) can happen easily if our DIMM consists of multiple memory blocks and
> >> offlining of some memory block fails -> we'll re-online all already
> >> offlined ones. We'll happily reuse previously HWPoisoned pages, which
> >> feels more dangerous to me than just leaving the DIMM around (and
> >> eventually hwpoisoning all pages on it such that it won't get used
> >> anymore?).
> >
> > I see. This scenario can often happen.
> >
> >>
> >> So maybe we should just fail offlining once we stumble over a hwpoisoned
> >> page?
> >
> > That could be one choice.
> >
> > Maybe another is like this: offlining can succeed, but HWPoison flags are
> > kept across offline-reonline operations. If the system notices that the
> > re-onlined blocks are backed by the original DIMMs or NUMA nodes, the
> > saved HWPoison flags are still valid, so keep using them. If the
> > re-onlined blocks are backed by replaced DIMMs/NUMA nodes, we can clear
> > all HWPoison flags associated with the replaced physical address range.
> > This can be done automatically during re-onlining if there's a way for the
> > kernel to know whether the DIMMs/NUMA nodes were replaced with new ones.
> > If there isn't, system applications have to check the HW and explicitly
> > reset the HWPoison flags.
>
> Offline memory sections have a stale memmap, so it cannot be trusted.
> And trying to work around that, or adjusting the memory onlining code,
> overcomplicates something we really don't care about supporting.

OK, so I'll go forward with reducing the hwpoison-specific complexity in
the memory offlining code.

>
> So if we continue allowing offlining memory blocks with poisoned pages,
> we could simply remember that the memory block contained a poisoned page
> (either for the memory section or, maybe better, for the whole memory
> block). We can then simply reject/fail memory onlining of these memory
> blocks.

It also seems helpful in other contexts (like hugetlb) to know whether
there's any hwpoisoned page in a given physical address range, so I'll
consider this approach.
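
For illustration, here is a minimal userspace sketch of that bookkeeping
(the per-block array, the function names, and the return values are my
assumptions, not existing kernel interfaces): the offlining path marks a
block once it sees a poisoned page, and the onlining path refuses to
online any block that was marked.

    /* Userspace model of the idea above; not actual kernel code. */
    #include <stdbool.h>
    #include <stdio.h>

    #define NR_MEM_BLOCKS 64   /* illustrative number of memory blocks */

    static bool block_had_hwpoison[NR_MEM_BLOCKS];

    /* Modeled offlining path: record that this block held a poisoned page. */
    static void mark_block_hwpoisoned(unsigned int block_id)
    {
            block_had_hwpoison[block_id] = true;
    }

    /* Modeled onlining path: fail if the block ever held a poisoned page. */
    static int online_memory_block(unsigned int block_id)
    {
            if (block_had_hwpoison[block_id]) {
                    fprintf(stderr, "block %u: refusing to online\n", block_id);
                    return -1;   /* the kernel would return an errno here */
            }
            printf("block %u: onlined\n", block_id);
            return 0;
    }

    int main(void)
    {
            mark_block_hwpoisoned(3);   /* offlining saw a poisoned page in block 3 */
            online_memory_block(3);     /* rejected */
            online_memory_block(4);     /* succeeds */
            return 0;
    }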

>
> So that leaves us with either
>
> 1) Fail offlining -> no need to care about reonlining
> 2) Succeed offlining but fail re-onlining

Rephrasing in case I misread: in the end, the memory offlining code should
not check for hwpoisoned pages at all, and the memory onlining code would
do a kind of range query to find hwpoisoned pages (without depending on
the PageHWPoison flag).
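
To make the "range query" idea concrete, here is a minimal userspace sketch
(the PFN values, the linear scan, and the helper name are purely
illustrative; the real record would have to live outside the memmap so that
it survives offlining):

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdio.h>

    /* PFNs recorded as poisoned, kept independently of struct page. */
    static const unsigned long poisoned_pfns[] = { 0x21040, 0x21041, 0x84200 };

    /* Does [start_pfn, start_pfn + nr_pages) contain any poisoned PFN? */
    static bool range_has_hwpoison(unsigned long start_pfn, unsigned long nr_pages)
    {
            size_t i;

            for (i = 0; i < sizeof(poisoned_pfns) / sizeof(poisoned_pfns[0]); i++) {
                    if (poisoned_pfns[i] >= start_pfn &&
                        poisoned_pfns[i] < start_pfn + nr_pages)
                            return true;
            }
            return false;
    }

    int main(void)
    {
            /* e.g. a memory block spanning PFNs 0x20000..0x27fff */
            printf("range has hwpoison: %d\n",
                   (int)range_has_hwpoison(0x20000, 0x8000));
            return 0;
    }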

Thanks,
Naoya Horiguchi