Re: [PATCH v3 0/3] mm/hwpoison: fix unpoison_memory()

From: Naoya Horiguchi
Date: Fri Nov 05 2021 - 07:50:11 EST


On Fri, Nov 05, 2021 at 11:58:15AM +0100, David Hildenbrand wrote:
> On 05.11.21 06:50, Naoya Horiguchi wrote:
> > Hi,
> >
> > I updated the unpoison patchset based on discussions over v2.
> > Please see individual patches for details of updates.
> >
> > ----- (cover letter copied from v2) -----
> > Main purpose of this series is to sync unpoison code to recent changes
> > around how hwpoison code takes page refcount. Unpoison should work or
> > simply fail (without crash) if impossible.
> >
> > The recent work on keeping hwpoison pages in shmem pagecache introduces
> > a new state of hwpoisoned pages, but unpoison for such pages is not
> > supported yet with this series.
> >
> > It seems that soft-offline and unpoison can be used as general purpose
> > page offline/online mechanism (not in the context of memory error).
>
> I'm not sure what the target use case would be TBH ... for proper memory
> offlining/memory hotunplug we have to offline whole memory blocks. For
> memory ballooning based mechanisms we simply allocate random free pages
> and eventually trigger reclaim to make more random free pages available.
> For memory hotunplug via virtio-mem we're using alloc_contig_range() to
> allocate ranges of interest we logically unplug.

I heard about this idea from two people independently, but I think it is
still only a rough idea. If no one shows a clear use case, or someone shows
that we don't need it, I won't pursue it.

>
> The only benefit compared to alloc_contig_range() might be that we can
> offline smaller chunks -- alloc_contig_range() isn't optimized for
> sub-MAX_ORDER granularity yet. But then, alloc_contig_range() should
> much rather be extended.

If alloc_contig_range() supported offlining memory at arbitrary granularity
(down to a single page), maybe soft offline could be unified with it, at
least partially.

Thanks,
Naoya Horiguchi

>
> Long story short, I'm not sure there is a sane use case for this
> "general purpose page offline/online mechanism" ...