Re: [PATCH v5 1/5] mm,memory_hotplug: Allocate memmap from the added memory range

From: Michal Hocko
Date: Fri Mar 26 2021 - 04:35:49 EST


On Thu 25-03-21 23:06:50, Oscar Salvador wrote:
> On Thu, Mar 25, 2021 at 05:47:30PM +0100, Michal Hocko wrote:
> > On Thu 25-03-21 17:36:22, Michal Hocko wrote:
> > > If all it takes is to make pfn_to_online_page work (and my
> > > previous attempt is incorrect because it should consult block rather
> > > than section pfn range)
> >
> > This should work.
>
> Sorry, but while this solves some of the issues with that approach, I really
> think that overcomplicates things and buys us not so much in return.
> To me it seems that we are just patching things to make it work that
> way.

I do agree that special casing vmemmap areas is something I do not
really like but we are in that schrödinger situation when this memory is
not onlineable unless it shares memory section with an onlineable
memory. There are two ways around that, either special case it on
pfn_to_online_page or mark the vmemmap section online even though it is
not really.

> To be honest, I dislike this, and I guess we can only agree to disagree
> here.

No problem there. I will not insist on my approach unless I can convince
you that it is a better solution. It seems I have failed and I can live
with that.

> I find the following much easier, cleaner, and less risky to encounter
> pitfalls in the future:
>
> (!!!It is untested and incomplete, and I would be surprised if it even
> compiles, but it is enough as a PoC !!!)
>
> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> index 5ea2b3fbce02..799d14fc2f9b 100644
> --- a/drivers/base/memory.c
> +++ b/drivers/base/memory.c
> @@ -169,6 +169,60 @@ int memory_notify(unsigned long val, void *v)
> return blocking_notifier_call_chain(&memory_chain, val, v);
> }
>
> +static int memory_block_online(unsigned long start_pfn, unsigned long nr_pages,
> + unsigned long nr_vmemmap_pages, int online_type,
> + int nid)
> +{
> + int ret;
> + /*
> + * Despite vmemmap pages having a different lifecycle than the pages
> + * they describe, initialiating and accounting vmemmap pages at the
> + * online/offline stage eases things a lot.

This requires quite some explaining.

> + * We do it out of {online,offline}_pages, so those routines only have
> + * to deal with pages that are actual usable memory.
> + */
> + if (nr_vmemmap_pages) {
> + struct zone *z;
> +
> + z = zone_for_pfn_range(online_type, nid, start_pfn, nr_pages);
> + move_pfn_range_to_zone(z, start_pfn, nr_vmemmap_pages, NULL,
> + MIGRATE_UNMOVABLE);
> + /*
> + * The below could go to a helper to make it less bulky here,
> + * so {online,offline}_pages could also use it.
> + */
> + z->present_pages += nr_pages;
> + pgdat_resize_lock(z->zone_pgdat, &flags);
> + z->zone_pgdat->node_present_pages += nr_pages;
> + pgdat_resize_unlock(z->zone_pgdat, &flags);
> + }
> +
> + ret = online_pages(start_pfn + nr_vmemmap_pages, nr_pages - nr_vmemmap_pages,
> + online_type);
> +
> + /*
> + * In case online_pages() failed for some reason, we should cleanup vmemmap
> + * accounting as well.
> + */
> + return ret;
> +}

Yes this is much better! Just a minor suggestion would be to push
memory_block all the way to memory_block_online (it oline a memory
block). I would also slightly prefer to provide 2 helpers that would make
it clear that this is to reserve/cleanup the vmemamp space (defined in
the memory_hotplug proper).

Thanks!
--
Michal Hocko
SUSE Labs