Re: [PATCH 0/4] mm,memory_hotplug: allocate memmap from hotadded memory

From: Oscar Salvador
Date: Fri Mar 29 2019 - 04:30:14 EST


On Thu, Mar 28, 2019 at 04:09:06PM +0100, David Hildenbrand wrote:
> On 28.03.19 14:43, Oscar Salvador wrote:
> > Hi,
> >
> > since the last two RFCs went almost unnoticed (thanks, David, for the feedback),
> > I decided to rework some parts to make it simpler, give it more
> > testing, and drop the RFC tag, to see if it gets more attention.
> > I also added David's feedback, so now all users of add_memory/__add_memory/
> > add_memory_resource can specify whether they want to use this feature or not.
>
> Terrific, I will also definitely try to make use of that in the next
> virtio-mem prototype (looks like I'll finally have time to look into it
> again).

Great, I would like to see how this works there :-).

> I guess one important thing to mention is that it is no longer possible
> to remove memory in a different granularity it was added. I slightly
> remember that ACPI code sometimes "reuses" parts of already added
> memory. We would have to validate that this can indeed not be an issue.
>
> drivers/acpi/acpi_memhotplug.c:
>
> result = __add_memory(node, info->start_addr, info->length);
> if (result && result != -EEXIST)
> continue;
>
> What would happen when removing this dimm (->remove_memory())

Yeah, I see the point.
Well, we are safe here because the vmemmap data is allocated in
every call to __add_memory/add_memory/add_memory_resource.

E.g:

* Assuming a memblock granularity of 128M:

# object_add memory-backend-ram,id=ram0,size=256M
# device_add pc-dimm,id=dimm0,memdev=ram0,node=1

I am not sure how ACPI gets to split the DIMM into memory resources
(aka mem_device->res_list), but it does not really matter.
For each mem_device->res_list item, we make a call to __add_memory,
which allocates the vmemmap data for __that__ item; we do not care
about the others.

And when removing the DIMM, acpi_memory_remove_memory will call
__remove_memory() for each mem_device->res_list item, which will take
care of freeing the vmemmap data.

Now, in all my tests, ACPI always treated a DIMM as a single memory resource,
but maybe under different circumstances it gets split into multiple memory
resources.
But it does not really matter, as the vmemmap data is created and isolated in
every call to __add_memory.

> Also have a look at
>
> arch/powerpc/platforms/powernv/memtrace.c
>
> I consider it evil code. It will simply try to offline+unplug *some*
> memory it finds in *some granularity*. Not sure if this might be
> problematic.

Heh, memtrace from powerpc ^^. I saw some oddities coming from there, but
not with my code, because I did not get to test that case specifically.
But I am interested to see whether it can trigger something, so I will be
testing it in the coming days.

> Would there be any "safety net" for adding/removing memory in different
> granularities?

Uhm, I do not think we need one, or at least I cannot think of a case where
this could cause trouble with the current design.
Can you think of any?

Thanks David ;-)

--
Oscar Salvador
SUSE L3