Re: [PATCH 0/4] mm,memory_hotplug: allocate memmap from hotadded memory

From: Oscar Salvador
Date: Fri Mar 29 2019 - 05:20:28 EST


On Fri, Mar 29, 2019 at 09:56:37AM +0100, David Hildenbrand wrote:
> Oh okay, so actually the way I guessed it would be now.
>
> While this makes total sense, I'll have to look at how it is currently
> handled, meaning whether there is a change. I somewhat remember that
> delayed struct pages initialization would initialize vmemmap per section,
> not per memory resource.

Uhm, the memmap array for each section is built early during boot.
We actually do not care about deferred struct pages initialization there.
What we do is:

- We go through all memblock regions marked as memory
- We mark the sections within those regions present
- We initialize those sections and build the corresponding memmap array

The thing is that sparse_init_nid() allocates/reserves a buffer big enough
to hold the memmap arrays for all those sections, and for each memmap
array we need to allocate, we consume space from that buffer, using
contiguous memory.
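
Roughly, that buffer behaves like a bump allocator: reserve once, then carve
each memmap array out of it back to back. A minimal user-space sketch of the
idea follows. The demo_* names are made up for illustration, malloc() stands
in for the early boot allocation, and alignment handling is left out, so this
is not the code in mm/sparse.c:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-node buffer; not a kernel structure. */
struct demo_buffer {
	unsigned char *base;	/* start of the backing allocation */
	unsigned char *cur;	/* next free byte */
	unsigned char *end;	/* one past the last usable byte */
};

/* Reserve one chunk big enough for all memmap arrays of the node. */
static int demo_buffer_init(struct demo_buffer *b, size_t size)
{
	b->base = malloc(size);	/* assumption: malloc() replaces memblock */
	if (!b->base)
		return -1;
	b->cur = b->base;
	b->end = b->base + size;
	return 0;
}

/* Carve @size bytes out of the buffer; returns NULL once it is exhausted. */
static void *demo_buffer_alloc(struct demo_buffer *b, size_t size)
{
	void *p;

	assert(b->cur <= b->end);
	if ((size_t)(b->end - b->cur) < size)
		return NULL;
	p = b->cur;
	b->cur += size;	/* consecutive requests get contiguous memory */
	return p;
}

/* Release the backing allocation and reset the bookkeeping. */
static void demo_buffer_fini(struct demo_buffer *b)
{
	free(b->base);
	b->base = b->cur = b->end = NULL;
}
```

Two back-to-back demo_buffer_alloc() calls return adjacent ranges, which is
the "contiguous memory" property mentioned above.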

Have a look at:

- sparse_memory_present_with_active_regions()
- sparse_init()
- sparse_init_nid()
- sparse_buffer_init()

> But as I work on 10 things differently, my mind sometimes seems to
> forget stuff in order to replace it with random nonsense. Will look into
> the details to not have to ask too many dumb questions.
>
> >
> > So, the approach taken is to allocate the vmemmap data corresponding to the
> > whole DIMM/memory-device/memory-resource from the beginning of its memory.
> >
> > In the example from above, the vmemmap data for both sections is allocated from
> > the beginning of the first section:
> >
> > memmap array takes 2MB per section, so 512 pfns.
> > If we add 2 sections:
> >
> > [ pfn#0 ] \
> > [ ... ] | vmemmap used for memmap array
> > [pfn#1023 ] /
> >
> > [pfn#1024 ] \
> > [ ... ] | used as normal memory
> > [pfn#65535] /
> >
> > So, out of 256M, we get 252M to use as real memory, as 4M will be used for
> > building the memmap array.
> >
> > Actually, depending on how big a DIMM/memory-device is, it can happen
> > that the first memblock(s) are fully used for the memmap array (of course,
> > this can only be seen when adding a huge DIMM/memory-device).
> >
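
To double check the numbers in the example, here is a small sketch of the
arithmetic. It assumes 4K pages, 128M sections and a 64-byte struct page
(typical x86_64 values, which is what "2MB per section" implies); the DEMO_*
and demo_* names are made up for illustration:

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE		4096ULL		/* assumption: 4K pages */
#define DEMO_SECTION_SIZE	(128ULL << 20)	/* assumption: 128M sections */
#define DEMO_STRUCT_PAGE_SIZE	64ULL		/* assumption: 64-byte struct page */

/* Number of pfns needed to hold the memmap of a device of @bytes. */
static uint64_t demo_vmemmap_pfns(uint64_t bytes)
{
	uint64_t nr_pages = bytes / DEMO_PAGE_SIZE;
	uint64_t memmap_bytes = nr_pages * DEMO_STRUCT_PAGE_SIZE;

	/* Hotplug works on whole sections, so reject partial ones here. */
	assert(bytes % DEMO_SECTION_SIZE == 0);

	/* Round up to whole pages. */
	return (memmap_bytes + DEMO_PAGE_SIZE - 1) / DEMO_PAGE_SIZE;
}

/* Bytes left as normal memory once the memmap has been carved out. */
static uint64_t demo_usable_bytes(uint64_t bytes)
{
	return bytes - demo_vmemmap_pfns(bytes) * DEMO_PAGE_SIZE;
}
```

With those assumptions, one section needs 512 pfns, two sections need 1024
pfns and leave 252M usable, and an 8G device needs 32768 pfns, i.e. the whole
first 128M section goes to the memmap array.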
>
> Just stating here, that with your code, add_memory() and remove_memory()
> always have to be called in the same granularity. Will have to see if
> that implies a change.

Well, I only tested it in such a scenario, yes, but I think that the ACPI
code enforces that somehow.
I will take a closer look though.

--
Oscar Salvador
SUSE L3