Re: [PATCH 4/8] mm/memory_hotplug: introduce MEM_PHYS_ONLINE/OFFLINE memory notifiers

From: David Hildenbrand
Date: Thu Nov 16 2023 - 14:02:45 EST


On 15.11.23 16:03, Gerald Schaefer wrote:
On Tue, 14 Nov 2023 19:27:35 +0100
David Hildenbrand <david@xxxxxxxxxx> wrote:

On 14.11.23 19:02, Sumanth Korikkar wrote:
Add new memory notifiers to mimic the dynamic, ACPI-event-triggered logic
for memory hotplug on platforms that do not generate such events. This
will be used to implement the "memmap on memory" feature for s390 in a
later patch.

Platforms such as x86 can support physical memory hotplug via ACPI. When
there is physical memory hotplug, an ACPI event leads to the memory
addition with the following callchain:
acpi_memory_device_add()
-> acpi_memory_enable_device()
-> __add_memory()

After this, the hotplugged memory is physically accessible, and altmap
support is prepared, before the "memmap on memory" initialization in
memory_block_online() is called.

On s390, memory hotplug works in a different way. The available hotplug
memory has to be defined upfront in the hypervisor, but it is made
physically accessible only when the user sets it online via sysfs,
currently in the MEM_GOING_ONLINE notifier. This requires calling
add_memory() during early memory detection, in order to get the sysfs
representation, but we cannot use "memmap on memory" altmap support at
this stage, w/o having it physically accessible.

Since no ACPI or similar events are generated, there is no way to set up
altmap support, or even make the memory physically accessible at all,
before the "memmap on memory" initialization in memory_block_online().

The new MEM_PHYS_ONLINE notifier allows working around this, by
providing a hook to make the memory physically accessible, and also to
call __add_pages() with altmap support, early in memory_block_online().
Similarly, the MEM_PHYS_OFFLINE notifier allows making the memory
inaccessible and calling __remove_pages(), at the end of
memory_block_offline().

Calling __add/remove_pages() requires mem_hotplug_lock, so move
mem_hotplug_begin/done() to include the new notifiers.

All architectures ignore unknown memory notifiers, so this patch should
not introduce any functional changes.
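
The "ignore unknown notifiers" claim above can be illustrated with a
small userspace model. This is only a sketch, not kernel code: the
action values, legacy_callback() and call_chain() are made up for
illustration, and the two return codes merely mirror the kernel's
NOTIFY_DONE/NOTIFY_OK convention. It shows a callback written before
the new action existed falling through to its default case.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative action codes; the values are invented for this sketch
 * and are not the kernel's definitions. */
enum mem_action {
	MEM_GOING_ONLINE,
	MEM_ONLINE,
	MEM_PHYS_ONLINE,	/* the new action proposed in this patch */
};

#define NOTIFY_DONE	0	/* callback did not care about the action */
#define NOTIFY_OK	1	/* callback handled the action */

typedef int (*notifier_fn)(enum mem_action action, void *data);

/* A callback written before MEM_PHYS_ONLINE existed: it handles the
 * actions it knows and ignores everything else. */
static int legacy_callback(enum mem_action action, void *data)
{
	int *handled = data;

	switch (action) {
	case MEM_GOING_ONLINE:
	case MEM_ONLINE:
		(*handled)++;
		return NOTIFY_OK;
	default:
		return NOTIFY_DONE;
	}
}

/* Call every callback in the chain; return how many reported NOTIFY_OK. */
static int call_chain(notifier_fn *chain, size_t n,
		      enum mem_action action, void *data)
{
	int ok = 0;

	for (size_t i = 0; i < n; i++)
		if (chain[i](action, data) == NOTIFY_OK)
			ok++;
	return ok;
}
```

With a chain holding only legacy_callback(), call_chain() for
MEM_PHYS_ONLINE reports zero handlers and leaves the counter untouched,
while MEM_ONLINE is handled as before.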

Sorry to say, no. No hacks please, and this is a hack for memory that
has already been added to the system.

IIUC, when we enter memory_block_online(), memory has always already
been added to the system, on all architectures. E.g. via ACPI events
on x86, or with the existing s390 hack, where we add it at boot time,
including memmap allocated from system memory. Without a preceding
add_memory() you cannot reach memory_block_online() via sysfs online.

Adding that memory block at boot time is the legacy leftover s390x is carrying along; and now we want to "work around" that by adding s390x special handling to the onlining/offlining code and having memory blocks without any memmap, or configuring an altmap at the very last minute using an s390x-specific memory notifier.

Instead, if you want to support the altmap, the kernel should not add standby memory to the system (if configured for this new feature), but instead only remember the standby memory ranges so it knows what can later be added and what can't.

From there, users should have an interface where they can actually add memory to the system, and either online it manually or just let the kernel online it automatically.

s390x code will call add_memory() and properly prepare an altmap if requested and make that standby memory available. You can then even have an interface to remove that memory again once offline. That will work with an altmap or without an altmap.
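
The flow described above (remember standby ranges at boot, add them on request with or without an altmap, online them, and allow removal once offline) can be sketched as a small userspace state machine. This is not kernel code and not a proposed interface: struct standby_range and all standby_*() helpers are hypothetical names invented for this illustration, and the comment in standby_add() only marks where arch code would make the memory accessible and call add_memory().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lifecycle of one standby range in this sketch. */
enum range_state { RANGE_STANDBY, RANGE_ADDED, RANGE_ONLINE };

struct standby_range {
	unsigned long start;
	unsigned long size;
	enum range_state state;
	bool has_altmap;
};

/* Boot time: only remember the range, do not add it to the system. */
static void standby_remember(struct standby_range *r,
			     unsigned long start, unsigned long size)
{
	assert(size > 0);
	r->start = start;
	r->size = size;
	r->state = RANGE_STANDBY;
	r->has_altmap = false;
}

/* Explicit add step: this is where arch code would make the memory
 * accessible and call add_memory(), with an altmap if requested. */
static int standby_add(struct standby_range *r, bool want_altmap)
{
	if (r->state != RANGE_STANDBY)
		return -1;
	r->has_altmap = want_altmap;
	r->state = RANGE_ADDED;
	return 0;
}

/* Onlining is only possible for memory that was added before. */
static int standby_online(struct standby_range *r)
{
	if (r->state != RANGE_ADDED)
		return -1;
	r->state = RANGE_ONLINE;
	return 0;
}

static int standby_offline(struct standby_range *r)
{
	if (r->state != RANGE_ONLINE)
		return -1;
	r->state = RANGE_ADDED;
	return 0;
}

/* Removal is only allowed once the range is offline again. */
static int standby_remove(struct standby_range *r)
{
	if (r->state != RANGE_ADDED)
		return -1;
	r->has_altmap = false;
	r->state = RANGE_STANDBY;
	return 0;
}
```

In this model the altmap decision is taken once, in the add step, so the onlining and offlining paths need no special cases for it.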

This approach is aligned with any other code that hot(un)plugs memory, and is compatible with things like the variable-sized memory blocks people have been talking about for quite a while already, and with altmaps that span multiple memory blocks to make gigantic pages in such ranges usable.

Sure, you'll have a new interface and have to enable the new handling for the new kernel, but you're asking for supporting a new feature that cannot be supported cleanly just like any other architecture does. But it's a clean approach and probably should have been done that way right from the start (decades ago).

Note: We do have the same for other architectures without ACPI that add memory via the probe interface. But IIRC we cannot really do any checks there, because these architectures have no way of identifying what


The difference is that for s390, the memory is not yet physically
accessible, and therefore we cannot use the existing altmap support
in memory_block_online(), which requires that the memory is accessible
before it calls mhp_init_memmap_on_memory().

Currently, on s390 we make the memory accessible in the GOING_ONLINE
notifier, via an sclp call to the hypervisor. That is too late for the
altmap setup code in memory_block_online(), therefore we'd like to
introduce the new notifier, to have a hook where we can make it
accessible earlier. After that, there is no difference to how it works
for other architectures, and we can make use of the existing altmap
support.


If you want memory without an altmap to suddenly not have an altmap
anymore, then look into removing and readding that memory, or some way
to convert offline memory.

We do not want memory to suddenly lose its altmap support; we simply
want a hook so that we can prepare the memory to have altmap support.
This means making it physically accessible, and calling __add_pages()
for altmap support, which for other architectures has already happened
before.

Of course, it is a hack for s390, that we must skip __add_pages()
in the initial (arch_)add_memory() during boot time, when we want
altmap support, because the memory simply is not accessible at that
time. But s390 memory hotplug support has always been a hack, and
had to be, because of how it is implemented by the architecture.

I wrote the above paragraph before reading this; and it's fully aligned with what I said above.


So we replace one hack with another one, that has the huge advantage
that we do not need to allocate struct pages upfront from system
memory any more, for the whole possible online memory range.

And the current approach comes without any change to existing
interfaces, and minimal change to common code, i.e. these new
notifiers, that should not have any impact on other architectures.

What exactly is your concern regarding the new notifiers? Is it
useless no-op notifier calls on other archs (not sure if they
would get optimized out by the compiler)?

That it makes hotplug code more special because of s390x, instead of cleaning up that legacy code.

--
Cheers,

David / dhildenb