Re: [PATCH 0/3] mm: use memmap_on_memory semantics for dax/kmem

From: Verma, Vishal L
Date: Wed Jun 21 2023 - 15:32:25 EST


On Fri, 2023-06-16 at 09:44 +0200, David Hildenbrand wrote:
> On 16.06.23 00:00, Vishal Verma wrote:
> > The dax/kmem driver can potentially hot-add large amounts of memory
> > originating from CXL memory expanders, or NVDIMMs, or other 'device
> > memories'. There is a chance there isn't enough regular system memory
> > available to fit ythe memmap for this new memory. It's therefore
> > desirable, if all other conditions are met, for the kmem managed memory
> > to place its memmap on the newly added memory itself.
> >
> > Arrange for this by first allowing for a module parameter override for
> > the mhp_supports_memmap_on_memory() test using a flag, adjusting the
> > only other caller of this interface in dirvers/acpi/acpi_memoryhotplug.c,
> > exporting the symbol so it can be called by kmem.c, and finally changing
> > the kmem driver to add_memory() in chunks of memory_block_size_bytes().
>
> 1) Why is the override a requirement here? Just let the admin configure
> it then then add conditional support for kmem.

Configure it in the current way using the module parameter to
memory_hotplug? The whole module param check feels a bit awkward,
especially since memory_hotplug is builtin, the only way to supply the
param is on the kernel command line as opposed to a modprobe config.

The goal with extending mhp_supports_memmap_on_memory() to check for
support with or without consideration for the module param is that it
makes it a bit more flexible for callers like kmem.

> 2) I recall that there are cases where we don't want the memmap to land
> on slow memory (which online_movable would achieve). Just imagine the
> slow PMEM case. So this might need another configuration knob on the
> kmem side.
>
> I recall some discussions on doing that chunk handling internally (so
> kmem can just perform one add_memory() and we'd split that up internally).
>
Another config knob isn't unreasonable - though the thinking in making
this behavior the new default policy was that with CXL based memory
expanders, the performance delta from main memory wouldn't be as big as
the pmem - main memory delta. With pmem devices being phased out, it's
not clear we'd need a knob, and it can always be added if it ends up
becoming necessary.

The other comments about doing the per-memblock loop internally, and
fixing up the removal paths all sound good, and I've started reworking
those - thanks for taking a look!