Re: [RFC PATCH 2/4] mm/hotplug: Expose is_mem_section_removable() and offline_pages()

From: David Hildenbrand
Date: Wed Dec 11 2019 - 07:08:01 EST


On 10.12.19 16:46, lantianyu1986@xxxxxxxxx wrote:
> From: Tianyu Lan <Tianyu.Lan@xxxxxxxxxxxxx>
>
> Hyper-V driver adds memory hot remove function and will use
> these interfaces in Hyper-V balloon driver which may be built
> as a module. Expose these function.

This patch is missing a detailed description of how these interfaces will
be used. Also, you should CC people on the patch that actually uses them.

I found it via https://lkml.org/lkml/2019/12/10/767

If I am not wrong, (un)lock_device_hotplug() is not exposed to kernel
modules for a good reason, and your patch seems to ignore that.

>
> Signed-off-by: Tianyu Lan <Tianyu.Lan@xxxxxxxxxxxxx>
> ---
> mm/memory_hotplug.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 07e5c67f48a8..4b358ebcc3d7 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1191,6 +1191,7 @@ bool is_mem_section_removable(unsigned long start_pfn, unsigned long nr_pages)
> /* All pageblocks in the memory block are likely to be hot-removable */
> return true;
> }
> +EXPORT_SYMBOL_GPL(is_mem_section_removable);
>
> /*
> * Confirm all pages in a range [start, end) belong to the same zone.
> @@ -1612,6 +1613,7 @@ int offline_pages(unsigned long start_pfn, unsigned long nr_pages)
> {
> return __offline_pages(start_pfn, start_pfn + nr_pages);
> }
> +EXPORT_SYMBOL_GPL(offline_pages);
>
> static int check_memblock_offlined_cb(struct memory_block *mem, void *arg)
> {
>

No, I don't think exposing the latter is desired. We already have one
other in-tree user that I _really_ want to get rid of. Memory should be
offlined in memory block granularity via the core only. Memory offlining
can be triggered in a clean way via device_offline(&mem->dev).

a) It conflicts with activity from user space. Especially, this "manual
fixup" of the memory block state is just nasty.
b) Locking issues: Memory offlining requires the device hotplug lock.
This lock is not exposed and we don't want to expose it.
c) There are still cases where offline_pages() will loop for all
eternity and only signals can kick it out.

E.g., have a look at how I want to achieve that with virtio-mem:
https://lkml.org/lkml/2019/9/19/476

I think something like that would be *much* cleaner. What could be even
better for your use case is doing it similarly to virtio-mem:

1. Try to alloc_contig_range() the memory block you want to remove. This
will not loop forever but fail in a nice way early. See
https://lkml.org/lkml/2019/9/19/467

2. Allow offlining that memory block by marking the memory
PageOffline() and dropping the refcount. See
https://lkml.org/lkml/2019/9/19/470, I will send a new RFC v4 soon that
includes the suggestion from Michal.

3. Offline+remove the memory block using a clean interface. See
https://lkml.org/lkml/2019/9/19/476

No looping forever, no races with user space, no messing with memory
block states.
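
To make the control flow of the three steps above concrete, here is a
minimal user-space C model. It is only a sketch under stated
assumptions: every name in it (struct mock_block, mock_claim_range(),
mock_mark_offline_ready(), mock_offline_and_remove(),
mock_unplug_block()) is hypothetical and is not a kernel interface. It
models only the ordering and the "fail early instead of looping"
behaviour, not real page isolation or migration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* User-space model only: none of these types or functions are kernel APIs. */
enum mock_state { MOCK_ONLINE, MOCK_CLAIMED, MOCK_REMOVED };

struct mock_block {
	enum mock_state state;
	bool has_unmovable_pages;	/* stands in for pages that cannot be migrated */
	unsigned int refcount;		/* stands in for the reference the driver holds */
};

/*
 * Step 1: stands in for the alloc_contig_range() attempt.
 * It either claims the whole block or fails immediately; it never retries.
 */
static int mock_claim_range(struct mock_block *b)
{
	if (b->state != MOCK_ONLINE)
		return -EINVAL;
	if (b->has_unmovable_pages)
		return -EBUSY;
	b->state = MOCK_CLAIMED;
	b->refcount = 1;
	return 0;
}

/*
 * Step 2: stands in for marking the memory PageOffline() and dropping
 * the refcount, so that offlining is allowed to proceed.
 */
static void mock_mark_offline_ready(struct mock_block *b)
{
	assert(b->state == MOCK_CLAIMED);
	b->refcount = 0;
}

/* Step 3: stands in for a single, clean offline+remove interface. */
static int mock_offline_and_remove(struct mock_block *b)
{
	if (b->state != MOCK_CLAIMED || b->refcount != 0)
		return -EBUSY;
	b->state = MOCK_REMOVED;
	return 0;
}

/* Run the three steps in order; a failure in step 1 leaves the block online. */
static int mock_unplug_block(struct mock_block *b)
{
	int rc = mock_claim_range(b);

	if (rc)
		return rc;
	mock_mark_offline_ready(b);
	return mock_offline_and_remove(b);
}
```

The point of the model is that the only step that can fail does so
before any state of the block has been changed, so there is nothing to
"fix up" afterwards and nothing that has to be interrupted by a signal.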

NACK on exporting offline_pages(), but I am not a Maintainer, so ... :)

--
Thanks,

David / dhildenb