Re: [PATCH] mm/memory-hotplug: Add sysfs hot-remove trigger

From: Robin Murphy
Date: Thu Feb 07 2019 - 10:32:58 EST


On 07/02/2019 13:36, Oscar Salvador wrote:
On Wed, Feb 06, 2019 at 05:03:53PM +0000, Robin Murphy wrote:
ARCH_MEMORY_PROBE is a useful thing for testing and debugging hotplug,
but being able to exercise the (arguably trickier) hot-remove path would
be even more useful. Extend the feature to allow removal of offline
sections to be triggered manually to aid development.

Signed-off-by: Robin Murphy <robin.murphy@xxxxxxx>
---

This is inspired by a previous proposal[1], but in coming up with a
more robust interface I ended up rewriting the whole thing from
scratch. The lack of documentation is semi-deliberate, since I don't
like the idea of anyone actually relying on this interface as ABI, but
as a handy tool it felt useful enough to be worth sharing :)

Hi Robin,

I think this might come in handy, especially when trying to test hot-remove
on arch's that do not have any means to hot-remove memory, or even on virtual
platforms that do not have yet support for hot-remove depending on the platform,
like qemu/arm64.


I could have used this while testing hot-remove on other archs for [1]


Robin.

[1] https://lore.kernel.org/lkml/22d34fe30df0fbacbfceeb47e20cb1184af73585.1511433386.git.ar@xxxxxxxxxxxxxxxxxx/


+ if (mem->state != MEM_OFFLINE)
+ return -EBUSY;

We do have the helper "is_memblock_offlined()", although it is only used in one place now.
So, I would rather use it here as well.

Ooh, if I'd actually noticed that that helper existed, I would indeed have used it - fixed.

+
+ ret = lock_device_hotplug_sysfs();
+ if (ret)
+ return ret;
+
+ if (device_remove_file_self(dev, attr)) {
+ __remove_memory(pfn_to_nid(start_pfn), PFN_PHYS(start_pfn),
+ MIN_MEMORY_BLOCK_SIZE * sections_per_block);

Sorry, I am not into sysfs inners, but I thought that:
device_del::device_remove_attrs::device_remove_groups::sysfs_remove_groups
would be enough to remove the dev attributes.
I guess in this case that is not enough, could you explain why?

As I found out the hard way, since the "remove" attribute itself belongs to the device being removed, the standard device teardown callchain would end up trying to remove the file from its own method, which results in deadlock. Fortunately, the PCI sysfs code has a similar "remove" attribute which showed me how it should be handled - following the kerneldoc breadcrumb trail to kernfs_remove_self() hopefully explains it more completely.

Thanks,
Robin.



[1] https://patchwork.kernel.org/patch/10775339/