Re: [PATCH RFC 2/2] mm/memory_hotplug: fix online/offline_pages called w.o. mem_hotplug_lock
From: David Hildenbrand
Date: Fri Aug 17 2018 - 04:27:59 EST
On 17.08.2018 10:20, Rafael J. Wysocki wrote:
> On Fri, Aug 17, 2018 at 9:59 AM David Hildenbrand <david@xxxxxxxxxx> wrote:
>>
>> There seem to be some problems as result of 30467e0b3be ("mm, hotplug:
>> fix concurrent memory hot-add deadlock"), which tried to fix a possible
>> lock inversion reported and discussed in [1] due to the two locks
>> a) device_lock()
>> b) mem_hotplug_lock
>>
>> While add_memory() first takes b), followed by a) during
>> bus_probe_device(), onlining of memory from user space first took b),
>> followed by a), exposing a possible deadlock.
>>
>> In [1], and it was decided to not make use of device_hotplug_lock, but
>> rather to enforce a locking order. Looking at 1., this order is not always
>> satisfied when calling device_online() - essentially we simply don't take
>> one of both locks anymore - and fixing this would require us to
>> take the mem_hotplug_lock in core driver code (online_store()), which
>> sounds wrong.
>>
>> The problems I spotted related to this:
>>
>> 1. Memory block device attributes: While .state first calls
>> mem_hotplug_begin() and the calls device_online() - which takes
>> device_lock() - .online does no longer call mem_hotplug_begin(), so
>> effectively calls online_pages() without mem_hotplug_lock. onlining/
>> offlining of pages is no longer serialised across different devices.
>>
>> 2. device_online() should be called under device_hotplug_lock, however
>> onlining memory during add_memory() does not take care of that. (I
>> didn't follow how strictly this is needed, but there seems to be a
>> reason because it is documented at device_online() and
>> device_offline()).
>>
>> In addition, I think there is also something wrong about the locking in
>>
>> 3. arch/powerpc/platforms/powernv/memtrace.c calls offline_pages()
>> (and device_online()) without locks. This was introduced after
>> 30467e0b3be. And skimming over the code, I assume it could need some
>> more care in regards to locking.
>>
>> ACPI code already holds the device_hotplug_lock, and as we are
>> effectively hotplugging memory block devices, requiring to hold that
>> lock does not sound too wrong, although not chosen in [1], as
>> "I don't think resolving a locking dependency is appropriate by
>> just serializing them with another lock."
>> I think this is the cleanest solution.
>>
>> Requiring add_memory()/add_memory_resource() to be called under
>> device_hotplug_lock fixes 2., taking the mem_hotplug_lock in
>> online_pages/offline_pages() fixes 1. and 3.
>>
>> Fixup all callers of add_memory/add_memory_resource to hold the lock if
>> not already done.
>>
>> So this is essentially a revert of 30467e0b3be, implementation of what
>> was suggested in [1] by Vitaly, applied to the current tree.
>>
>> [1] http://driverdev.linuxdriverproject.org/pipermail/ driverdev-devel/
>> 2015-February/065324.html
>>
>> This patch is partly based on a patch by Vitaly Kuznetsov.
>>
>> Signed-off-by: David Hildenbrand <david@xxxxxxxxxx>
>> ---
>> arch/powerpc/platforms/powernv/memtrace.c | 3 ++
>> drivers/acpi/acpi_memhotplug.c | 1 +
>> drivers/base/memory.c | 18 +++++-----
>> drivers/hv/hv_balloon.c | 4 +++
>> drivers/s390/char/sclp_cmd.c | 3 ++
>> drivers/xen/balloon.c | 3 ++
>> mm/memory_hotplug.c | 42 ++++++++++++++++++-----
>> 7 files changed, 55 insertions(+), 19 deletions(-)
>>
>> diff --git a/arch/powerpc/platforms/powernv/memtrace.c b/arch/powerpc/platforms/powernv/memtrace.c
>> index 51dc398ae3f7..4c2737a33020 100644
>> --- a/arch/powerpc/platforms/powernv/memtrace.c
>> +++ b/arch/powerpc/platforms/powernv/memtrace.c
>> @@ -206,6 +206,8 @@ static int memtrace_online(void)
>> int i, ret = 0;
>> struct memtrace_entry *ent;
>>
>> + /* add_memory() requires device_hotplug_lock */
>> + lock_device_hotplug();
>> for (i = memtrace_array_nr - 1; i >= 0; i--) {
>> ent = &memtrace_array[i];
>>
>> @@ -244,6 +246,7 @@ static int memtrace_online(void)
>> pr_info("Added trace memory back to node %d\n", ent->nid);
>> ent->size = ent->start = ent->nid = -1;
>> }
>> + unlock_device_hotplug();
>> if (ret)
>> return ret;
>>
>> diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
>> index 6b0d3ef7309c..e7a4c7900967 100644
>> --- a/drivers/acpi/acpi_memhotplug.c
>> +++ b/drivers/acpi/acpi_memhotplug.c
>> @@ -228,6 +228,7 @@ static int acpi_memory_enable_device(struct acpi_memory_device *mem_device)
>> if (node < 0)
>> node = memory_add_physaddr_to_nid(info->start_addr);
>>
>> + /* we already hold the device_hotplug lock at this point */
>> result = add_memory(node, info->start_addr, info->length);
>>
>> /*
>
> A very minor nit here: I would say "device_hotplug_lock is already
> held at this point" in the comment (I sort of don't like to say "we"
> in code comments as it is not particularly clear what group of people
> is represented by that and the lock is actually called
> device_hotplug_lock).
Easy to fix, thanks!
>
> Otherwise the approach is fine by me.
>
> BTW, the reason why device_hotplug_lock is acquired by the ACPI memory
> hotplug is because it generally needs to be synchronized with respect
> CPU hot-remove and similar. I believe that this may be the case in
> non-ACPI setups as well.
Yes, and that lock is the reason why we didn't have real problems with
ACPI memory hotplug in this respect so far. (as user triggered
online/offline also takes that lock already)
>
> Thanks,
> Rafael
>
--
Thanks,
David / dhildenb