Re: [PATCH] mm/memory_hotplug: Fix remove_memory() lockdep splat

From: Dan Williams
Date: Fri Jan 10 2020 - 16:27:37 EST


On Fri, Jan 10, 2020 at 9:42 AM David Hildenbrand <david@xxxxxxxxxx> wrote:
>
> On 10.01.20 18:39, Dan Williams wrote:
> > On Fri, Jan 10, 2020 at 9:36 AM David Hildenbrand <david@xxxxxxxxxx> wrote:
> >>
> >> On 10.01.20 18:33, Dan Williams wrote:
> >>> On Fri, Jan 10, 2020 at 9:29 AM David Hildenbrand <david@xxxxxxxxxx> wrote:
> >>> [..]
> >>>>> So then the comment is actively misleading for that case. I would
> >>>>> expect an explicit _unlocked path for that case with a comment about
> >>>>> why it's special. Is there already a comment to that effect somewhere?
> >>>>>
> >>>>
> >>>> __add_memory() - the locked variant - is called from the same ACPI location
> >>>> either locked or unlocked. I added a comment back then after a longe
> >>>> discussion with Michal:
> >>>>
> >>>> drivers/acpi/scan.c:
> >>>> /*
> >>>> * Although we call __add_memory() that is documented to require the
> >>>> * device_hotplug_lock, it is not necessary here because this is an
> >>>> * early code when userspace or any other code path cannot trigger
> >>>> * hotplug/hotunplug operations.
> >>>> */
> >>>>
> >>>>
> >>>> It really is a special case, though.
> >>>
> >>> That's a large comment block when we could have just taken the lock.
> >>> There's probably many other code paths in the kernel where some locks
> >>> are not necessary before userspace is up, but the code takes the lock
> >>> anyway to minimize the code maintenance burden. Is there really a
> >>> compelling reason to be clever here?
> >>
> >> It was a lengthy discussion back then and I was sharing your opinion. I
> >> even had a patch ready to enforce that we are holding the lock (that's
> >> how I identified that specific case in the first place).
> >
> > Ok, apologies I missed that opportunity to back you up. Michal, is
> > this still worth it?
> >
>
> For your reference (roughly 5 months ago, so not that old)
>
> https://lkml.kernel.org/r/20190724143017.12841-1-david@xxxxxxxxxx

Oh, now I see the problem. You need to add that lock so far away from
the __add_memory() to avoid lock inversion problems with the
acpi_scan_lock. The organization I was envisioning would not work
without deeper refactoring.