Re: [PATCH -next v3] mm/hotplug: silence a lockdep splat with printk()

From: Qian Cai
Date: Thu Jan 16 2020 - 11:27:34 EST

> On Jan 16, 2020, at 11:04 AM, David Hildenbrand <david@xxxxxxxxxx> wrote:
>
> On 16.01.20 16:54, Michal Hocko wrote:
>> On Thu 16-01-20 09:53:13, Qian Cai wrote:
>>>
>>>
>>>> On Jan 16, 2020, at 9:28 AM, Michal Hocko <mhocko@xxxxxxxxxx> wrote:
>>>>
>>>> On Wed 15-01-20 12:29:16, Qian Cai wrote:
>>>>> It is guaranteed to trigger a lockdep splat if calling printk() with
>>>>> zone->lock held because many places (tty, console drivers,
>>>>> debugobjects etc) allocate memory with another lock held, and
>>>>> fixing them all has proved difficult.
>>>>
>>>> I am still not very happy with the above. What would you say about
>>>> something like the below instead?
>>>> "
>>>> It is not that hard to trigger lockdep splats by calling printk from
>>>> under zone->lock. Most of them are false positives caused by lock chains
>>>> introduced early in the boot process and they do not cause any real
>>>> problems. There are some console drivers which do allocate from the
>>>> printk context as well and those should be fixed. In any case false
>>>> positives are not that trivial to work around and it is far from optimal
>>>> to lose lockdep functionality for something that is a non-issue.
>>>> <An example of such a false positive goes here>
>>>> "
>>>
>>> I feel like I have repeated myself too many times. A call trace for one lock
>>> dependency is sometimes from the early boot process because lockdep saves the
>>> first one it encounters, but that does not mean the lock dependency can only
>>> happen in early boot. I spent some time studying those early boot call traces
>>> in the given lockdep splats, and it looks to me like the lock dependency is
>>> also possible after boot.
>>
>> Then state it explicitly with an example of the trace and explanation
>> that the deadlock is real. If the deadlock is real then it shouldn't be
>> really terribly hard to notice even without lockdep splats which get
>> disabled after the first false positive, right?
>
> I was asking myself for a long time: did anybody actually see this
> deadlock in real life?

Nobody knows for sure. I think one reason is that not many people use
memory offline, and even if they do, it is mostly not a continuous activity in
the system. debugobjects makes it way easier to reproduce because it allocates
memory in random places, but then it is not all that popular.
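
To illustrate the point quoted above about lockdep saving the first call trace
it encounters, here is a toy userspace model in Python. It is only a sketch
and not how the kernel implements lockdep: the ToyLockdep class, the
"console driver lock" name and the context strings are all made up for
illustration. It records the context in which each lock-order edge was first
seen, so a later inversion is reported against an "early boot" context even
though the same ordering also happened at runtime.

```python
class ToyLockdep:
    """Toy model of lock-order tracking; NOT the kernel's implementation."""

    def __init__(self):
        self.edges = {}  # (held, acquired) -> context where first observed
        self.held = []   # locks currently held, in acquisition order

    def _path(self, src, dst):
        """Return the recorded edges leading from src to dst, or None."""
        stack = [(src, [])]
        seen = set()
        while stack:
            node, path = stack.pop()
            if node == dst:
                return path
            if node in seen:
                continue
            seen.add(node)
            for (a, b), ctx in self.edges.items():
                if a == node:
                    stack.append((b, path + [(a, b, ctx)]))
        return None

    def acquire(self, lock, context):
        """Take a lock; return a report if the new ordering closes a cycle."""
        report = None
        for held in self.held:
            if (held, lock) in self.edges:
                continue  # edge already known: the first context is kept
            chain = self._path(lock, held)
            if chain is not None and report is None:
                report = {
                    "new_edge": (held, lock),
                    "context": context,
                    "existing_chain": chain,
                }
            self.edges[(held, lock)] = context
        self.held.append(lock)
        return report

    def release(self, lock):
        self.held.remove(lock)


# Hypothetical lock names, chosen only to mirror the discussion.
ZONE = "zone->lock"
CONSOLE = "console driver lock"

dep = ToyLockdep()

# Early boot: a console path allocates memory while holding its lock.
dep.acquire(CONSOLE, "early boot")
dep.acquire(ZONE, "early boot")
dep.release(ZONE)
dep.release(CONSOLE)

# The same ordering happens again at runtime, but nothing new is recorded.
dep.acquire(CONSOLE, "runtime")
dep.acquire(ZONE, "runtime")
dep.release(ZONE)
dep.release(CONSOLE)

# Memory offline: printk() is called with zone->lock held.
dep.acquire(ZONE, "memory offline")
splat = dep.acquire(CONSOLE, "memory offline")
print(splat)
```

In this model the report names "early boot" for the existing dependency even
though the runtime path took the locks in exactly the same order, which is the
reason an early boot trace in a splat does not by itself show that the
dependency is limited to boot.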