Re: [PATCH v2 3/3] drivers/base/memory: fix locking for poison accounting lookup
From: Muchun Song
Date: Thu Apr 30 2026 - 04:01:17 EST
> On Apr 29, 2026, at 18:44, David Hildenbrand (Arm) <david@xxxxxxxxxx> wrote:
>
> On 4/29/26 12:11, Usama Arif wrote:
>> On Wed, 29 Apr 2026 12:18:08 +0800 Muchun Song <muchun.song@xxxxxxxxx> wrote:
>>
>>>
>>>
>>>>
>>>>
>>>> lock_device_hotplug is a mutex lock, and we already take other mutex locks while
>>>> holding lock_folio in other paths, so I am not sure I see what should be special
>>>> in this case.
>>>
>>> Hi Oscar and Miaohe,
>>>
>>> I saw sashiko's report [1] related to folio lock and lock_device_hotplug.
>>> Seems it is possible. You can correct me if I am wrong.
>>>
>>> [1] https://sashiko.dev/#/patchset/20260428085219.1316047-1-songmuchun%40bytedance.com
>>>
>>> We could fix this by calling action_result() without holding folio lock.
>>> What do you think?
>>>
>>
>> Hello Muchun,
>>
>> You could end up in memblk_nr_poison_sub() while holding hugetlb_lock spin lock
>> from get_huge_page_for_hwpoison(), right?
>>
>> Lockdep would flag this as sleeping while atomic when acquiring mutex I think.
>
> Another thought would be that we always call the inc/sub from memory failure
> code while we hold a folio reference and the page is not poisoned yet.
>
> That way, memory offlining cannot continue and the memory block cannot go away.
>
> So we'd let our page reference keep the memory block alive.

It seems unnecessary to hold lock_device_hotplug if the user already holds a
refcount on the page. I'd like to drop this patch.
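
To illustrate the idea, here is a minimal userspace sketch of "the page
reference pins the memory block". It is only a model under stated assumptions:
all model_* names are hypothetical and are not kernel APIs, and the real
offlining path is far more involved than a single refcount check.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace model only: none of these names exist in the kernel. */
struct model_page {
	atomic_int refcount;	/* stand-in for the folio reference count */
	bool poisoned;
};

struct model_memblk {
	struct model_page *page;	/* one page per block keeps the model small */
	atomic_long nr_poison;		/* per-block poison accounting */
	bool online;
};

/* Offlining is refused while anyone holds a reference to the page. */
static bool model_offline(struct model_memblk *blk)
{
	if (atomic_load(&blk->page->refcount) > 0)
		return false;	/* pinned: the block has to stay */
	blk->online = false;
	return true;
}

/* Caller must hold a page reference, so the block cannot have gone away. */
static void model_poison_inc(struct model_memblk *blk)
{
	atomic_fetch_add(&blk->nr_poison, 1);
}

/* Walks the ordering discussed above and returns the final poison count. */
long model_demo(void)
{
	struct model_page page = { .refcount = 0, .poisoned = false };
	struct model_memblk blk = { .page = &page, .nr_poison = 0, .online = true };

	atomic_fetch_add(&page.refcount, 1);	/* failure path takes a reference */
	assert(!model_offline(&blk));		/* offlining is refused while pinned */
	assert(blk.online);

	model_poison_inc(&blk);			/* no hotplug lock taken in the model */
	page.poisoned = true;			/* page is marked only after accounting */

	atomic_fetch_sub(&page.refcount, 1);	/* drop the reference */
	return atomic_load(&blk.nr_poison);
}
```

The point of the ordering in model_demo() is that the accounting update happens
strictly between taking and dropping the reference, and before the page is
marked poisoned, which is the window David describes.
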
Thanks.
>
> --
> Cheers,
>
> David