Re: [PATCH v2 3/3] drivers/base/memory: fix locking for poison accounting lookup

From: Miaohe Lin

Date: Tue Apr 28 2026 - 23:09:39 EST


On 2026/4/28 21:52, Muchun Song wrote:
>
>
>
>> On Apr 28, 2026, at 20:34, Miaohe Lin <linmiaohe@xxxxxxxxxx> wrote:
>> On 2026/4/28 19:40, Muchun Song wrote:
>>>
>>>
>>>> On Apr 28, 2026, at 19:37, Miaohe Lin <linmiaohe@xxxxxxxxxx> wrote:
>>>> On 2026/4/28 16:52, Muchun Song wrote:
>>>>> memblk_nr_poison_inc() and memblk_nr_poison_sub() call
>>>>> find_memory_block_by_id(), which requires device_hotplug_lock to
>>>>> serialize the xarray lookup against memory block removal.
>>>>> Take device_hotplug_lock around the lookup and nr_hwpoison update so
>>>>> the memory block cannot disappear between xa_load() and get_device().
>>>>> Fixes: 5033091de814 ("mm/hwpoison: introduce per-memory_block hwpoison counter")
>>>>> Cc: stable@xxxxxxxxxxxxxxx
>>>>> Signed-off-by: Muchun Song <songmuchun@xxxxxxxxxxxxx>
>>>> Thanks for the update.
>>>>> ---
>>>>> drivers/base/memory.c | 10 ++++++++--
>>>>> 1 file changed, 8 insertions(+), 2 deletions(-)
>>>>> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
>>>>> index 6981b55d582a..f76aee29e9a5 100644
>>>>> --- a/drivers/base/memory.c
>>>>> +++ b/drivers/base/memory.c
>>>>> @@ -1228,23 +1228,29 @@ int walk_dynamic_memory_groups(int nid, walk_memory_groups_func_t func,
>>>>> void memblk_nr_poison_inc(unsigned long pfn)
>>>>> {
>>>>> const unsigned long block_id = pfn_to_block_id(pfn);
>>>>> - struct memory_block *mem = find_memory_block_by_id(block_id);
>>>>> + struct memory_block *mem;
>>>>> + lock_device_hotplug();
>>>> memblk_nr_poison_inc() and memblk_nr_poison_sub() are both called from memory_failure() context.
>>>> I'm afraid that if memory_failure() is triggered while device_hotplug_lock is held, it will lead to
>>>> a deadlock. Or am I missing something?
>>>
>>> I am curious: is there any place where memory_failure() is called while holding device_hotplug_lock?
>>
>> Sorry for the dumb scenario, I was a bit too presumptuous. But there might be another possible deadlock:
>>
>> remove_memory
>>   lock_device_hotplug <-- first called here
>>   try_remove_memory
>>     remove_memory_block_devices
>>       num_poisoned_pages_sub
>
> Passing pfn = -1 here.
>
>>         memblk_nr_poison_sub
>>           lock_device_hotplug <-- deadlock here
>
> No. Can’t reach here. No deadlock.
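
The reason this path cannot nest the lock can be modeled in a few lines of userspace C. This is only a sketch: every model_* name is hypothetical, and it assumes, based on the "pfn = -1" remark above, that num_poisoned_pages_sub() skips the per-block update when it is handed that sentinel.

```c
#include <assert.h>

#define NO_PFN (-1UL)        /* sentinel: no specific pfn (assumption from the thread) */

static int  lock_depth;      /* models a non-recursive device_hotplug_lock */
static int  nested_attempts; /* times the lock was taken while already held */
static long total_poisoned;  /* models the global poisoned-page counter */
static long block_poisoned;  /* models one memory block's nr_hwpoison */

static void model_lock_hotplug(void)
{
	if (lock_depth > 0)
		nested_attempts++;   /* a real mutex would deadlock here */
	lock_depth++;
}

static void model_unlock_hotplug(void)
{
	lock_depth--;
}

/* Models the patched helper: takes the lock around the per-block update. */
static void model_memblk_nr_poison_sub(unsigned long pfn, long i)
{
	(void)pfn;               /* block lookup is not modeled */
	model_lock_hotplug();
	block_poisoned -= i;
	model_unlock_hotplug();
}

/* Assumed guard: the per-block helper is skipped for the sentinel pfn. */
static void model_num_poisoned_pages_sub(unsigned long pfn, long i)
{
	total_poisoned -= i;
	if (pfn != NO_PFN)
		model_memblk_nr_poison_sub(pfn, i);
}

/* Models the removal path from the trace: lock held, sentinel pfn passed. */
static void model_remove_memory(long nr_poison_in_block)
{
	model_lock_hotplug();
	model_num_poisoned_pages_sub(NO_PFN, nr_poison_in_block);
	model_unlock_hotplug();
}
```

With the sentinel, model_remove_memory() never reaches the inner lock, so nested_attempts stays at zero. Calling model_num_poisoned_pages_sub() with a real pfn while the lock is already held is the only way to make the counter tick in this model.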

Right, I missed that. Thanks. But I'm still worried that there might be other issues.
For example, this function could be called while the page lock is held. Acquiring device_hotplug_lock
while already holding the page lock might cause lock-ordering problems, though I haven't seen any
specific issue yet. There might also be other scenarios that haven't been considered. Hope I'm just
overthinking it. :)
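
The ordering concern can be sketched as a toy lock-order tracker in userspace C. This is an illustration only, not a claim that such an inversion exists in the tree: both paths below are hypothetical, as is every model_* name, and the tracker just records which lock was taken while another was held and flags the reverse order.

```c
#include <assert.h>
#include <stdbool.h>

enum model_lock { MODEL_PAGE_LOCK, MODEL_HOTPLUG_LOCK, MODEL_NR_LOCKS };

static bool held[MODEL_NR_LOCKS];
/* order_seen[a][b] is true once b has been acquired while a was held. */
static bool order_seen[MODEL_NR_LOCKS][MODEL_NR_LOCKS];

/* Returns false if acquiring 'l' inverts a previously recorded order. */
static bool model_acquire(enum model_lock l)
{
	bool ok = true;

	for (int h = 0; h < MODEL_NR_LOCKS; h++) {
		if (!held[h] || h == (int)l)
			continue;
		if (order_seen[l][h])
			ok = false;          /* reverse order was seen earlier: ABBA risk */
		order_seen[h][l] = true;
	}
	held[l] = true;
	return ok;
}

static void model_release(enum model_lock l)
{
	held[l] = false;
}

/* Hypothetical path A: poison accounting entered with the page lock held. */
static bool model_poison_path(void)
{
	bool ok = model_acquire(MODEL_PAGE_LOCK);

	ok = model_acquire(MODEL_HOTPLUG_LOCK) && ok;
	model_release(MODEL_HOTPLUG_LOCK);
	model_release(MODEL_PAGE_LOCK);
	return ok;
}

/* Hypothetical path B: a hotplug-side path that locks a page. */
static bool model_hotplug_path(void)
{
	bool ok = model_acquire(MODEL_HOTPLUG_LOCK);

	ok = model_acquire(MODEL_PAGE_LOCK) && ok;
	model_release(MODEL_PAGE_LOCK);
	model_release(MODEL_HOTPLUG_LOCK);
	return ok;
}
```

Path A on its own is fine however often it runs. The tracker only complains once path B takes the same two locks in the opposite order, which is the kind of report lockdep would be expected to produce if both orders were really reachable.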

Reviewed-by: Miaohe Lin <linmiaohe@xxxxxxxxxx>

Thanks.