Re: [PATCH resend] mm/memory-failure: fix hugetlb_lock AA deadlock in get_huge_page_for_hwpoison

From: Miaohe Lin

Date: Tue May 26 2026 - 23:36:06 EST


On 2026/5/23 11:50, Andrew Morton wrote:
> On Fri, 22 May 2026 09:03:05 +0800 Wupeng Ma <mawupeng1@xxxxxxxxxx> wrote:
>
>> Two concurrent madvise(MADV_HWPOISON) calls on the same hugetlb page
>> can trigger a recursive spinlock self-deadlock (AA deadlock) on
>> hugetlb_lock when racing with a concurrent unmap:
>
> Well we don't want that.
>
>> Fixes: 405ce051236c ("mm/hwpoison: fix race between hugetlb free/demotion and memory_failure_hugetlb()")
>
> So I'll add cc:stable here.
>
> AI review didn't like the unlocked page_folio():
>
> https://sashiko.dev/#/patchset/20260522010305.4099834-1-mawupeng1@xxxxxxxxxx
>
> So I'll add a followup patch which addresses that (and which addresses
> Miaohe's naming nit).
>
> Please let's check this - perhaps the locking alteration isn't needed.
>
>
> From: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> Subject: mm-memory-failure-fix-hugetlb_lock-aa-deadlock-in-get_huge_page_for_hwpoison-fix
> Date: Fri May 22 08:44:25 PM PDT 2026
>
> - address possible race identified by Sashiko
>
> - s/out/out_unlock/, per Miaohe
>
> Link: https://sashiko.dev/#/patchset/20260522010305.4099834-1-mawupeng1@xxxxxxxxxx
> Link: https://lore.kernel.org/f39f405e-4b4b-8f79-70fe-a2b5b62114eb@xxxxxxxxxx
> Cc: David Hildenbrand <david@xxxxxxxxxx>
> Cc: Kefeng Wang <wangkefeng.wang@xxxxxxxxxx>
> Cc: Liam Howlett <liam.howlett@xxxxxxxxxx>
> Cc: Lorenzo Stoakes <ljs@xxxxxxxxxx>
> Cc: Miaohe Lin <linmiaohe@xxxxxxxxxx>
> Cc: Michal Hocko <mhocko@xxxxxxxx>
> Cc: Mike Rapoport <rppt@xxxxxxxxxx>
> Cc: Muchun Song <muchun.song@xxxxxxxxx>
> Cc: Naoya Horiguchi <nao.horiguchi@xxxxxxxxx>
> Cc: Oscar Salvador (SUSE) <osalvador@xxxxxxxxxx>
> Cc: Suren Baghdasaryan <surenb@xxxxxxxxxx>
> Cc: Vlastimil Babka <vbabka@xxxxxxxxxx>
> Cc: Wupeng Ma <mawupeng1@xxxxxxxxxx>
> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> ---
>
> mm/memory-failure.c | 11 ++++++-----
> 1 file changed, 6 insertions(+), 5 deletions(-)
>
> --- a/mm/memory-failure.c~mm-memory-failure-fix-hugetlb_lock-aa-deadlock-in-get_huge_page_for_hwpoison-fix
> +++ a/mm/memory-failure.c
> @@ -1970,14 +1970,15 @@ static int get_huge_page_for_hwpoison(un
> bool *migratable_cleared)
> {
> struct page *page = pfn_to_page(pfn);
> - struct folio *folio = page_folio(page);
> + struct folio *folio;
> bool count_increased = false;
> int ret, rc;
>
> spin_lock_irq(&hugetlb_lock);
> + folio = page_folio(page);

This fix works for me. Without hugetlb_lock held, the folio could become
stale due to a race with e.g. hugetlb demotion, dissolve and ...
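
To illustrate why the lookup order matters, here is a rough userspace
sketch, not kernel code. Everything in it is an assumption made for the
example: the struct folio/struct page stand-ins, model_lock (a C11
atomic_flag spinning lock in place of hugetlb_lock), and all model_*()
and lookup_*() helpers are hypothetical. The race_window callback only
simulates a concurrent demotion deterministically at the point where the
real race could occur.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct folio { int order; };
struct page { struct folio *folio; };

/* Models hugetlb_lock with a C11 atomic_flag spinning lock. */
static atomic_flag model_lock = ATOMIC_FLAG_INIT;

static void model_lock_acquire(void)
{
	while (atomic_flag_test_and_set_explicit(&model_lock, memory_order_acquire))
		; /* spin until the flag is clear */
}

static void model_lock_release(void)
{
	atomic_flag_clear_explicit(&model_lock, memory_order_release);
}

/* Models demotion: under the lock, the page is re-homed into another folio. */
static struct folio model_small = { .order = 0 };

static void race_demote(struct page *page)
{
	model_lock_acquire();
	page->folio = &model_small;
	model_lock_release();
}

/*
 * Pre-fix ordering: folio looked up before the lock is taken.
 * Returns true if the cached folio still matches the page once locked.
 */
static bool lookup_before_lock(struct page *page, void (*race_window)(struct page *))
{
	struct folio *folio = page->folio;	/* unlocked lookup */
	bool fresh;

	if (race_window)
		race_window(page);		/* simulated concurrent demotion */
	model_lock_acquire();
	fresh = (folio == page->folio);
	model_lock_release();
	return fresh;
}

/* Post-fix ordering: folio looked up only after the lock is held. */
static bool lookup_under_lock(struct page *page, void (*race_window)(struct page *))
{
	struct folio *folio;
	bool fresh;

	if (race_window)
		race_window(page);		/* the race can only land before the lock */
	model_lock_acquire();
	folio = page->folio;			/* locked lookup */
	fresh = (folio == page->folio);
	model_lock_release();
	return fresh;
}

/* Runs both orderings against the same simulated demotion. */
static void model_demo(void)
{
	struct folio huge = { .order = 9 };
	struct page pg = { .folio = &huge };

	assert(!lookup_before_lock(&pg, race_demote));	/* cached folio went stale */
	pg.folio = &huge;
	assert(lookup_under_lock(&pg, race_demote));	/* lookup sees current folio */
}
```

In this model the early lookup ends up holding a folio the page no longer
belongs to, while the lookup done under the lock cannot observe that
state, because the simulated demotion needs the same lock.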

Thanks Andrew.