Re: [PATCH v2 1/1] mm/hugetlb: fix possible deadlocks in hugetlb VMA unmap paths
From: Lance Yang
Date: Mon Nov 10 2025 - 22:20:45 EST
+Mike
On 2025/11/11 07:07, Hillf Danton wrote:
On Tue, 11 Nov 2025 00:39:29 +0800 Lance Yang wrote:
On 2025/11/10 20:17, Harry Yoo wrote:
On Mon, Nov 10, 2025 at 07:15:53PM +0800, Lance Yang wrote:
From: Lance Yang <lance.yang@xxxxxxxxx>
The hugetlb VMA unmap path contains several potential deadlocks, as
reported by syzbot. These deadlocks occur in __hugetlb_zap_begin(),
move_hugetlb_page_tables(), and the retry path of
hugetlb_unmap_file_folio() (affecting remove_inode_hugepages() and
unmap_vmas()), where vma_lock is acquired before i_mmap_lock. This lock
ordering conflicts with other paths like hugetlb_fault(), which establish
the correct dependency as i_mmap_lock -> vma_lock.
Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&vma_lock->rw_sema);
                               lock(&i_mmap_lock);
                               lock(&vma_lock->rw_sema);
  lock(&i_mmap_lock);
Resolve the circular dependencies reported by syzbot across multiple call
chains by reordering the locks in all conflicting paths to consistently
follow the established i_mmap_lock -> vma_lock order.
But mm/rmap.c says:

 * hugetlbfs PageHuge() take locks in this order:
 *   hugetlb_fault_mutex (hugetlbfs specific page fault mutex)
 *   vma_lock (hugetlb specific lock for pmd_sharing)
 *     mapping->i_mmap_rwsem (also used for hugetlb pmd sharing)
 *       folio_lock
 */
Thanks! You are right, I was mistaken ...
I think the commit message should explain why the locking order described
above is incorrect (or when it became incorrect) and fix the comment?
I think the locking order documented in mm/rmap.c (vma_lock -> i_mmap_lock)
is indeed the correct one to follow.

Looking at the commit[1] that introduced the vma_lock, it seems possible
that the deadlock reported by syzbot[2] is a false positive ...
From the commit message:
```
The vma_lock is used as follows:
- During fault processing. The lock is acquired in read mode before
  doing a page table lock and allocation (huge_pte_alloc). The lock is
  held until code is finished with the page table entry (ptep).
- The lock must be held in write mode whenever huge_pmd_unshare is
  called.

Lock ordering issues come into play when unmapping a page from all
vmas mapping the page. The i_mmap_rwsem must be held to search for the
vmas, and the vma lock must be held before calling unmap which will
call huge_pmd_unshare. This is done today in:
- try_to_migrate_one and try_to_unmap_one for page migration and memory
  error handling. In these routines we 'try' to obtain the vma lock and
  fail to unmap if unsuccessful. Calling routines already deal with the
  failure of unmapping.
- hugetlb_vmdelete_list for truncation and hole punch. This routine
  also tries to acquire the vma lock. If it fails, it skips the
  unmapping. However, we can not have file truncation or hole punch
  fail because of contention. After hugetlb_vmdelete_list, truncation
  and hole punch call remove_inode_hugepages. remove_inode_hugepages
  checks for mapped pages and calls hugetlb_unmap_file_folio to unmap
  them. hugetlb_unmap_file_folio is designed to drop locks and
  reacquire in the correct order to guarantee unmap success.
```
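The 'try' in the first bullet is the whole trick: with i_mmap_rwsem already
held by the rmap walk, blocking on the vma lock would invert the documented
order, so only a trylock is safe. A toy userspace sketch of that shape
(hypothetical helper names, not the actual try_to_unmap_one):

```
/* Trylock-and-skip sketch (cf. try_to_migrate_one/try_to_unmap_one):
 * taking vma_lock while holding i_mmap_lock would invert the documented
 * vma_lock -> i_mmap_lock order, so only trylock is safe; callers
 * already tolerate unmap failure and retry later. */
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t vma_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t i_mmap_lock = PTHREAD_MUTEX_INITIALIZER;

static void unmap_folio_locked(void) { /* stand-in for the real unmap */ }

/* Returns false on contention; the caller deals with the failure. */
static bool try_unmap_one(void)
{
	bool ret = false;

	pthread_mutex_lock(&i_mmap_lock);   /* held by the walk in reality */
	if (pthread_mutex_trylock(&vma_lock) == 0) {
		unmap_folio_locked();
		pthread_mutex_unlock(&vma_lock);
		ret = true;
	}
	pthread_mutex_unlock(&i_mmap_lock);
	return ret;
}

int main(void)
{
	return try_unmap_one() ? 0 : 1;  /* 1 => contended, retry later */
}
```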
The locking logic is a bit tricky; some paths can't follow a strict lock
order and must use trylock or a drop/retry pattern to avoid deadlocking :)
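And the second shape, the drop-and-reacquire dance from
hugetlb_unmap_file_folio, as another toy sketch (folio_still_mapped() is an
invented stand-in for the revalidation the real code does; the vma
refcounting it also juggles is omitted):

```
/* Drop-and-reacquire sketch (cf. hugetlb_unmap_file_folio): on
 * contention, back out and retake both locks in the documented
 * vma_lock -> i_mmap_lock order, then revalidate before unmapping. */
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t vma_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t i_mmap_lock = PTHREAD_MUTEX_INITIALIZER;

static bool folio_still_mapped(void) { return true; } /* stand-in */
static void unmap_folio_locked(void) { }              /* stand-in */

static void unmap_file_folio(void)
{
	pthread_mutex_lock(&i_mmap_lock);

	if (pthread_mutex_trylock(&vma_lock) != 0) {
		/* Blocking here would invert the documented order. */
		pthread_mutex_unlock(&i_mmap_lock);

		/* Retake both in the safe order; this may sleep. */
		pthread_mutex_lock(&vma_lock);
		pthread_mutex_lock(&i_mmap_lock);

		/* Everything may have changed while nothing was held. */
		if (!folio_still_mapped())
			goto out;
	}

	unmap_folio_locked();
out:
	pthread_mutex_unlock(&i_mmap_lock);
	pthread_mutex_unlock(&vma_lock);
}

int main(void)
{
	unmap_file_folio();
	return 0;
}
```

Note the revalidation step: once both locks have been dropped, any lookup
done before them is stale, which is why this path can retry but never fail.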
Hoping Mike can take a look and confirm!
[1] https://lore.kernel.org/all/20220914221810.95771-9-mike.kravetz@xxxxxxxxxx/
[2] https://lore.kernel.org/linux-mm/69113a97.a70a0220.22f260.00ca.GAE@xxxxxxxxxx/
Thanks,
Lance
This fix has it backwards then. I'll rework it to fix the actual violations.
Break a leg, better after taking a look at ffa1e7ada456 ("block: Make
request_queue lockdep splats show up earlier")
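For readers following the reference: as I read it, the commit Hillf points
at primes lockdep at initialization time, acquiring and releasing the locks
once in the intended order so the dependency is recorded up front and the
first wrong-order user splats immediately, rather than only when two racing
paths actually collide. A kernel-flavoured sketch of that idea, with an
invented hook name (the real change lives in the block layer, not hugetlb):

```
/* Sketch of the "teach lockdep early" idea from ffa1e7ada456;
 * illustrative only, with a hypothetical hook, not an actual patch. */
#include <linux/fs.h>
#include <linux/rwsem.h>

static void hugetlb_prime_lock_order(struct rw_semaphore *vma_rw_sema,
				     struct address_space *mapping)
{
	/*
	 * Take and release both locks once in the documented order
	 * (vma_lock -> i_mmap_rwsem). Lockdep records the dependency
	 * here, so any later inversion splats at the first offender.
	 */
	down_write(vma_rw_sema);
	i_mmap_lock_write(mapping);
	i_mmap_unlock_write(mapping);
	up_write(vma_rw_sema);
}
```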