Re: [PATCH] mm: Fix mmap_assert_locked() in follow_pte()

From: Sergey Senozhatsky
Date: Fri Jul 12 2024 - 04:04:27 EST


On (24/07/11 23:33), David Hildenbrand wrote:
[..]
> > @@ -1815,9 +1815,16 @@ static void unmap_single_vma(struct mmu_gather *tlb,
> >  	if (vma->vm_file)
> >  		uprobe_munmap(vma, start, end);
> > -	if (unlikely(vma->vm_flags & VM_PFNMAP))
> > +	if (unlikely(vma->vm_flags & VM_PFNMAP)) {
> > +		if (!mm_wr_locked)
> > +			mmap_read_lock(vma->vm_mm);
> > +
> >  		untrack_pfn(vma, 0, 0, mm_wr_locked);
> > +		if (!mm_wr_locked)
> > +			mmap_read_unlock(vma->vm_mm);
> > +	}
> > +
> >  	if (start != end) {
> >  		if (unlikely(is_vm_hugetlb_page(vma))) {
>
> I'm not sure this is the right fix. I'd like to understand how we end up
> without the mmap lock, at least in read mode, in that path.

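I suspect this patch is causing a deadlock: exit_mmap() already holds
mmap_lock when the new mmap_read_lock() in unmap_single_vma() tries to
take it again: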

[ 10.263161] ============================================
[ 10.263165] WARNING: possible recursive locking detected
[ 10.263170] 6.10.0-rc7-next-20240712+ #645 Tainted: G N
[ 10.263177] --------------------------------------------
[ 10.263179] (direxec)/166 is trying to acquire lock:
[ 10.263184] ffff88810b4f0198 (&mm->mmap_lock){++++}-{3:3}, at: mmap_read_lock+0x12/0x40
[ 10.263217]
[ 10.263217] but task is already holding lock:
[ 10.263219] ffff88810b4f0198 (&mm->mmap_lock){++++}-{3:3}, at: exit_mmap+0x9c/0x830
[ 10.263238]
[ 10.263238] other info that might help us debug this:
[ 10.263241] Possible unsafe locking scenario:
[ 10.263241]
[ 10.263243] CPU0
[ 10.263245] ----
[ 10.263247] lock(&mm->mmap_lock);
[ 10.263252] lock(&mm->mmap_lock);
[ 10.263257]
[ 10.263257] *** DEADLOCK ***
[ 10.263257]
[ 10.263259] May be due to missing lock nesting notation
[ 10.263259]
[ 10.263262] 3 locks held by (direxec)/166:
[ 10.263267] #0: ffff88810b4e8548 (&sig->cred_guard_mutex){+.+.}-{3:3}, at: bprm_execve+0x70/0x1110
[ 10.263286] #1: ffff88810b4e85e0 (&sig->exec_update_lock){+.+.}-{3:3}, at: exec_mmap+0x9f/0x510
[ 10.263302] #2: ffff88810b4f0198 (&mm->mmap_lock){++++}-{3:3}, at: exit_mmap+0x9c/0x830
[ 10.263318]
[ 10.263318] stack backtrace:
[ 10.263329] CPU: 6 UID: 0 PID: 166 Comm: (direxec) Tainted: G N 6.10.0-rc7-next-20240712+ #645
[ 10.263340] Tainted: [N]=TEST
[ 10.263349] Call Trace:
[ 10.263355] <TASK>
[ 10.263360] dump_stack_lvl+0xa3/0xeb
[ 10.263375] print_deadlock_bug+0x4d5/0x680
[ 10.263387] __lock_acquire+0x65fb/0x7830
[ 10.263408] ? lock_is_held_type+0xdd/0x150
[ 10.263425] lock_acquire+0x14c/0x3e0
[ 10.263433] ? mmap_read_lock+0x12/0x40
[ 10.263445] ? lock_is_held_type+0xdd/0x150
[ 10.263454] down_read+0x58/0x9a0
[ 10.263461] ? mmap_read_lock+0x12/0x40
[ 10.263476] mmap_read_lock+0x12/0x40
[ 10.263485] unmap_single_vma+0x1bf/0x240
[ 10.263497] unmap_vmas+0x146/0x1c0
[ 10.263511] exit_mmap+0x13d/0x830
[ 10.263533] __mmput+0xc2/0x2c0
[ 10.263556] exec_mmap+0x4cb/0x510
[ 10.263580] begin_new_exec+0xfe6/0x1ba0
[ 10.263612] load_elf_binary+0x797/0x22a0
[ 10.263637] ? load_misc_binary+0x53a/0x930
[ 10.263656] ? lock_release+0x50f/0x830
[ 10.263673] ? bprm_execve+0x6d7/0x1110
[ 10.263693] bprm_execve+0x70d/0x1110
[ 10.263730] do_execveat_common+0x44b/0x600
[ 10.263745] __x64_sys_execve+0x8e/0xa0
[ 10.263754] do_syscall_64+0x71/0x110
[ 10.263764] entry_SYSCALL_64_after_hwframe+0x4b/0x53
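
For illustration only, the recursion in the trace can be modelled in user space. This is a minimal sketch, not kernel code: struct toy_mm and every toy_* function are hypothetical stand-ins, and it assumes that exit_mmap() holds mmap_lock in read mode and passes mm_wr_locked == false down to unmap_single_vma(), which is what the splat suggests. The toy lock only counts read acquisitions by a single task and records when one happens while the lock is already held.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for mm->mmap_lock, tracking one task only. */
struct toy_mm {
	int read_depth;       /* read acquisitions currently held by this task */
	bool recursion_seen;  /* set when a read lock is taken while already held */
};

void toy_mmap_read_lock(struct toy_mm *mm)
{
	if (mm->read_depth > 0)
		mm->recursion_seen = true;  /* the situation lockdep reports */
	mm->read_depth++;
}

void toy_mmap_read_unlock(struct toy_mm *mm)
{
	assert(mm->read_depth > 0);
	mm->read_depth--;
}

/* Same shape as the patched hunk: lock whenever the caller is not a writer. */
void toy_unmap_single_vma(struct toy_mm *mm, bool mm_wr_locked)
{
	if (!mm_wr_locked)
		toy_mmap_read_lock(mm);

	/* untrack_pfn() would run here */

	if (!mm_wr_locked)
		toy_mmap_read_unlock(mm);
}

/*
 * Assumed shape of the exit path: the lock is held for read around the
 * unmap, and mm_wr_locked is false. Returns true if recursion was seen.
 */
bool toy_exit_mmap(struct toy_mm *mm)
{
	toy_mmap_read_lock(mm);
	toy_unmap_single_vma(mm, false);
	toy_mmap_read_unlock(mm);
	return mm->recursion_seen;
}
```

In this model, toy_exit_mmap() on a zero-initialised struct toy_mm reports recursion, while calling toy_unmap_single_vma(mm, false) with no lock held does not. The point is that !mm_wr_locked means "not write-locked", not "unlocked", so it cannot be used on its own to decide whether taking the read lock is safe.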