Re: [PATCH v3 1/2] ARM: mm: fix use-after-free in __do_user_fault() under CONFIG_DEBUG_USER

From: Qi Xi

Date: Fri Jun 26 2026 - 21:39:49 EST

Hi Russell,

Thank you for the review. I understand the general concern about
taking locks in fault paths, but I would like to clarify the specific
case here.

__do_user_fault() with CONFIG_DEBUG_USER is not a kernel-dying path.
After show_pte() prints debug info, the kernel calls
force_sig_fault(SIGSEGV) and returns to user space. The system
continues running normally. Without this fix, a concurrent munmap can
cause show_pte() to trigger a secondary kernel fault, turning a
harmless SIGSEGV into a kernel panic.

Regarding your concern about the mm lock already being held: I have
checked all three callers of __do_user_fault() (do_page_fault ->
bad_area, the do_bad_area user path, and the
do_kernel_address_page_fault user path). Each of them either releases
the mmap read lock or never takes it before calling
__do_user_fault(), so the lock is not held on entry.

It is also worth noting that we did NOT modify the paths where the
kernel is already dying (die_kernel_fault, __do_kernel_fault). Those
paths remain unchanged and continue to call show_pte() without any
lock, just as they always have.

On 26/06/2026 17:44, Russell King wrote:

On Fri, Jun 26, 2026 at 03:30:47PM +0800, Qi Xi wrote:

When CONFIG_DEBUG_USER is enabled with user_debug=31 on 32-bit ARM,
a user page fault triggers show_pte() via __do_user_fault() after
do_page_fault() has already released mmap_read_lock. If another
thread concurrently calls munmap(), the page table pages can be
freed while show_pte() is still reading them, causing a
use-after-free in show_pte().

The race can be reproduced on multi_v7_defconfig with:

    CONFIG_DEBUG_USER=y
    CONFIG_ARM_LPAE=y
    kernel command line: user_debug=31

A delay inserted in show_pte() for testing widens the race window and
makes the UAF reliably reproducible. On LPAE, the race works as
follows:

CPU 0 (fault path)                      CPU 1 (munmap)
                                        munmap(page 0) -> clears PTE[0]
                                        PTE/PMD pages remain

read page 0 -> page fault
-> do_DataAbort()
  -> do_page_fault()
    -> lock_mm_and_find_vma() -> no VMA
       (mmap_read_lock released)
    -> __do_user_fault()
      -> show_pte(tsk->mm, addr)
        -> *pgd (valid)
        -> p4d/pud checks pass

        -> [delay]                      munmap(page 1)
                                        -> clears PTE[1]
                                        -> PTE/PMD pages freed
                                        -> PGD cleared

        -> pmd_offset(pud, addr)
        -> *pud=0 -> __va(0)
        -> dereference
        -> secondary data abort (kernel)

Fix by taking mmap_read_lock() around show_pte() in __do_user_fault().
__do_user_fault() is called from process context with interrupts
enabled, so the context can sleep and mmap_read_lock() is safe here.

This is a fault path which should only be called when something is
already wrong; the mm lock may already be held (e.g. a kernel
fault while already holding the mmap lock). We can't take any locks
here.