Re: [Bug report] hash_name() may cross page boundary and trigger sleep in RCU context

From: Zizhi Wo

Date: Thu Nov 27 2025 - 20:39:50 EST




On 2025/11/28 9:18, Zizhi Wo wrote:


On 2025/11/28 9:17, Zizhi Wo wrote:


On 2025/11/27 20:59, Will Deacon wrote:
On Wed, Nov 26, 2025 at 05:05:05PM +0800, Zizhi Wo wrote:
We're running into the following issue on an ARM32 platform with the linux
5.10 kernel:

[<c0300b78>] (__dabt_svc) from [<c0529cb8>] (link_path_walk.part.7+0x108/0x45c)
[<c0529cb8>] (link_path_walk.part.7) from [<c052a948>] (path_openat+0xc4/0x10ec)
[<c052a948>] (path_openat) from [<c052cf90>] (do_filp_open+0x9c/0x114)
[<c052cf90>] (do_filp_open) from [<c0511e4c>] (do_sys_openat2+0x418/0x528)
[<c0511e4c>] (do_sys_openat2) from [<c0513d98>] (do_sys_open+0x88/0xe4)
[<c0513d98>] (do_sys_open) from [<c03000c0>] (ret_fast_syscall+0x0/0x58)
...
[<c0315e34>] (unwind_backtrace) from [<c030f2b0>] (show_stack+0x20/0x24)
[<c030f2b0>] (show_stack) from [<c14239f4>] (dump_stack+0xd8/0xf8)
[<c14239f4>] (dump_stack) from [<c038d188>] (___might_sleep+0x19c/0x1e4)
[<c038d188>] (___might_sleep) from [<c031b6fc>] (do_page_fault+0x2f8/0x51c)
[<c031b6fc>] (do_page_fault) from [<c031bb44>] (do_DataAbort+0x90/0x118)
[<c031bb44>] (do_DataAbort) from [<c0300b78>] (__dabt_svc+0x58/0x80)
...

During the execution of hash_name()->load_unaligned_zeropad(), a potential
memory access beyond the PAGE boundary may occur. For example, when the
filename length is near the PAGE_SIZE boundary. This triggers a page fault,
which leads to a call to do_page_fault()->mmap_read_trylock(). If we can't
acquire the lock, we have to fall back to the mmap_read_lock() path, which
calls might_sleep(). This breaks RCU semantics because path lookup occurs
under an RCU read-side critical section. In linux-mainline, arm/arm64
do_page_fault() still has this problem:

lock_mm_and_find_vma->get_mmap_lock_carefully->mmap_read_lock_killable.

And before commit bfcfaa77bdf0 ("vfs: use 'unsigned long' accesses for
dcache name comparison and hashing"), hash_name accessed the name byte by
byte.

To prevent load_unaligned_zeropad() from accessing beyond the valid memory
region, we would need to intercept such cases beforehand? But doing so
would require replicating the internal logic of load_unaligned_zeropad(),
including handling endianness and constructing the correct value manually.
Given that load_unaligned_zeropad() is used in many places across the
kernel, we currently haven't found a good solution to address this cleanly.
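
To make the "intercept beforehand" idea concrete, here is a minimal user-space sketch, not the kernel's implementation. It assumes a 4096-byte page, and the names SKETCH_PAGE_SIZE, word_load_crosses_page() and load_word_zeropad_sketch() are hypothetical, invented only for this illustration. It checks whether a word-sized load would touch the next page and, if so, copies only the bytes that remain in the current page, leaving the rest zero.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed page size for this sketch; hypothetical name, not a kernel macro. */
#define SKETCH_PAGE_SIZE 4096UL

/*
 * Return nonzero if a sizeof(unsigned long) load starting at addr
 * would read at least one byte from the following page.
 */
static int word_load_crosses_page(uintptr_t addr)
{
	uintptr_t offset = addr & (SKETCH_PAGE_SIZE - 1);

	return offset > SKETCH_PAGE_SIZE - sizeof(unsigned long);
}

/*
 * Load one word from p without reading past the end of p's page.
 * Bytes that would come from the next page are treated as zero.
 * Copying into a zeroed word keeps the in-memory byte order of a
 * native load, so no explicit endianness handling is needed here.
 */
static unsigned long load_word_zeropad_sketch(const void *p)
{
	uintptr_t addr = (uintptr_t)p;
	unsigned long val = 0;
	size_t n = sizeof(val);

	if (word_load_crosses_page(addr))
		n = SKETCH_PAGE_SIZE - (addr & (SKETCH_PAGE_SIZE - 1));

	memcpy(&val, p, n);
	return val;
}
```

The extra check on every load is the cost of this approach, which is presumably part of why doing it in every load_unaligned_zeropad() caller is unattractive.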

What would be the recommended way to handle this situation? Would
appreciate any feedback and guidance from the community. Thanks!

Does it help if you bodge the translation fault handler along the lines
of the untested diff below?

I tried it out and it works — thank you for the solution you provided.

At the same time, since I’m a beginner in this area, I’d like to ask a
question.

The comment above do_translation_fault() says:
“We enter here because the first level page table doesn't contain a
valid entry for the address.”

However, after modifying the code, it seems that when encountering
FSR_FS_INVALID_PAGE, the kernel no longer creates a page table entry,
but instead directly jumps to bad_area.

I'd like to ask — could this change potentially cause any other side
effects?

Thanks,
Zizhi Wo



Thank you for the solution you provided. However, I seem to have
encountered a bit of a problem.


Will

--->8

diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
index bf1577216ffa..b3c81e448798 100644
--- a/arch/arm/mm/fault.c
+++ b/arch/arm/mm/fault.c
@@ -407,7 +407,7 @@ do_translation_fault(unsigned long addr, unsigned int fsr,
         if (addr < TASK_SIZE)
                 return do_page_fault(addr, fsr, regs);
-       if (user_mode(regs))
+       if (user_mode(regs) || fsr_fs(fsr) == FSR_FS_INVALID_PAGE)
                 goto bad_area;



I'm getting an "FSR_FS_INVALID_PAGE undeclared" error during
compilation...

In which kernel or FSR version was this macro or constant defined?

Sorry, I didn't see this "#define FSR_FS_INVALID_PAGE". I'll try again
right away.

Please ignore my previous reply.


         index = pgd_index(addr);
diff --git a/arch/arm/mm/fault.h b/arch/arm/mm/fault.h
index 9ecc2097a87a..8fb26f85e361 100644
--- a/arch/arm/mm/fault.h
+++ b/arch/arm/mm/fault.h
@@ -12,6 +12,8 @@
  #define FSR_FS3_0              (15)
  #define FSR_FS5_0              (0x3f)
+#define FSR_FS_INVALID_PAGE    7
+
  #ifdef CONFIG_ARM_LPAE
  #define FSR_FS_AEA             17
diff --git a/arch/arm/mm/fsr-2level.c b/arch/arm/mm/fsr-2level.c
index f2be95197265..c7060da345df 100644
--- a/arch/arm/mm/fsr-2level.c
+++ b/arch/arm/mm/fsr-2level.c
@@ -11,7 +11,7 @@ static struct fsr_info fsr_info[] = {
         { do_bad,               SIGBUS,  0,             "external abort on linefetch"      },
         { do_translation_fault, SIGSEGV, SEGV_MAPERR,   "section translation fault"        },
         { do_bad,               SIGBUS,  0,             "external abort on linefetch"      },
-       { do_page_fault,        SIGSEGV, SEGV_MAPERR,   "page translation fault"           },
+       { do_translation_fault, SIGSEGV, SEGV_MAPERR,   "page translation fault"           },
         { do_bad,               SIGBUS,  0,             "external abort on non-linefetch"  },
         { do_bad,               SIGSEGV, SEGV_ACCERR,   "section domain fault"             },
         { do_bad,               SIGBUS,  0,             "external abort on non-linefetch"  },
diff --git a/arch/arm/mm/fsr-3level.c b/arch/arm/mm/fsr-3level.c
index d0ae2963656a..19df4af828bd 100644
--- a/arch/arm/mm/fsr-3level.c
+++ b/arch/arm/mm/fsr-3level.c
@@ -7,7 +7,7 @@ static struct fsr_info fsr_info[] = {
         { do_bad,               SIGBUS,  0,             "reserved translation fault"    },
         { do_translation_fault, SIGSEGV, SEGV_MAPERR,   "level 1 translation fault"     },
         { do_translation_fault, SIGSEGV, SEGV_MAPERR,   "level 2 translation fault"     },
-       { do_page_fault,        SIGSEGV, SEGV_MAPERR,   "level 3 translation fault"     },
+       { do_translation_fault, SIGSEGV, SEGV_MAPERR,   "level 3 translation fault"     },
         { do_bad,               SIGBUS,  0,             "reserved access flag fault"    },
         { do_bad,               SIGSEGV, SEGV_ACCERR,   "level 1 access flag fault"     },
         { do_page_fault,        SIGSEGV, SEGV_ACCERR,   "level 2 access flag fault"     },



By the way, I tried Al's solution, and this problem didn't reproduce.

Thanks,
Zizhi Wo