Re: [Bug report] hash_name() may cross page boundary and trigger sleep in RCU context
From: Zizhi Wo
Date: Fri Nov 28 2025 - 20:02:30 EST
On 2025/11/28 20:25, Will Deacon wrote:
On Fri, Nov 28, 2025 at 09:39:45AM +0800, Zizhi Wo wrote:
On 2025/11/28 9:18, Zizhi Wo wrote:
On 2025/11/28 9:17, Zizhi Wo wrote:
On 2025/11/27 20:59, Will Deacon wrote:
On Wed, Nov 26, 2025 at 05:05:05PM +0800, Zizhi Wo wrote:
We're running into the following issue on an ARM32 platform with the
Linux 5.10 kernel:
[<c0300b78>] (__dabt_svc) from [<c0529cb8>] (link_path_walk.part.7+0x108/0x45c)
[<c0529cb8>] (link_path_walk.part.7) from [<c052a948>] (path_openat+0xc4/0x10ec)
[<c052a948>] (path_openat) from [<c052cf90>] (do_filp_open+0x9c/0x114)
[<c052cf90>] (do_filp_open) from [<c0511e4c>] (do_sys_openat2+0x418/0x528)
[<c0511e4c>] (do_sys_openat2) from [<c0513d98>] (do_sys_open+0x88/0xe4)
[<c0513d98>] (do_sys_open) from [<c03000c0>] (ret_fast_syscall+0x0/0x58)
...
[<c0315e34>] (unwind_backtrace) from [<c030f2b0>] (show_stack+0x20/0x24)
[<c030f2b0>] (show_stack) from [<c14239f4>] (dump_stack+0xd8/0xf8)
[<c14239f4>] (dump_stack) from [<c038d188>] (___might_sleep+0x19c/0x1e4)
[<c038d188>] (___might_sleep) from [<c031b6fc>] (do_page_fault+0x2f8/0x51c)
[<c031b6fc>] (do_page_fault) from [<c031bb44>] (do_DataAbort+0x90/0x118)
[<c031bb44>] (do_DataAbort) from [<c0300b78>] (__dabt_svc+0x58/0x80)
...
During the execution of hash_name()->load_unaligned_zeropad(), a memory
access beyond the page boundary may occur, for example when the filename
ends near a PAGE_SIZE boundary. This triggers a page fault, which leads
to a call to do_page_fault()->mmap_read_trylock(). If we can't acquire
the lock, we have to fall back to the mmap_read_lock() path, which calls
might_sleep(). This breaks RCU semantics because path lookup occurs
under an RCU read-side critical section. In linux-mainline, arm/arm64
do_page_fault() still has this problem:
lock_mm_and_find_vma->get_mmap_lock_carefully->mmap_read_lock_killable.
Before commit bfcfaa77bdf0 ("vfs: use 'unsigned long' accesses for
dcache name comparison and hashing"), hash_name() accessed the name byte
by byte.
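To make the boundary condition above concrete, here is a minimal
user-space sketch in C. It is not kernel code: SKETCH_PAGE_SIZE and
word_load_crosses_page() are hypothetical names, and the 4 KiB page size
is an assumption. It only shows when a word-sized load starting at a
given address would touch the following page.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. Hypothetical name, not the kernel's PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096UL

/*
 * Return true if reading sizeof(unsigned long) bytes starting at addr
 * would touch the page following the one that contains addr.
 */
static bool word_load_crosses_page(uintptr_t addr)
{
	uintptr_t offset = addr & (SKETCH_PAGE_SIZE - 1);

	return offset + sizeof(unsigned long) > SKETCH_PAGE_SIZE;
}
```

A name whose last bytes sit in the final few bytes of a page is the
case where the word-at-a-time read can spill over, and if the next page
is not mapped the load faults.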
To prevent load_unaligned_zeropad() from accessing beyond the valid
memory region, would we need to intercept such cases beforehand? Doing
so would require replicating the internal logic of
load_unaligned_zeropad(), including handling endianness and constructing
the correct value manually. Given that load_unaligned_zeropad() is used
in many places across the kernel, we haven't yet found a clean way to
address this.
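For reference, the value that would have to be constructed manually can
be sketched in user space as follows. This is only an emulation of the
zero-padding semantics, not the kernel implementation:
load_zeropad_sketch() and its "valid" parameter (the number of readable
bytes left before the page ends) are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/*
 * Build the word a zero-padding load would yield: the readable bytes
 * keep their memory order and the remaining bytes read as zero.
 * Copying into a zero-initialised word gives that layout on both
 * little-endian and big-endian machines.
 */
static unsigned long load_zeropad_sketch(const unsigned char *p, size_t valid)
{
	unsigned long word = 0;
	size_t n = valid < sizeof(word) ? valid : sizeof(word);

	memcpy(&word, p, n);
	return word;
}
```

The cost of such an approach is that every caller would need to know how
many bytes remain before the page ends, which is the interception step
questioned above.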
What would be the recommended way to handle this situation? Would
appreciate any feedback and guidance from the community. Thanks!
Does it help if you bodge the translation fault handler along the lines
of the untested diff below?
I tried it out and it works. Thank you for the solution you provided.
Thanks for giving it a spin.
At the same time, since I’m a beginner in this area, I’d like to ask a
question.
The comment above do_translation_fault() says:
“We enter here because the first level page table doesn't contain a
valid entry for the address.”
However, after modifying the code, it seems that when encountering
FSR_FS_INVALID_PAGE, the kernel no longer creates a page table entry,
but instead directly jumps to bad_area.
FSR_FS_INVALID_PAGE indicates a last level translation fault (that's the
"page" part) so it's only applicable in the case where the other levels
of page-table have been populated already.
I wondered about checking !is_vmalloc_addr() too, but I couldn't
convince myself that load_unaligned_zeropad() is only ever used with the
linear map.
Thank you very much for the answer. For the vmalloc area, I checked the
call sites on the vfs side, such as dentry_string_cmp() and hash_name().
The name buffers they operate on are all allocated with kmalloc(), so
the issue should not arise there. But I'm not familiar with the other
call sites...
I'd like to ask: could this change potentially cause any other side
effects?
There's always the possibility but I personally think it's more
self-contained than the other patches doing the rounds. For example, I
don't make any changes to the permission fault handling path.
Will
Ok. Thank you for your explanation.
Thanks,
Zizhi Wo