Re: [PATCH v8 3/5] arm64: mm: support large block mapping when rodata=full
From: Dev Jain
Date: Mon Nov 03 2025 - 00:54:07 EST
With lock debugging enabled, we see a large number of "BUG: sleeping
function called from invalid context at kernel/locking/mutex.c:580"
and "BUG: Invalid wait context:" backtraces when running v6.18-rc3.
Please see example below.
Bisect points to this patch.
Please let me know if there is anything I can do to help track
down the problem.
Thanks for the report - ouch!
I expect you're running on a system that supports BBML2_NOABORT. Based on the
stack trace, I also expect you have CONFIG_DEBUG_PAGEALLOC enabled? That will cause
permission tricks to be played on the linear map at page allocation and free
time, which can happen in non-sleepable contexts. And with this patch we are
taking pgtable_split_lock (a mutex) in split_kernel_leaf_mapping(), which is
called as a result of the permission change request.
However, when CONFIG_DEBUG_PAGEALLOC is enabled we always force-map the linear map
by PTE, so split_kernel_leaf_mapping() is actually unnecessary and will return
without actually having to split anything. So we could add an early "if
(force_pte_mapping()) return 0;" to bypass the function entirely in this case,
and I *think* that should solve it.
But I'm also concerned about KFENCE. I can't remember its exact semantics off
the top of my head, so I'm concerned we could see similar problems there (we
only force pte mapping for the KFENCE pool).
I'll investigate fully tomorrow and hopefully provide a fix.
Here's a proposed fix, although I can't get access to a system with BBML2 until
tomorrow at the earliest. Guenter, I wonder if you could check that this
resolves your issue?
---8<---
commit 602ec2db74e5abfb058bd03934475ead8558eb72
Author: Ryan Roberts <ryan.roberts@xxxxxxx>
Date: Sun Nov 2 11:45:18 2025 +0000
arm64: mm: Don't attempt to split known pte-mapped regions
It has been reported that split_kernel_leaf_mapping() is trying to sleep
in non-sleepable context. It does this when acquiring the
pgtable_split_lock mutex, when either CONFIG_DEBUG_ALLOC or
CONFIG_KFENCE are enabled, which change linear map permissions within
softirq context during memory allocation and/or freeing.
But it turns out that the memory for which these features may attempt to
modify the permissions is always mapped by pte, so there is no need to
attempt to split the mapping. So let's exit early in these cases and
avoid attempting to take the mutex.
Closes: https://lore.kernel.org/all/f24b9032-0ec9-47b1-8b95-c0eeac7a31c5@xxxxxxxxxxxx/
Fixes: a166563e7ec3 ("arm64: mm: support large block mapping when rodata=full")
Signed-off-by: Ryan Roberts <ryan.roberts@xxxxxxx>
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index b8d37eb037fc..6e26f070bb49 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -708,6 +708,16 @@ static int split_kernel_leaf_mapping_locked(unsigned long addr)
return ret;
}
+static inline bool force_pte_mapping(void)
+{
+ bool bbml2 = system_capabilities_finalized() ?
+ system_supports_bbml2_noabort() : cpu_supports_bbml2_noabort();
+
+ return (!bbml2 && (rodata_full || arm64_kfence_can_set_direct_map() ||
+ is_realm_world())) ||
+ debug_pagealloc_enabled();
+}
+
static DEFINE_MUTEX(pgtable_split_lock);
int split_kernel_leaf_mapping(unsigned long start, unsigned long end)
@@ -723,6 +733,16 @@ int split_kernel_leaf_mapping(unsigned long start, unsigned long end)
if (!system_supports_bbml2_noabort())
return 0;
+ /*
+ * If the region is within a pte-mapped area, there is no need to try to
+ * split. Additionally, CONFIG_DEBUG_ALLOC and CONFIG_KFENCE may change
Nit: CONFIG_DEBUG_PAGEALLOC.
+ * permissions from softirq context so for those cases (which are always
+ * pte-mapped), we must not go any further because taking the mutex
+ * below may sleep.
+ */
+ if (force_pte_mapping() || is_kfence_address((void *)start))
+ return 0;
+
/*
* Ensure start and end are at least page-aligned since this is the
* finest granularity we can split to.
@@ -1009,16 +1029,6 @@ static inline void arm64_kfence_map_pool(phys_addr_t kfence_pool, pgd_t *pgdp) {
#endif /* CONFIG_KFENCE */
-static inline bool force_pte_mapping(void)
-{
- bool bbml2 = system_capabilities_finalized() ?
- system_supports_bbml2_noabort() : cpu_supports_bbml2_noabort();
-
- return (!bbml2 && (rodata_full || arm64_kfence_can_set_direct_map() ||
- is_realm_world())) ||
- debug_pagealloc_enabled();
-}
-
Otherwise LGTM.
Reviewed-by: Dev Jain <dev.jain@xxxxxxx>
static void __init map_mem(pgd_t *pgdp)
{
static const u64 direct_map_end = _PAGE_END(VA_BITS_MIN);
---8<---
Thanks,
Ryan
Yang Shi, do you have any additional thoughts?
Thanks,
Ryan