Re: [PATCH] arm64: mm: Fix kexec failure after pte_mkwrite_novma() change

From: Chang, Jianpeng (CN)

Date: Thu Nov 27 2025 - 05:24:46 EST



On 11/27/2025 1:41 PM, Anshuman Khandual wrote:

On 27/11/25 9:13 AM, Jianpeng Chang wrote:
Commit 143937ca51cc ("arm64, mm: avoid always making PTE dirty in
pte_mkwrite()") modified pte_mkwrite_novma() to only clear PTE_RDONLY
when the page is already dirty (PTE_DIRTY is set). While this optimization
prevents unnecessary dirty page marking in normal memory management paths,
it breaks kexec on some platforms like NXP LS1043.

The issue occurs in the kexec code path:
1. machine_kexec_post_load() calls trans_pgd_create_copy() to create a
writable copy of the linear mapping
2. _copy_pte() calls pte_mkwrite_novma() to ensure all pages in the copy
are writable for the new kernel image copying
3. With the new logic, clean pages (without PTE_DIRTY) remain read-only
4. When kexec tries to copy the new kernel image through the linear
mapping, it fails on read-only pages, causing the system to hang
after "Bye!"
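
Step 3 can be modelled in user space. This is only a sketch, not kernel
code: the bit positions are assumed to mirror the arm64 page-table
definitions, the model_* names are hypothetical, and the fault check
assumes hardware that does not clear PTE_RDONLY by itself on a store.

```c
#include <stdint.h>

/* Assumed bit positions, modelled on arm64; illustrative only. */
#define PTE_RDONLY (UINT64_C(1) << 7)
#define PTE_WRITE  (UINT64_C(1) << 51)
#define PTE_DIRTY  (UINT64_C(1) << 55)

/*
 * Hypothetical model of pte_mkwrite_novma() after commit 143937ca51cc:
 * PTE_WRITE is always set, but PTE_RDONLY is cleared only for pages
 * that are already marked dirty.
 */
static uint64_t model_mkwrite_novma(uint64_t pte)
{
	pte |= PTE_WRITE;
	if (pte & PTE_DIRTY)
		pte &= ~PTE_RDONLY;
	return pte;
}

/*
 * Hypothetical fault check: assuming the hardware does not update
 * PTE_RDONLY itself, a store faults whenever PTE_RDONLY is still set.
 */
static int model_store_faults(uint64_t pte)
{
	return (pte & PTE_RDONLY) != 0;
}
```

In this model a clean entry (PTE_RDONLY set, PTE_DIRTY clear) still
faults on a store after model_mkwrite_novma(), while a dirty entry does
not, which matches the read-only pages described in step 3.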

The same issue affects hibernation which uses the same trans_pgd code path.

Fix this by explicitly clearing PTE_RDONLY in _copy_pte() for both
kexec and hibernation, ensuring all pages in the temporary mapping are
writable regardless of their dirty state. This preserves the original
commit's optimization for normal memory management while fixing the
kexec/hibernation regression.

Fixes: 143937ca51cc ("arm64, mm: avoid always making PTE dirty in pte_mkwrite()")
Signed-off-by: Jianpeng Chang <jianpeng.chang.cn@xxxxxxxxxxxxx>
---
arch/arm64/mm/trans_pgd.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/mm/trans_pgd.c b/arch/arm64/mm/trans_pgd.c
index 18543b603c77..ad4e5e4fcc91 100644
--- a/arch/arm64/mm/trans_pgd.c
+++ b/arch/arm64/mm/trans_pgd.c
@@ -40,8 +40,13 @@ static void _copy_pte(pte_t *dst_ptep, pte_t *src_ptep, unsigned long addr)
* Resume will overwrite areas that may be marked
* read only (code, rodata). Clear the RDONLY bit from
* the temporary mappings we use during restore.
+ *
+ * For kexec/hibernation, we need writable access regardless
+ * of the page's dirty state, so force clear PTE_RDONLY.
*/
Hence the pte_mkwrite_novma() helper cannot be used here, as would
normally have been expected. It might be a good idea to add that to the
above comment.

I'll add this to make it clear that we can't use pte_mkwrite_novma().

- __set_pte(dst_ptep, pte_mkwrite_novma(pte));
+ pte = set_pte_bit(pte, __pgprot(PTE_WRITE));
+ pte = clear_pte_bit(pte, __pgprot(PTE_RDONLY));
+ __set_pte(dst_ptep, pte);
} else if (!pte_none(pte)) {
/*
* debug_pagealloc will removed the PTE_VALID bit if
@@ -57,7 +62,10 @@ static void _copy_pte(pte_t *dst_ptep, pte_t *src_ptep, unsigned long addr)
*/
BUG_ON(!pfn_valid(pte_pfn(pte)));

- __set_pte(dst_ptep, pte_mkvalid(pte_mkwrite_novma(pte)));
+ pte = pte_mkvalid(pte);
Probably better to move pte_mkvalid() just after clearing PTE_RDONLY.
I'll make this change.

+ pte = set_pte_bit(pte, __pgprot(PTE_WRITE));
+ pte = clear_pte_bit(pte, __pgprot(PTE_RDONLY));
+ __set_pte(dst_ptep, pte);
}
}

Just wondering if it would be worth adding a local helper for the set
PTE_WRITE --> clear PTE_RDONLY sequence, describing how it differs from
the now updated pte_mkwrite_novma() helper, along with the earlier comment.

Thank you for the review! I appreciate your suggestions.

You're right that a local helper would make the code more readable and
would clearly document the difference from pte_mkwrite_novma().

I have a small concern about placing PTE manipulation functions outside of
pgtable.h: is that acceptable? Or would you prefer a local static inline
helper within trans_pgd.c, given its specific use case for kexec/hibernation?


I'll implement whichever approach you think is more appropriate in v2.
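
As a starting point for that discussion, the helper could look roughly
like the sketch below. It is a user-space model, not the v2 patch: pte_t,
set_pte_bit() and clear_pte_bit() are simplified stand-ins taking a plain
bit mask, the bit positions are assumed to match arm64, and the name
trans_pgd_pte_mkwrite() is hypothetical.

```c
#include <stdint.h>

/* Simplified stand-ins for the kernel types and helpers (assumptions). */
typedef struct { uint64_t pte; } pte_t;

#define PTE_RDONLY (UINT64_C(1) << 7)	/* assumed arm64 bit position */
#define PTE_WRITE  (UINT64_C(1) << 51)	/* assumed arm64 bit position */
#define PTE_DIRTY  (UINT64_C(1) << 55)	/* assumed arm64 bit position */

static inline pte_t set_pte_bit(pte_t pte, uint64_t mask)
{
	pte.pte |= mask;
	return pte;
}

static inline pte_t clear_pte_bit(pte_t pte, uint64_t mask)
{
	pte.pte &= ~mask;
	return pte;
}

/*
 * Hypothetical local helper: unlike pte_mkwrite_novma(), which clears
 * PTE_RDONLY only for dirty pages, this makes the temporary mapping
 * writable regardless of the page's dirty state.
 */
static inline pte_t trans_pgd_pte_mkwrite(pte_t pte)
{
	pte = set_pte_bit(pte, PTE_WRITE);
	return clear_pte_bit(pte, PTE_RDONLY);
}
```

Both branches of _copy_pte() would then call this one helper, with
pte_mkvalid() applied after it in the debug_pagealloc branch.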


Best regards,

Jianpeng