Re: [PATCH v3] x86/mm/pat: Fix VM_PAT handling when fork() fails in copy_page_range()

From: David Hildenbrand
Date: Wed Apr 02 2025 - 07:37:01 EST


On 25.03.25 20:19, David Hildenbrand wrote:
If track_pfn_copy() fails, we already added the dst VMA to the maple
tree. As fork() fails, we'll cleanup the maple tree, and stumble over
the dst VMA for which we neither performed any reservation nor copied
any page tables.

Consequently untrack_pfn() will see VM_PAT and try obtaining the
PAT information from the page table -- which fails because the page
table was not copied.

The easiest fix would be to simply clear the VM_PAT flag of the dst VMA
if track_pfn_copy() fails. However, the whole thing is about "simply"
clearing the VM_PAT flag is shaky as well: if we passed track_pfn_copy()
and performed a reservation, but copying the page tables fails, we'll
simply clear the VM_PAT flag, not properly undoing the reservation ...
which is also wrong.

So let's fix it properly: set the VM_PAT flag only if the reservation
succeeded (leaving it clear initially), and undo the reservation if
anything goes wrong while copying the page tables: clearing the VM_PAT
flag after undoing the reservation.

Note that any copied page table entries will get zapped when the VMA will
get removed later, after copy_page_range() succeeded; as VM_PAT is not set
then, we won't try cleaning VM_PAT up once more and untrack_pfn() will be
happy. Note that leaving these page tables in place without a reservation
is not a problem, as we are aborting fork(); this process will never run.

A reproducer can trigger this usually at the first try:

https://gitlab.com/davidhildenbrand/scratchspace/-/raw/main/reproducers/pat_fork.c

[ 45.239440] WARNING: CPU: 26 PID: 11650 at arch/x86/mm/pat/memtype.c:983 get_pat_info+0xf6/0x110
[ 45.241082] Modules linked in: ...
[ 45.249119] CPU: 26 UID: 0 PID: 11650 Comm: repro3 Not tainted 6.12.0-rc5+ #92
[ 45.250598] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014
[ 45.252181] RIP: 0010:get_pat_info+0xf6/0x110
...
[ 45.268513] Call Trace:
[ 45.269003] <TASK>
[ 45.269425] ? __warn.cold+0xb7/0x14d
[ 45.270131] ? get_pat_info+0xf6/0x110
[ 45.270846] ? report_bug+0xff/0x140
[ 45.271519] ? handle_bug+0x58/0x90
[ 45.272192] ? exc_invalid_op+0x17/0x70
[ 45.272935] ? asm_exc_invalid_op+0x1a/0x20
[ 45.273717] ? get_pat_info+0xf6/0x110
[ 45.274438] ? get_pat_info+0x71/0x110
[ 45.275165] untrack_pfn+0x52/0x110
[ 45.275835] unmap_single_vma+0xa6/0xe0
[ 45.276549] unmap_vmas+0x105/0x1f0
[ 45.277256] exit_mmap+0xf6/0x460
[ 45.277913] __mmput+0x4b/0x120
[ 45.278512] copy_process+0x1bf6/0x2aa0
[ 45.279264] kernel_clone+0xab/0x440
[ 45.279959] __do_sys_clone+0x66/0x90
[ 45.280650] do_syscall_64+0x95/0x180

Likely this case was missed in commit d155df53f310 ("x86/mm/pat: clear
VM_PAT if copy_p4d_range failed")

... and instead of undoing the reservation we simply cleared the VM_PAT flag.

Keep the documentation of these functions in include/linux/pgtable.h,
one place is more than sufficient -- we should clean that up for the other
functions like track_pfn_remap/untrack_pfn separately.

Reported-by: xingwei lee <xrivendell7@xxxxxxxxx>
Reported-by: yuxin wang <wang1315768607@xxxxxxx>
Closes: https://lore.kernel.org/lkml/CABOYnLx_dnqzpCW99G81DmOr+2UzdmZMk=T3uxwNxwz+R1RAwg@xxxxxxxxxxxxxx/
Reported-by: Marius Fleischer <fleischermarius@xxxxxxxxx>
Closes: https://lore.kernel.org/lkml/CAJg=8jwijTP5fre8woS4JVJQ8iUA6v+iNcsOgtj9Zfpc3obDOQ@xxxxxxxxxxxxxx/
Fixes: d155df53f310 ("x86/mm/pat: clear VM_PAT if copy_p4d_range failed")
Fixes: 2ab640379a0a ("x86: PAT: hooks in generic vm code to help archs to track pfnmap regions - v3")
Cc: Ingo Molnar <mingo@xxxxxxxxxx>
Cc: Borislav Petkov <bp@xxxxxxxxx>
Cc: Dan Carpenter <dan.carpenter@xxxxxxxxxx>
Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Cc: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
Cc: Andy Lutomirski <luto@xxxxxxxxxx>
Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Cc: Rik van Riel <riel@xxxxxxxxxxx>
Cc: "H. Peter Anvin" <hpa@xxxxxxxxx>
Cc: Peter Xu <peterx@xxxxxxxxxx>
Signed-off-by: David Hildenbrand <david@xxxxxxxxxx>
---

Apparently smatch is not happy about some scenarios. The following might make it happy, and make track_pfn_copy() obey the documentation "pfn set if rc == 0".

diff --git a/arch/x86/mm/pat/memtype.c b/arch/x86/mm/pat/memtype.c
index d721cc19addbd..a51d21d2e5198 100644
--- a/arch/x86/mm/pat/memtype.c
+++ b/arch/x86/mm/pat/memtype.c
@@ -992,8 +992,10 @@ int track_pfn_copy(struct vm_area_struct *dst_vma,
pgprot_t pgprot;
int rc;

- if (!(src_vma->vm_flags & VM_PAT))
+ if (!(src_vma->vm_flags & VM_PAT)) {
+ *pfn = 0;
return 0;
+ }

/*
* Duplicate the PAT information for the dst VMA based on the src
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 4c107e17c547e..d4b564aacab8f 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1515,6 +1515,7 @@ static inline void track_pfn_insert(struct vm_area_struct *vma, pgprot_t *prot,
static inline int track_pfn_copy(struct vm_area_struct *dst_vma,
struct vm_area_struct *src_vma, unsigned long *pfn)
{
+ *pfn = 0;
return 0;
}



--
Cheers,

David / dhildenb