Re: [PATCH] mm/x86/pat: Only untrack the pfn range if unmap region

From: David Hildenbrand
Date: Sun Jul 14 2024 - 14:27:45 EST


On 14.07.24 12:59, David Wang wrote:

At 2024-07-12 22:42:44, "Peter Xu" <peterx@xxxxxxxxxx> wrote:
NOTE: I massaged the commit message compared to the RFC post [1]; the
patch itself is untouched. I also removed the RFC tag and added more people
to the loop. Please kindly help test this patch if you have a reproducer,
as I can't reproduce it myself even with the syzbot reproducer on top of
mm-unstable. Instead of checking the reproducer further, I decided to send
this out first, as we have a bunch of reproducers on the list now.
---
mm/memory.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 4bcd79619574..f57cc304b318 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1827,9 +1827,6 @@ static void unmap_single_vma(struct mmu_gather *tlb,
 	if (vma->vm_file)
 		uprobe_munmap(vma, start, end);

-	if (unlikely(vma->vm_flags & VM_PFNMAP))
-		untrack_pfn(vma, 0, 0, mm_wr_locked);
-
 	if (start != end) {
 		if (unlikely(is_vm_hugetlb_page(vma))) {
 			/*
@@ -1894,6 +1891,8 @@ void unmap_vmas(struct mmu_gather *tlb, struct ma_state *mas,
 		unsigned long start = start_addr;
 		unsigned long end = end_addr;
 		hugetlb_zap_begin(vma, &start, &end);
+		if (unlikely(vma->vm_flags & VM_PFNMAP))
+			untrack_pfn(vma, 0, 0, mm_wr_locked);
 		unmap_single_vma(tlb, vma, start, end, &details,
 				 mm_wr_locked);
 		hugetlb_zap_end(vma, &details);
--
2.45.0

Hi,

Today I noticed a kernel warning with this patch.


[Sun Jul 14 16:51:38 2024] OOM killer enabled.
[Sun Jul 14 16:51:38 2024] Restarting tasks ... done.
[Sun Jul 14 16:51:38 2024] random: crng reseeded on system resumption
[Sun Jul 14 16:51:38 2024] PM: suspend exit
[Sun Jul 14 16:51:38 2024] ------------[ cut here ]------------
[Sun Jul 14 16:51:38 2024] WARNING: CPU: 1 PID: 2484 at arch/x86/mm/pat/memtype.c:1002 untrack_pfn+0x10c/0x120

We fail to find what we need in the page tables, indicating that the page tables might have been modified / torn down in the meantime.

Likely we have a previous call to unmap_single_vma() that modifies the page tables and unmaps present PFNs.

PAT is incompatible with that: it relies on information from the page tables to know what it has to undo during munmap(), or what it has to do during fork().
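
To illustrate that ordering dependency, here is a small user-space toy model in C. It is only a sketch, not kernel code: every name in it (toy_vma, toy_map, toy_zap, toy_follow_phys, toy_untrack, toy_track_copy) is made up for illustration, and it assumes that the lookup fails as soon as the first PTE of the mapping is gone. It shows that untracking before the PTEs are zapped succeeds, while untracking (or copying at fork time) after a zap cannot find the PFN and leaves the reservation behind.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NPAGES 4
#define TOY_NONE   0UL	/* "no PTE present" marker in this toy model */

/* Hypothetical stand-in for a VM_PFNMAP VMA; not struct vm_area_struct. */
struct toy_vma {
	unsigned long pte[TOY_NPAGES];	/* toy page table: a PFN or TOY_NONE */
	bool reserved;			/* toy stand-in for the PAT reservation */
};

/* Map a contiguous PFN range and take the toy reservation. */
static void toy_map(struct toy_vma *vma, unsigned long first_pfn)
{
	assert(first_pfn != TOY_NONE);
	for (size_t i = 0; i < TOY_NPAGES; i++)
		vma->pte[i] = first_pfn + i;
	vma->reserved = true;
}

/* Clear all PTEs without touching the reservation (e.g. a zap on truncate). */
static void toy_zap(struct toy_vma *vma)
{
	for (size_t i = 0; i < TOY_NPAGES; i++)
		vma->pte[i] = TOY_NONE;
}

/* Recover the PFN from the page table; fails once the first PTE is gone. */
static bool toy_follow_phys(const struct toy_vma *vma, unsigned long *pfn)
{
	if (vma->pte[0] == TOY_NONE)
		return false;
	*pfn = vma->pte[0];
	return true;
}

/* Drop the reservation; returns false where the real code would WARN. */
static bool toy_untrack(struct toy_vma *vma)
{
	unsigned long pfn;

	if (!toy_follow_phys(vma, &pfn))
		return false;	/* nothing to look up: reservation stays set */
	vma->reserved = false;
	return true;
}

/* Fork-time copy; it needs the source page table just like toy_untrack(). */
static bool toy_track_copy(const struct toy_vma *src, struct toy_vma *dst)
{
	unsigned long pfn;

	if (!toy_follow_phys(src, &pfn))
		return false;
	*dst = *src;
	return true;
}
```

In this model, toy_map() followed by toy_untrack() returns true and clears the reservation, whereas toy_map(), toy_zap(), toy_untrack() returns false and leaves reserved set; a later toy_track_copy() from that VMA fails in the same way.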

The splat from the previous discussion [1]:

follow_phys arch/x86/mm/pat/memtype.c:957 [inline]
get_pat_info+0xf2/0x510 arch/x86/mm/pat/memtype.c:991
untrack_pfn+0xf7/0x4d0 arch/x86/mm/pat/memtype.c:1104
unmap_single_vma+0x1bd/0x2b0 mm/memory.c:1819
zap_page_range_single+0x326/0x560 mm/memory.c:1920
unmap_mapping_range_vma mm/memory.c:3684 [inline]
unmap_mapping_range_tree mm/memory.c:3701 [inline]
unmap_mapping_pages mm/memory.c:3767 [inline]
unmap_mapping_range+0x1ee/0x280 mm/memory.c:3804
truncate_pagecache+0x53/0x90 mm/truncate.c:731
simple_setattr+0xf2/0x120 fs/libfs.c:886
notify_change+0xec6/0x11f0 fs/attr.c:499
do_truncate+0x15c/0x220 fs/open.c:65
handle_truncate fs/namei.c:3308 [inline]

indicates that file truncation seems to end up messing with a PFNMAP mapping that has PAT set. That is ... weird. I would have thought that PFNMAP would never really happen with file truncation.

Does this only happen with an out-of-tree (OOT) driver that seems to do weird truncate stuff on files that have a PFNMAP mapping?

[1] https://lore.kernel.org/all/3879ee72-84de-4d2a-93a8-c0b3dc3f0a4c@xxxxxxxxxx/

[Sun Jul 14 16:51:38 2024] Modules linked in: snd_seq_dummy(E) snd_hrtimer(E) snd_seq(E) ctr(E) ccm(E) nf_conntrack_netlink(E) xfrm_user(E) xfrm_algo(E) xt_addrtype(E) br_netfilter(E) xt_CHECKSUM(E) xt_MASQUERADE(E) xt_conntrack(E) ipt_REJECT(E) nf_reject_ipv4(E) xt_tcpudp(E) nft_compat(E) nft_chain_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) nf_tables(E) nfnetlink(E) bridge(E) stp(E) llc(E) overlay(E) binfmt_misc(E) nls_ascii(E) nls_cp437(E) vfat(E) fat(E) amd_atl(E) intel_rapl_msr(E) intel_rapl_common(E) nvidia_drm(POE) nvidia_modeset(POE) edac_mce_amd(E) kvm_amd(E) snd_hda_codec_realtek(E) kvm(E) iwlmvm(E) snd_hda_codec_generic(E) crct10dif_pclmul(E) snd_hda_scodec_component(E) snd_hda_codec_hdmi(E) ghash_clmulni_intel(E) sha512_ssse3(E) mac80211(E) sha512_generic(E) snd_hda_intel(E) nvidia(POE) sha256_ssse3(E) snd_intel_dspcfg(E) ppdev(E) sha1_ssse3(E) libarc4(E) snd_hda_codec(E) snd_usb_audio(E) snd_usbmidi_lib(E) uvcvideo(E) snd_hda_core(E) iwlwifi(E) aesni_intel(E) snd_rawmidi(E) snd_pcsp(E)
[Sun Jul 14 16:51:38 2024]  snd_hwdep(E) snd_seq_device(E) crypto_simd(E) videobuf2_vmalloc(E) snd_pcm(E) cryptd(E) uvc(E) videobuf2_memops(E) videobuf2_v4l2(E) snd_timer(E) rapl(E) cfg80211(E) k10temp(E) wmi_bmof(E) sp5100_tco(E) acpi_cpufreq(E) ccp(E) snd(E) videodev(E) drm_kms_helper(E) videobuf2_common(E) rfkill(E) video(E) rng_core(E) mc(E) soundcore(E) joydev(E) parport_pc(E) parport(E) sg(E) evdev(E) msr(E) loop(E) fuse(E) drm(E) efi_pstore(E) dm_mod(E) configfs(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc16(E) mbcache(E) jbd2(E) btrfs(E) blake2b_generic(E) efivarfs(E) raid10(E) raid456(E) async_raid6_recov(E) async_memcpy(E) async_pq(E) async_xor(E) async_tx(E) raid1(E) raid0(E) md_mod(E) hid_generic(E) usbhid(E) hid(E) sd_mod(E) ahci(E) libahci(E) xhci_pci(E) nvme(E) libata(E) crc32_pclmul(E) nvme_core(E) xhci_hcd(E) t10_pi(E) crc32c_intel(E) i2c_piix4(E) r8169(E) crc64_rocksoft(E) realtek(E) scsi_mod(E) usbcore(E) scsi_common(E) usb_common(E) wmi(E) gpio_amdpt(E) gpio_generic(E) button(E)
[Sun Jul 14 16:51:38 2024] CPU: 1 PID: 2484 Comm: gnome-shell Tainted: P           OE      6.10.0-rc7-linan-1 #283
[Sun Jul 14 16:51:38 2024] Hardware name: Micro-Star International Co., Ltd. MS-7B89/B450M MORTAR MAX (MS-7B89), BIOS 2.80 06/10/2020
[Sun Jul 14 16:51:38 2024] RIP: 0010:untrack_pfn+0x10c/0x120
[Sun Jul 14 16:51:38 2024] Code: e2 01 74 22 8b 98 e0 00 00 00 3b 5d 2c 74 ac 48 8b 7d 30 e8 66 e1 bc 00 89 5d 2c 48 8b 7d 30 e8 0a 6c 09 00 eb 95 0f 0b eb da <0f> 0b eb 95 e8 db b6 bb 00 66 66 2e 0f 1f 84 00 00 00 00 00 90 90
[Sun Jul 14 16:51:38 2024] RSP: 0018:ffffae5b4ab1fbe8 EFLAGS: 00010202
[Sun Jul 14 16:51:38 2024] RAX: 0000000000000028 RBX: 0000000000000000 RCX: 0000000000000000
[Sun Jul 14 16:51:38 2024] RDX: 0000000000000001 RSI: 000fffffffe00000 RDI: ffff91d5be99ea80
[Sun Jul 14 16:51:38 2024] RBP: ffff91d5c44fbe70 R08: 00007f2e5ff32000 R09: 0000000000000001
[Sun Jul 14 16:51:38 2024] R10: ffff91d5b7ad6d1c R11: 00007f2e5ff35fff R12: 00007f2e5ff32000
[Sun Jul 14 16:51:38 2024] R13: 0000000000000000 R14: ffffae5b4ab1fde8 R15: ffff91d5c44fbe70
[Sun Jul 14 16:51:38 2024] FS:  00007f2e5ff59dc0(0000) GS:ffff91d84ec80000(0000) knlGS:0000000000000000
[Sun Jul 14 16:51:38 2024] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Sun Jul 14 16:51:38 2024] CR2: 00007fe71316b08c CR3: 000000018468e000 CR4: 0000000000350ef0
[Sun Jul 14 16:51:38 2024] Call Trace:
[Sun Jul 14 16:51:38 2024]  <TASK>
[Sun Jul 14 16:51:38 2024]  ? __warn+0x7c/0x120
[Sun Jul 14 16:51:38 2024]  ? untrack_pfn+0x10c/0x120
[Sun Jul 14 16:51:38 2024]  ? report_bug+0x18d/0x1c0
[Sun Jul 14 16:51:38 2024]  ? handle_bug+0x3c/0x80
[Sun Jul 14 16:51:38 2024]  ? exc_invalid_op+0x13/0x60
[Sun Jul 14 16:51:38 2024]  ? asm_exc_invalid_op+0x16/0x20
[Sun Jul 14 16:51:38 2024]  ? untrack_pfn+0x10c/0x120
[Sun Jul 14 16:51:38 2024]  ? untrack_pfn+0x53/0x120
[Sun Jul 14 16:51:38 2024]  unmap_vmas+0x115/0x1a0
[Sun Jul 14 16:51:38 2024]  unmap_region+0xd4/0x150
[Sun Jul 14 16:51:38 2024]  ? mas_nomem+0x14/0x80
[Sun Jul 14 16:51:38 2024]  ? srso_return_thunk+0x5/0x5f
[Sun Jul 14 16:51:38 2024]  ? mas_store_gfp+0x54/0x110
[Sun Jul 14 16:51:38 2024]  do_vmi_align_munmap+0x2d4/0x530
[Sun Jul 14 16:51:38 2024]  do_vmi_munmap+0xda/0x190
[Sun Jul 14 16:51:38 2024]  __vm_munmap+0xa0/0x160
[Sun Jul 14 16:51:38 2024]  __x64_sys_munmap+0x17/0x20
[Sun Jul 14 16:51:38 2024]  do_syscall_64+0x4b/0x110
[Sun Jul 14 16:51:38 2024]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[Sun Jul 14 16:51:38 2024] RIP: 0033:0x7f2e647208f7
[Sun Jul 14 16:51:38 2024] Code: 00 00 00 48 8b 15 09 05 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 b8 0b 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d d9 04 0d 00 f7 d8 64 89 01 48
[Sun Jul 14 16:51:38 2024] RSP: 002b:00007ffd289f0a48 EFLAGS: 00000246 ORIG_RAX: 000000000000000b
[Sun Jul 14 16:51:38 2024] RAX: ffffffffffffffda RBX: 00007f2e5ff31000 RCX: 00007f2e647208f7
[Sun Jul 14 16:51:38 2024] RDX: 0000000000000000 RSI: 0000000000001000 RDI: 00007f2e5ff31000
[Sun Jul 14 16:51:38 2024] RBP: 0000557d5a9330a0 R08: 00000000c1d00028 R09: 00000000beef0100
[Sun Jul 14 16:51:38 2024] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000
[Sun Jul 14 16:51:38 2024] R13: 0000000000000001 R14: 0000000000000002 R15: 0000557d5a8408c0
[Sun Jul 14 16:51:38 2024]  </TASK>
[Sun Jul 14 16:51:38 2024] ---[ end trace 0000000000000000 ]---
[Sun Jul 14 16:51:39 2024] ------------[ cut here ]------------
[Sun Jul 14 16:51:39 2024] WARNING: CPU: 1 PID: 2272 at arch/x86/mm/pat/memtype.c:1002 track_pfn_copy+0x94/0xa0
[Sun Jul 14 16:51:39 2024] Modules linked in: snd_seq_dummy(E) snd_hrtimer(E) snd_seq(E) ctr(E) ccm(E) nf_conntrack_netlink(E) xfrm_user(E) xfrm_algo(E) xt_addrtype(E) br_netfilter(E) xt_CHECKSUM(E) xt_MASQUERADE(E) xt_conntrack(E) ipt_REJECT(E) nf_reject_ipv4(E) xt_tcpudp(E) nft_compat(E) nft_chain_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) nf_tables(E) nfnetlink(E) bridge(E) stp(E) llc(E) overlay(E) binfmt_misc(E) nls_ascii(E) nls_cp437(E) vfat(E) fat(E) amd_atl(E) intel_rapl_msr(E) intel_rapl_common(E) nvidia_drm(POE) nvidia_modeset(POE) edac_mce_amd(E) kvm_amd(E) snd_hda_codec_realtek(E) kvm(E) iwlmvm(E) snd_hda_codec_generic(E) crct10dif_pclmul(E) snd_hda_scodec_component(E) snd_hda_codec_hdmi(E) ghash_clmulni_intel(E) sha512_ssse3(E) mac80211(E) sha512_generic(E) snd_hda_intel(E) nvidia(POE) sha256_ssse3(E) snd_intel_dspcfg(E) ppdev(E) sha1_ssse3(E) libarc4(E) snd_hda_codec(E) snd_usb_audio(E) snd_usbmidi_lib(E) uvcvideo(E) snd_hda_core(E) iwlwifi(E) aesni_intel(E) snd_rawmidi(E) snd_pcsp(E)
[Sun Jul 14 16:51:39 2024]  snd_hwdep(E) snd_seq_device(E) crypto_simd(E) videobuf2_vmalloc(E) snd_pcm(E) cryptd(E) uvc(E) videobuf2_memops(E) videobuf2_v4l2(E) snd_timer(E) rapl(E) cfg80211(E) k10temp(E) wmi_bmof(E) sp5100_tco(E) acpi_cpufreq(E) ccp(E) snd(E) videodev(E) drm_kms_helper(E) videobuf2_common(E) rfkill(E) video(E) rng_core(E) mc(E) soundcore(E) joydev(E) parport_pc(E) parport(E) sg(E) evdev(E) msr(E) loop(E) fuse(E) drm(E) efi_pstore(E) dm_mod(E) configfs(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc16(E) mbcache(E) jbd2(E) btrfs(E) blake2b_generic(E) efivarfs(E) raid10(E) raid456(E) async_raid6_recov(E) async_memcpy(E) async_pq(E) async_xor(E) async_tx(E) raid1(E) raid0(E) md_mod(E) hid_generic(E) usbhid(E) hid(E) sd_mod(E) ahci(E) libahci(E) xhci_pci(E) nvme(E) libata(E) crc32_pclmul(E) nvme_core(E) xhci_hcd(E) t10_pi(E) crc32c_intel(E) i2c_piix4(E) r8169(E) crc64_rocksoft(E) realtek(E) scsi_mod(E) usbcore(E) scsi_common(E) usb_common(E) wmi(E) gpio_amdpt(E) gpio_generic(E) button(E)
[Sun Jul 14 16:51:39 2024] CPU: 1 PID: 2272 Comm: Xorg Tainted: P        W  OE      6.10.0-rc7-linan-1 #283
[Sun Jul 14 16:51:39 2024] Hardware name: Micro-Star International Co., Ltd. MS-7B89/B450M MORTAR MAX (MS-7B89), BIOS 2.80 06/10/2020
[Sun Jul 14 16:51:39 2024] RIP: 0010:track_pfn_copy+0x94/0xa0
[Sun Jul 14 16:51:39 2024] Code: ff ff ff eb b4 48 89 ee 48 8b 44 24 10 48 8b 3c 24 b9 01 00 00 00 4c 29 e6 48 8d 54 24 08 48 89 44 24 08 e8 fe fc ff ff eb 8f <0f> 0b eb d0 e8 73 b9 bb 00 0f 1f 00 90 90 90 90 90 90 90 90 90 90
[Sun Jul 14 16:51:39 2024] RSP: 0018:ffffae5b4a04fb68 EFLAGS: 00010202
[Sun Jul 14 16:51:39 2024] RAX: 0000000000000028 RBX: ffff91d546ae1d10 RCX: 0000000000000000
[Sun Jul 14 16:51:39 2024] RDX: 0000000000000001 RSI: 000fffffffe00000 RDI: ffff91d5b969c700
[Sun Jul 14 16:51:39 2024] RBP: 00007fe71316e000 R08: ffff91d639b0b9a0 R09: 00007fe71316e000
[Sun Jul 14 16:51:39 2024] R10: 00007fe71316dfff R11: 00007fe71316efff R12: 00007fe71316d000
[Sun Jul 14 16:51:39 2024] R13: ffff91d543702f40 R14: ffff91d639b0b9a0 R15: 00007fe71316e000
[Sun Jul 14 16:51:39 2024] FS:  00007fe7124f8ac0(0000) GS:ffff91d84ec80000(0000) knlGS:0000000000000000
[Sun Jul 14 16:51:39 2024] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Sun Jul 14 16:51:39 2024] CR2: 000055d6ab0453c0 CR3: 0000000179626000 CR4: 0000000000350ef0
[Sun Jul 14 16:51:39 2024] Call Trace:
[Sun Jul 14 16:51:39 2024]  <TASK>
[Sun Jul 14 16:51:39 2024]  ? __warn+0x7c/0x120
[Sun Jul 14 16:51:39 2024]  ? track_pfn_copy+0x94/0xa0

Same thing (follow-up error): during fork() we don't know what to do because the page tables were already modified, so we don't know how to handle that PFNMAP mapping.

--
Cheers,

David / dhildenb