6.10/bisected/regression - commit 8430557fc584 causes a warning at mm/page_table_check.c:198 __page_table_check_ptes_set+0x306

From: Mikhail Gavrilov
Date: Tue May 21 2024 - 16:17:43 EST


Hi,
Yesterday, after the latest kernel snapshot update, I spotted a new warning
at mm/page_table_check.c:198 with the following stack trace:
[ 5.524572] debug_vm_pgtable: [debug_vm_pgtable ]: Validating architecture page table helpers
[ 5.572473] ------------[ cut here ]------------
[ 5.572871] WARNING: CPU: 0 PID: 1 at mm/page_table_check.c:198 __page_table_check_ptes_set+0x306/0x3c0
[ 5.573364] Modules linked in:
[ 5.573604] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G W ------- --- 6.10.0-0.rc0.20240520giteb6a9339efeb.9.fc41.x86_64+debug #1
[ 5.574089] Hardware name: ASRock B650I Lightning WiFi/B650I Lightning WiFi, BIOS 2.10 03/20/2024
[ 5.574339] RIP: 0010:__page_table_check_ptes_set+0x306/0x3c0
[ 5.574591] Code: 74 24 04 89 ea 48 89 df e8 e7 f3 ff ff e9 12 ff ff ff 0f 1f 44 00 00 48 c1 e8 06 89 c5 83 e5 01 e9 b0 fe ff ff f6 c2 02 74 31 <0f> 0b e9 de fd ff ff 49 83 e7 f7 48 89 c1 4c 21 f9 89 ca 83 e1 02
[ 5.575434] RSP: 0018:ffffc9000018f9d0 EFLAGS: 00010246
[ 5.575739] RAX: fff0000000000fff RBX: ffff888124da5000 RCX: 0000000000000001
[ 5.576064] RDX: 0000000000000040 RSI: bffffffffffffff5 RDI: ffffc9000018fa00
[ 5.576395] RBP: ffff888124511e40 R08: 0000000000000000 R09: 0000000000000001
[ 5.576730] R10: ffffffff97f63527 R11: 0000000000000000 R12: ffffea0005000008
[ 5.577048] R13: 1ffff92000031f3c R14: 0000000000000000 R15: bffffffffffffff5
[ 5.577335] FS: 0000000000000000(0000) GS:ffff888df7e00000(0000) knlGS:0000000000000000
[ 5.577631] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5.577925] CR2: ffff888a53601000 CR3: 0000000a4de98000 CR4: 0000000000f50ef0
[ 5.578208] PKRU: 55555554
[ 5.578483] Call Trace:
[ 5.578496] usb 1-3: new high-speed USB device number 2 using xhci_hcd
[ 5.578760] <TASK>
[ 5.579331] ? __warn.cold+0x5b/0x1af
[ 5.579618] ? __page_table_check_ptes_set+0x306/0x3c0
[ 5.579903] ? report_bug+0x1fc/0x3d0
[ 5.580188] ? handle_bug+0x3c/0x80
[ 5.580461] ? exc_invalid_op+0x17/0x40
[ 5.580731] ? asm_exc_invalid_op+0x1a/0x20
[ 5.581003] ? __page_table_check_ptes_set+0x306/0x3c0
[ 5.581274] ? __pfx___page_table_check_ptes_set+0x10/0x10
[ 5.581544] ? __pfx_check_pgprot+0x10/0x10
[ 5.581806] set_ptes.constprop.0+0x66/0xd0
[ 5.582072] ? __pfx_set_ptes.constprop.0+0x10/0x10
[ 5.582333] ? __pfx_pte_val+0x10/0x10
[ 5.582595] debug_vm_pgtable+0x1c04/0x3360
[ 5.582849] ? __pfx_debug_vm_pgtable+0x10/0x10
[ 5.583099] ? add_device_randomness+0xb8/0xf0
[ 5.583334] ? __pfx_add_device_randomness+0x10/0x10
[ 5.583573] ? __pfx_debug_vm_pgtable+0x10/0x10
[ 5.583804] do_one_initcall+0xd6/0x460
[ 5.584034] ? __pfx_do_one_initcall+0x10/0x10
[ 5.584252] ? kernel_init_freeable+0x4cb/0x750
[ 5.584465] kernel_init_freeable+0x6b4/0x750
[ 5.584674] ? __pfx_kernel_init_freeable+0x10/0x10
[ 5.584877] ? __pfx_kernel_init+0x10/0x10
[ 5.585068] ? __pfx_kernel_init+0x10/0x10
[ 5.585253] kernel_init+0x1c/0x150
[ 5.585434] ? __pfx_kernel_init+0x10/0x10
[ 5.585616] ret_from_fork+0x31/0x70
[ 5.585791] ? __pfx_kernel_init+0x10/0x10
[ 5.585971] ret_from_fork_asm+0x1a/0x30
[ 5.586146] </TASK>
[ 5.586312] irq event stamp: 1743772
[ 5.586475] hardirqs last enabled at (1743771): [<ffffffff92c35f2e>] kasan_quarantine_put+0x12e/0x250
[ 5.586816] hardirqs last disabled at (1743772): [<ffffffff9546895c>] _raw_spin_lock_irqsave+0x7c/0xa0
[ 5.587185] softirqs last enabled at (1742786): [<ffffffff922721fb>] __irq_exit_rcu+0xbb/0x1c0
[ 5.587379] softirqs last disabled at (1742781): [<ffffffff922721fb>] __irq_exit_rcu+0xbb/0x1c0
[ 5.587573] ---[ end trace 0000000000000000 ]---
[ 5.656111] page_owner is disabled

Bisect points to commit:
8430557fc584657559bfbd5150b6ae1bb90f35a0
Author: Peter Xu <peterx@xxxxxxxxxx>
Date: Wed Apr 17 17:25:49 2024 -0400

mm/page_table_check: support userfault wr-protect entries

Allow page_table_check hooks to check over userfaultfd wr-protect criteria
upon pgtable updates. The rule is no co-existence allowed for any
writable flag against userfault wr-protect flag.

This should be better than c2da319c2e, where we used to only sanitize such
issues during a pgtable walk, but when hitting such an issue we don't have a
good chance to know where that writable bit came from [1], so even when
the pgtable walk exposes a kernel bug (which is still helpful for
triaging) it is not easy to track and debug.

Now we switch to track the source. It's much easier too with the recent
introduction of page table check.

There are some limitations with using the page table check here for
userfaultfd wr-protect purpose:

- It is only enabled with explicit enablement of page table check configs
and/or boot parameters, but should be good enough to track at least
syzbot issues, as syzbot should enable PAGE_TABLE_CHECK[_ENFORCED] for
x86 [1]. We used to have DEBUG_VM but it's now off for most distros,
and distros also do not normally enable PAGE_TABLE_CHECK[_ENFORCED], which
is similar.

- It conditionally works with the ptep_modify_prot API. It will be
bypassed when e.g. XEN PV is enabled, but it still works for most of the
remaining scenarios, which should be the common cases, so it should be
good enough.

- Hugetlb check is a bit hairy, as the page table check cannot identify
hugetlb pte or normal pte via trapping at set_pte_at(), because of the
current design where hugetlb maps every layer to pte_t... For example,
the default set_huge_pte_at() can invoke set_pte_at() directly and lose
the hugetlb context, treating it the same as a normal pte_t. So far it's
fine because huge_pte_uffd_wp() always equals pte_uffd_wp() as long as
it is supported (x86 only). It'll be a bigger problem when we define
_PAGE_UFFD_WP differently at various pgtable levels, because then a
single per-arch huge_pte_uffd_wp() will stop making sense first. As of
now we can leave this for later too.

This patch also removes commit c2da319c2e altogether, as we have something
better now.

[1] https://lore.kernel.org/all/000000000000dce0530615c89210@xxxxxxxxxx/

Link: https://lkml.kernel.org/r/20240417212549.2766883-1-peterx@xxxxxxxxxx
Signed-off-by: Peter Xu <peterx@xxxxxxxxxx>
Reviewed-by: Pasha Tatashin <pasha.tatashin@xxxxxxxxxx>
Cc: Axel Rasmussen <axelrasmussen@xxxxxxxxxx>
Cc: David Hildenbrand <david@xxxxxxxxxx>
Cc: Nadav Amit <nadav.amit@xxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>

Documentation/mm/page_table_check.rst | 9 ++++++++-
arch/x86/include/asm/pgtable.h | 18 +-----------------
mm/page_table_check.c | 30 ++++++++++++++++++++++++++++++
3 files changed, 39 insertions(+), 18 deletions(-)


To confirm that the bisect was correct, I reverted this commit and
rechecked the kernel snapshot.
And yes, the warning message is gone.

I have also attached the full kernel log and build config below.

My hardware specs: https://linux-hardware.org/?probe=b34f0353df

Peter, could you please take a look?

--
Best Regards,
Mike Gavrilov.