Re: 6.10/bisected/regression - commit 8430557fc584 cause warning at mm/page_table_check.c:198 __page_table_check_ptes_set+0x306

From: Peter Xu
Date: Wed May 22 2024 - 12:11:01 EST


On Wed, May 22, 2024 at 05:34:21PM +0200, David Hildenbrand wrote:
> On 22.05.24 17:18, Peter Xu wrote:
> > On Wed, May 22, 2024 at 09:48:51AM +0200, David Hildenbrand wrote:
> > > On 22.05.24 00:36, Peter Xu wrote:
> > > > On Wed, May 22, 2024 at 03:21:04AM +0500, Mikhail Gavrilov wrote:
> > > > > On Wed, May 22, 2024 at 2:37 AM Peter Xu <peterx@xxxxxxxxxx> wrote:
> > > > > > Hmm I still cannot reproduce. Weird.
> > > > > >
> > > > > > Would it be possible for you to identify which line in debug_vm_pgtable.c
> > > > > > triggered that issue?
> > > > > >
> > > > > > I think it should be some set_pte_at() but I'm not sure, as there aren't a
> > > > > > lot and all of them look benign so far. It could be that I missed
> > > > > > something important.
> > > > >
> > > > > I hope this helps:
> > > >
> > > > Thanks for offering this, it's just that it doesn't look coherent with what
> > > > was reported for some reason.
> > > >
> > > > >
> > > > > > sh /usr/src/kernels/(uname -r)/scripts/faddr2line /lib/debug/lib/modules/(uname -r)/vmlinux debug_vm_pgtable+0x1c04
> > > > > debug_vm_pgtable+0x1c04/0x3360:
> > > > > native_ptep_get_and_clear at arch/x86/include/asm/pgtable_64.h:94
> > > > > (inlined by) ptep_get_and_clear at arch/x86/include/asm/pgtable.h:1262
> > > > > (inlined by) ptep_clear at include/linux/pgtable.h:509
> > > >
> > > > This is a pte_clear(), and pte_clear() shouldn't even do the set() checks,
> > > > and shouldn't stumble over what I added.
> > > >
> > > > IOW, it doesn't match with the real stack dump previously:
> > > >
> > > > [ 5.581003] ? __page_table_check_ptes_set+0x306/0x3c0
> > > > [ 5.581274] ? __pfx___page_table_check_ptes_set+0x10/0x10
> > > > [ 5.581544] ? __pfx_check_pgprot+0x10/0x10
> > > > [ 5.581806] set_ptes.constprop.0+0x66/0xd0
> > > > [ 5.582072] ? __pfx_set_ptes.constprop.0+0x10/0x10
> > > > [ 5.582333] ? __pfx_pte_val+0x10/0x10
> > > > [ 5.582595] debug_vm_pgtable+0x1c04/0x3360
> > > >
> > >
> > > Staring at pte_clear_tests():
> > >
> > > #ifndef CONFIG_RISCV
> > > pte = __pte(pte_val(pte) | RANDOM_ORVALUE);
> > > #endif
> > > set_pte_at(args->mm, args->vaddr, args->ptep, pte);
> > >
> > > So we set random PTE bits, probably setting the present, uffd and write bit
> > > at the same time. That doesn't make too much sense when we want to verify
> > > that such combinations cannot exist.
> >
> > Here the issue is I don't think it should set W bit anyway, as we init
> > page_prot to be RWX but !shared:
> >
> > args->page_prot = vm_get_page_prot(VM_ACCESS_FLAGS);
> >
> > On x86_64 (Mikhail's system) it should have the W bit cleared afaict. Meanwhile,
> > RANDOM_ORVALUE won't touch bit W due to S390_SKIP_MASK (which contains
> > bit W / bit 1, which is another "accident"..). So even with that, it
> > should not trigger.. I think that's also why I cannot reproduce this
> > problem locally.
>
> Why oh why are the skip masks applied independently of the architecture?
>
> While _PAGE_RW should indeed be masked out by RANDOM_ORVALUE.
>
> But with shadow stacks we consider a PTE writable (see
> pte_write()->pte_shstk()) if
> (1) X86_FEATURE_SHSTK is enabled
> (2) _PAGE_RW is clear
> (3) _PAGE_DIRTY is set
>
> _PAGE_DIRTY is bit 6.
>
> Likely your CPU does not support shadow stacks.

Good point. My host has it, but I tested in a VM, which doesn't. I
suppose we can wait and double check whether Mikhail sees the issue
go away with the patch provided.
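
To spell out the condition above, here is a rough user-space sketch, not
the kernel code. The bit positions (_PAGE_RW = bit 1, _PAGE_DIRTY = bit 6)
are the ones mentioned in this thread. Every sketch_* / SKETCH_* name is
made up for illustration, the shstk_enabled flag stands in for the
X86_FEATURE_SHSTK check, and the skip mask is assumed to cover only bit 1:

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as stated in this thread for x86_64. */
#define SKETCH_PAGE_RW     (UINT64_C(1) << 1)
#define SKETCH_PAGE_DIRTY  (UINT64_C(1) << 6)

/*
 * Hypothetical stand-in for the skip mask: only bit 1 is modelled here.
 * The real mask in debug_vm_pgtable.c covers more bits.
 */
#define SKETCH_SKIP_MASK   SKETCH_PAGE_RW
#define SKETCH_ORVALUE     (~UINT64_C(0) & ~SKETCH_SKIP_MASK)

/* OR the "random" value into a PTE value, as the test does. */
static uint64_t sketch_apply_orvalue(uint64_t pte)
{
	return pte | SKETCH_ORVALUE;
}

/* Shadow-stack PTE: feature enabled, RW clear, DIRTY set. */
static bool sketch_pte_shstk(uint64_t pte, bool shstk_enabled)
{
	return shstk_enabled &&
	       !(pte & SKETCH_PAGE_RW) &&
	       (pte & SKETCH_PAGE_DIRTY);
}

/* Writable if RW is set, or if the PTE counts as a shadow-stack PTE. */
static bool sketch_pte_write(uint64_t pte, bool shstk_enabled)
{
	return (pte & SKETCH_PAGE_RW) || sketch_pte_shstk(pte, shstk_enabled);
}
```

With this model, sketch_apply_orvalue() leaves bit 1 clear but sets bit 6,
so the result only counts as writable when shstk_enabled is true. That
would match a warning showing up on Mikhail's system but not in my VM.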

In this case, instead of continuing to fiddle with which random bits to apply
and doing further work on top of per-arch random bits, I'd hope we can simply
drop that random mechanism, as I don't think it'll be pxx_none() now. I attached
a patch I plan to post. Does it look reasonable?

I also copied Anshuman, Gavin and Aneesh.

Thanks,

===8<===