Re: [tip: x86/mm] x86/mm: Clear _PAGE_DIRTY when we clear _PAGE_RW

From: Ingo Molnar
Date: Wed Feb 26 2025 - 05:56:57 EST



* Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:

> On Tue, 25 Feb 2025 at 20:00, Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> >
> > I think the entire point of this file is to manipulate kernel mappings.
>
> Very likely. But just looking at the patch, it was very non-obvious.

Yeah, agreed - and I've extended the changelog the following way:

Subject: [PATCH] x86/mm: Clear _PAGE_DIRTY for kernel mappings when we clear _PAGE_RW

The bit pattern of _PAGE_DIRTY set and _PAGE_RW clear is used to mark
shadow stacks. This is currently checked for in mk_pte() but not
pfn_pte(). If we add the check to pfn_pte(), it catches vfree()
calling set_direct_map_invalid_noflush() which calls
__change_page_attr() which loads the old protection bits from the
PTE, clears the specified bits and uses pfn_pte() to construct the
new PTE.

We should, therefore, for kernel mappings, clear the _PAGE_DIRTY bit
consistently whenever we clear _PAGE_RW. I opted to do it in the
callers in case we want to use __change_page_attr() to create shadow
stacks inside the kernel at some point in the future. Arguably, we
might also want to clear _PAGE_ACCESSED here.

Note that the 3 functions involved:

__set_pages_np()
kernel_map_pages_in_pgd()
kernel_unmap_pages_in_pgd()

Only ever manipulate non-swappable kernel mappings, so maintaining
the DIRTY:1|RW:0 special pattern for shadow stacks and DIRTY:0
pattern for non-shadow-stack entries can be maintained consistently
and doesn't result in the unintended clearing of a live dirty bit
that could corrupt (destroy) dirty bit information for user mappings.

Is this explanation better?

Thanks,

Ingo