Re: [PATCH v2 10/39] x86/mm: Introduce _PAGE_COW

From: Andrew Cooper
Date: Tue Oct 04 2022 - 22:17:46 EST

On 29/09/2022 23:29, Rick Edgecombe wrote:
> From: Yu-cheng Yu <yu-cheng.yu@xxxxxxxxx>
> There is essentially no room left in the x86 hardware PTEs on some OSes
> (not Linux). That left the hardware architects looking for a way to
> represent a new memory type (shadow stack) within the existing bits.
> They chose to repurpose a lightly-used state: Write=0,Dirty=1.

How does "Some OSes have a greater dependence on software available bits
in PTEs than Linux" sound?

> The reason it's lightly used is that Dirty=1 is normally set _before_ a
> write. A write with a Write=0 PTE would typically only generate a fault,
> not set Dirty=1. Hardware can (rarely) both set Dirty=1 *and* generate the
> fault, resulting in a Write=0,Dirty=1 PTE. Hardware which supports shadow
> stacks will no longer exhibit this oddity.

Again, an interesting anecdote but not salient information here.

> diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
> index 6496ec84b953..ad201dae7316 100644
> --- a/arch/x86/include/asm/pgtable.h
> +++ b/arch/x86/include/asm/pgtable.h
> @@ -134,9 +142,17 @@ static inline int pte_young(pte_t pte)
> return pte_flags(pte) & _PAGE_ACCESSED;
> }
> -static inline int pmd_dirty(pmd_t pmd)
> +static inline bool pmd_dirty(pmd_t pmd)
> {
> - return pmd_flags(pmd) & _PAGE_DIRTY;
> + return pmd_flags(pmd) & _PAGE_DIRTY_BITS;
> +}
> +
> +static inline bool pmd_shstk(pmd_t pmd)
> +{
> + if (!cpu_feature_enabled(X86_FEATURE_SHSTK))
> + return false;
> +
> + return (pmd_flags(pmd) & (_PAGE_RW | _PAGE_DIRTY)) == _PAGE_DIRTY;

(flags & (PSE|RW|D)) == (PSE|D);

R/O+D can exist higher in the paging structures and does not convey
type=shstk-ness to later stages of the walk.
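
To spell that comparison out, here is a minimal userspace sketch, not kernel
code. The bit positions are the architectural x86 ones, but the SK_ names and
both helpers are made up for illustration, and the X86_FEATURE_SHSTK check
from the patch is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* x86 paging-structure entry bits (architectural positions). */
#define SK_RW    (UINT64_C(1) << 1)	/* Write */
#define SK_DIRTY (UINT64_C(1) << 6)	/* Dirty */
#define SK_PSE   (UINT64_C(1) << 7)	/* Page Size: PMD maps a large page */

/* Hypothetical stand-in for pmd_shstk(); takes raw flags, no feature check. */
static inline bool sk_pmd_shstk(uint64_t flags)
{
	const uint64_t mask = SK_PSE | SK_RW | SK_DIRTY;

	/* Leaf (PSE=1) with Write=0,Dirty=1; anything else is not shstk. */
	return (flags & mask) == (SK_PSE | SK_DIRTY);
}

/* The three cases that matter for the comparison above. */
static inline void sk_pmd_shstk_selfcheck(void)
{
	assert(sk_pmd_shstk(SK_PSE | SK_DIRTY));		/* large page, R/O+D */
	assert(!sk_pmd_shstk(SK_DIRTY));			/* non-leaf, R/O+D */
	assert(!sk_pmd_shstk(SK_PSE | SK_RW | SK_DIRTY));	/* writable leaf */
}
```

The middle case is the one the posted check gets wrong: without PSE in the
mask, a non-leaf PMD that happens to be R/O+D would be reported as shstk.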

However, there is a further complication which is bound to rear its head
sooner or later, and warrants discussing.

type=shstk isn't actually only R/O+D on the leaf PTE; it's also R/W in
the accumulated access rights of the non-leaf PTEs.

Specifically, if you clear the RW bit on any higher level in the
pagetable, then everything mapped by that PTE ceases to be of type
shstk, even if the leaf has the R/O+D bit combination.
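
As a rough model of that rule (a sketch only: it assumes a walk that ends in
a 4k PTE, ignores Present, User and every other bit, and sk_walk_is_shstk()
is a hypothetical name rather than anything in the tree):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SK_RW    (UINT64_C(1) << 1)	/* Write */
#define SK_DIRTY (UINT64_C(1) << 6)	/* Dirty */

/*
 * Hypothetical model of a pagewalk.  entries[] holds the flags seen at each
 * level, top level first; the last element is the leaf PTE.
 */
static inline bool sk_walk_is_shstk(const uint64_t *entries, size_t nr)
{
	size_t i;

	assert(nr >= 1);

	/* Every non-leaf level must grant write access. */
	for (i = 0; i + 1 < nr; i++)
		if (!(entries[i] & SK_RW))
			return false;

	/* The leaf itself must be Write=0,Dirty=1. */
	return (entries[nr - 1] & (SK_RW | SK_DIRTY)) == SK_DIRTY;
}
```

With a 4-level walk, { RW, RW, RW, D } comes out as shstk, while
{ RW, RW, 0, D } does not, even though the leaf is identical in both.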

This is allegedly a feature for the database folks, where they can
create R/O and R/W aliases of the same memory, sharing intermediate
pagetables, where the R/W alias will set D bits as usual and the R/O
alias need not transmogrify itself into a shadow stack.