Re: [PATCH v6 18/41] mm: Introduce VM_SHADOW_STACK for shadow stack memory

From: Edgecombe, Rick P
Date: Mon Feb 20 2023 - 17:08:20 EST


On Mon, 2023-02-20 at 13:56 +0100, David Hildenbrand wrote:
> On 18.02.23 22:14, Rick Edgecombe wrote:
> > From: Yu-cheng Yu <yu-cheng.yu@xxxxxxxxx>
> >
> > The x86 Control-flow Enforcement Technology (CET) feature includes
> > a new
> > type of memory called shadow stack. This shadow stack memory has
> > some
> > unusual properties, which requires some core mm changes to function
> > properly.
> >
> > A shadow stack PTE must be read-only and have _PAGE_DIRTY set.
> > However,
> > read-only and Dirty PTEs also exist for copy-on-write (COW) pages.
> > These
> > two cases are handled differently for page faults. Introduce
> > VM_SHADOW_STACK to track shadow stack VMAs.
>
> I suggest simplifying and abstracting that description.
>
> "New hardware extensions implement support for shadow stack memory,
> such
> as x86 Control-flow Enforcement Technology (CET). Let's add a new VM
> flag to identify these areas, for example, to be used to properly
> indicate shadow stack PTEs to the hardware."

Ah yea, that top blurb was added to all the non-x86 arch patches after
some feedback from Andrew Morton. He had said basically (in some more
colorful language) that the changelogs (at the time) were written
assuming the reader knows what a shadow stack is.

So it might be worth keeping a little more info in the log?

>
> >
> > Reviewed-by: Kees Cook <keescook@xxxxxxxxxxxx>
> > Tested-by: Pengfei Xu <pengfei.xu@xxxxxxxxx>
> > Tested-by: John Allen <john.allen@xxxxxxx>
> > Signed-off-by: Yu-cheng Yu <yu-cheng.yu@xxxxxxxxx>
> > Reviewed-by: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
> > Signed-off-by: Rick Edgecombe <rick.p.edgecombe@xxxxxxxxx>
> > Cc: Kees Cook <keescook@xxxxxxxxxxxx>
> >
> > ---
> > v6:
> > - Add comment about VM_SHADOW_STACK not being allowed with
> > VM_SHARED
> > (David Hildenbrand)
>
> Might want to add some more meat to the patch description why that
> is
> the case.

Sure.

>
> >
> > v3:
> > - Drop arch specific change in arch_vma_name(). The memory can
> > show as
> > anonymous (Kirill)
> > - Change CONFIG_ARCH_HAS_SHADOW_STACK to
> > CONFIG_X86_USER_SHADOW_STACK
> > in show_smap_vma_flags() (Boris)
> > ---
> > Documentation/filesystems/proc.rst | 1 +
> > fs/proc/task_mmu.c | 3 +++
> > include/linux/mm.h | 8 ++++++++
> > 3 files changed, 12 insertions(+)
> >
> > diff --git a/Documentation/filesystems/proc.rst
> > b/Documentation/filesystems/proc.rst
> > index e224b6d5b642..115843e8cce3 100644
> > --- a/Documentation/filesystems/proc.rst
> > +++ b/Documentation/filesystems/proc.rst
> > @@ -564,6 +564,7 @@ encoded manner. The codes are the following:
> > mt arm64 MTE allocation tags are enabled
> > um userfaultfd missing tracking
> > uw userfaultfd wr-protect tracking
> > + ss shadow stack page
> > == =======================================
> >
> > Note that there is no guarantee that every flag and associated
> > mnemonic will
> > diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> > index af1c49ae11b1..9e2cefe47749 100644
> > --- a/fs/proc/task_mmu.c
> > +++ b/fs/proc/task_mmu.c
> > @@ -711,6 +711,9 @@ static void show_smap_vma_flags(struct seq_file
> > *m, struct vm_area_struct *vma)
> > #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_MINOR
> > [ilog2(VM_UFFD_MINOR)] = "ui",
> > #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_MINOR */
> > +#ifdef CONFIG_X86_USER_SHADOW_STACK
> > + [ilog2(VM_SHADOW_STACK)] = "ss",
> > +#endif
> > };
> > size_t i;
> >
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index e6f1789c8e69..76e0a09aeffe 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -315,11 +315,13 @@ extern unsigned int kobjsize(const void
> > *objp);
> > #define VM_HIGH_ARCH_BIT_2 34 /* bit only usable on 64-
> > bit architectures */
> > #define VM_HIGH_ARCH_BIT_3 35 /* bit only usable on 64-
> > bit architectures */
> > #define VM_HIGH_ARCH_BIT_4 36 /* bit only usable on 64-
> > bit architectures */
> > +#define VM_HIGH_ARCH_BIT_5 37 /* bit only usable on 64-bit
> > architectures */
> > #define VM_HIGH_ARCH_0 BIT(VM_HIGH_ARCH_BIT_0)
> > #define VM_HIGH_ARCH_1 BIT(VM_HIGH_ARCH_BIT_1)
> > #define VM_HIGH_ARCH_2 BIT(VM_HIGH_ARCH_BIT_2)
> > #define VM_HIGH_ARCH_3 BIT(VM_HIGH_ARCH_BIT_3)
> > #define VM_HIGH_ARCH_4 BIT(VM_HIGH_ARCH_BIT_4)
> > +#define VM_HIGH_ARCH_5 BIT(VM_HIGH_ARCH_BIT_5)
> > #endif /* CONFIG_ARCH_USES_HIGH_VMA_FLAGS */
> >
> > #ifdef CONFIG_ARCH_HAS_PKEYS
> > @@ -335,6 +337,12 @@ extern unsigned int kobjsize(const void
> > *objp);
> > #endif
> > #endif /* CONFIG_ARCH_HAS_PKEYS */
> >
> > +#ifdef CONFIG_X86_USER_SHADOW_STACK
>
>
> Should we abstract this to CONFIG_ARCH_USER_SHADOW_STACK, seeing
> that
> other architectures might similarly need it?

There was an ARCH_HAS_SHADOW_STACK but it got removed following this
discussion:

https://lore.kernel.org/lkml/d09e952d8ae696f687f0787dfeb7be7699c02913.camel@xxxxxxxxx/

Now we have this new RFC for riscv as potentially a second
implementation. But it is still very early, and I'm not sure anyone
knows exactly what the similarities will be in a mature version. So I
think it would be better to refactor in an ARCH_HAS_SHADOW_STACK later
(and similar abstractions) once that series is more mature and we have
an idea of what pieces will be shared. I don't have a problem in
principle with an ARCH config, just don't think we should do it yet.

>