Re: [PATCH v8 2/4] mseal: add mseal syscall

From: Jeff Xu
Date: Thu Feb 01 2024 - 23:22:51 EST


On Thu, Feb 1, 2024 at 8:10 PM Theo de Raadt <deraadt@xxxxxxxxxxx> wrote:
>
> Jeff Xu <jeffxu@xxxxxxxxxxxx> wrote:
>
> > On Thu, Feb 1, 2024 at 7:54 PM Theo de Raadt <deraadt@xxxxxxxxxxx> wrote:
> > >
> > > Jeff Xu <jeffxu@xxxxxxxxxxxx> wrote:
> > >
> > > > On Thu, Feb 1, 2024 at 3:11 PM Eric Biggers <ebiggers@xxxxxxxxxx> wrote:
> > > > >
> > > > > On Wed, Jan 31, 2024 at 05:50:24PM +0000, jeffxu@xxxxxxxxxxxx wrote:
> > > > > > [PATCH v8 2/4] mseal: add mseal syscall
> > > > > [...]
> > > > > > +/*
> > > > > > + * The PROT_SEAL defines memory sealing in the prot argument of mmap().
> > > > > > + */
> > > > > > +#define PROT_SEAL 0x04000000 /* _BITUL(26) */
> > > > > > +
> > > > > > /* 0x01 - 0x03 are defined in linux/mman.h */
> > > > > > #define MAP_TYPE 0x0f /* Mask for type of mapping */
> > > > > > #define MAP_FIXED 0x10 /* Interpret addr exactly */
> > > > > > @@ -33,6 +38,9 @@
> > > > > > #define MAP_UNINITIALIZED 0x4000000 /* For anonymous mmap, memory could be
> > > > > > * uninitialized */
> > > > > >
> > > > > > +/* map is sealable */
> > > > > > +#define MAP_SEALABLE 0x8000000 /* _BITUL(27) */
> > > > >
> > > > > IMO this patch is misleading, as it claims to just be adding a new syscall, but
> > > > > it actually adds three new UAPIs, only one of which is the new syscall. The
> > > > > other two new UAPIs are new flags to the mmap syscall.
> > > > >
> > > > The description does include all three. I could update the patch title.
> > > >
> > > > > Based on recent discussions, it seems the usefulness of the new mmap flags has
> > > > > not yet been established. Note also that there are only a limited number of
> > > > > mmap flags remaining, so we should be careful about allocating them.
> > > > >
> > > > > Therefore, why not start by just adding the mseal syscall, without the new mmap
> > > > > flags alongside it?
> > > > >
> > > > > I'll also note that the existing PROT_* flags seem to be conventionally used for
> > > > > the CPU page protections, as opposed to kernel-specific properties of the VMA
> > > > > object. As such, PROT_SEAL feels a bit out of place anyway. If it's added at
> > > > > all it perhaps should be a MAP_* flag, not PROT_*. I'm not sure this aspect has
> > > > > been properly discussed yet, seeing as the patchset is presented as just adding
> > > > > sys_mseal(). Some reviewers may not have noticed or considered the new flags.
> > > > >
> > > > MAP_ flags are used more for the type of mapping, such as MAP_FIXED_NOREPLACE.
> > > >
> > > > PROT_SEAL might make more sense because sealing the protection bits
> > > > is the main functionality of sealing at this moment.
> > >
> > > Jeff, please show a piece of software that needs to pass PROT_SEAL as
> > > an mprotect() or mmap() argument.
> > >
> > I didn't propose mprotect().
> >
> > For mmap(), here is a potential use case:
> >
> > fs/binfmt_elf.c
> >     if (current->personality & MMAP_PAGE_ZERO) {
> >             /* Why this, you ask??? Well SVr4 maps page 0 as read-only,
> >                and some applications "depend" upon this behavior.
> >                Since we do not have the power to recompile these, we
> >                emulate the SVr4 behavior. Sigh. */
> >
> >             error = vm_mmap(NULL, 0, PAGE_SIZE,
> >                             PROT_READ | PROT_EXEC,   <-- add PROT_SEAL
> >                             MAP_FIXED | MAP_PRIVATE, 0);
> >     }
> >
> > I don't see the benefit of an RWX page 0, which might turn a null
> > pointer bug into code execution for some programs.
>
> And this is a lot faster than doing the operation as a second step?
>
> But anyways, that's kernel code. It is not userland exposed API used
> by programs.
>
> The question is the damage you create by adding API exposed to
> userland (since this is Linux: forever).
>
> I should be the first person thrilled to see Linux make API/ABI mistakes
> they have to support forever, but I can't be that person.
>
Point taken.
I can remove PROT_SEAL.
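For reference, a minimal userland sketch of the two-step alternative Theo mentions: map the page first, then seal it with a separate mseal() call, after which mprotect() on it should be refused. This assumes the interface as later merged upstream (64-bit kernels, flags must be 0, no MAP_SEALABLE needed), which differs from this v8 series. The fallback syscall number 462 and the helper names sys_mseal() and map_and_seal_readonly() are assumptions for illustration only.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumption: 462 is the mseal number in the unified syscall table
 * (x86-64, arm64, ...); prefer the value from the system headers. */
#ifndef __NR_mseal
#define __NR_mseal 462
#endif

/* Hypothetical thin wrapper; older libcs have no mseal() prototype. */
static int sys_mseal(void *addr, size_t len, unsigned long flags)
{
    return (int)syscall(__NR_mseal, addr, len, flags);
}

/* Step 1: mmap() a read-only page.  Step 2: seal it with mseal().
 * Returns 0 if the seal held (mprotect() failed with EPERM),
 * 1 if mseal() is unavailable or blocked here, -1 otherwise. */
static int map_and_seal_readonly(void)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    if (sys_mseal(p, len, 0) != 0) {
        /* e.g. ENOSYS on kernels without mseal; the page is not sealed,
         * so it can still be unmapped normally. */
        munmap(p, len);
        return 1;
    }

    /* Sealed: any protection change must now be rejected. The mapping
     * is deliberately left in place, since munmap() would fail too. */
    if (mprotect(p, len, PROT_READ | PROT_WRITE) == 0)
        return -1;
    return errno == EPERM ? 0 : -1;
}
```

Calling map_and_seal_readonly() should return 0 on a kernel with mseal() support and 1 where the syscall is missing or filtered. The extra syscall is the cost of doing this as a second step rather than via a PROT_SEAL argument to mmap().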
