Re: [PATCH v8 2/4] mseal: add mseal syscall

From: Theo de Raadt
Date: Thu Feb 01 2024 - 23:10:49 EST


Jeff Xu <jeffxu@xxxxxxxxxxxx> wrote:

> On Thu, Feb 1, 2024 at 7:54 PM Theo de Raadt <deraadt@xxxxxxxxxxx> wrote:
> >
> > Jeff Xu <jeffxu@xxxxxxxxxxxx> wrote:
> >
> > > On Thu, Feb 1, 2024 at 3:11 PM Eric Biggers <ebiggers@xxxxxxxxxx> wrote:
> > > >
> > > > On Wed, Jan 31, 2024 at 05:50:24PM +0000, jeffxu@xxxxxxxxxxxx wrote:
> > > > > [PATCH v8 2/4] mseal: add mseal syscall
> > > > [...]
> > > > > +/*
> > > > > + * The PROT_SEAL defines memory sealing in the prot argument of mmap().
> > > > > + */
> > > > > +#define PROT_SEAL 0x04000000 /* _BITUL(26) */
> > > > > +
> > > > > /* 0x01 - 0x03 are defined in linux/mman.h */
> > > > > #define MAP_TYPE 0x0f /* Mask for type of mapping */
> > > > > #define MAP_FIXED 0x10 /* Interpret addr exactly */
> > > > > @@ -33,6 +38,9 @@
> > > > > #define MAP_UNINITIALIZED 0x4000000 /* For anonymous mmap, memory could be
> > > > > * uninitialized */
> > > > >
> > > > > +/* map is sealable */
> > > > > +#define MAP_SEALABLE 0x8000000 /* _BITUL(27) */
> > > >
> > > > IMO this patch is misleading, as it claims to just be adding a new syscall, but
> > > > it actually adds three new UAPIs, only one of which is the new syscall. The
> > > > other two new UAPIs are new flags to the mmap syscall.
> > > >
> > > The description does include all three. I could update the patch title.
> > >
> > > > Based on recent discussions, it seems the usefulness of the new mmap flags has
> > > > not yet been established. Note also that there are only a limited number of
> > > > mmap flags remaining, so we should be careful about allocating them.
> > > >
> > > > Therefore, why not start by just adding the mseal syscall, without the new mmap
> > > > flags alongside it?
> > > >
> > > > I'll also note that the existing PROT_* flags seem to be conventionally used for
> > > > the CPU page protections, as opposed to kernel-specific properties of the VMA
> > > > object. As such, PROT_SEAL feels a bit out of place anyway. If it's added at
> > > > all it perhaps should be a MAP_* flag, not PROT_*. I'm not sure this aspect has
> > > > been properly discussed yet, seeing as the patchset is presented as just adding
> > > > sys_mseal(). Some reviewers may not have noticed or considered the new flags.
> > > >
> > > MAP_ flags are used more for the type of mapping, such as MAP_FIXED_NOREPLACE.
> > >
> > > The PROT_SEAL might make more sense because sealing the protection bit
> > > is the main functionality of the sealing at this moment.
> >
> > Jeff, please show a piece of software that needs to do PROT_SEAL as
> > mprotect() or mmap() argument.
> >
> I didn't propose mprotect().
>
> for mmap() here is a potential use case:
>
> fs/binfmt_elf.c
> if (current->personality & MMAP_PAGE_ZERO) {
> /* Why this, you ask??? Well SVr4 maps page 0 as read-only,
> and some applications "depend" upon this behavior.
> Since we do not have the power to recompile these, we
> emulate the SVr4 behavior. Sigh. */
>
> error = vm_mmap(NULL, 0, PAGE_SIZE,
> PROT_READ | PROT_EXEC, <-- add PROT_SEAL
> MAP_FIXED | MAP_PRIVATE, 0);
> }
>
> I don't see the benefit of RWX page 0, which might make a null
> pointer error become executable for some code.

And this is a lot faster than doing the operation as a second step?


But anyway, that's kernel code. It is not a userland-exposed API used
by programs.
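
For comparison, the "second step" alternative seen from userland might look
roughly like the sketch below: map first, then seal with a separate call,
with no new mmap() flag involved. This is only an illustration under stated
assumptions. map_then_seal() is a hypothetical helper, the fallback syscall
number 462 is the one assigned when mseal was later merged (Linux 6.10) and
is not part of this patch version, and the (addr, len, flags) argument order
is assumed to match the proposal. On kernels without mseal the call simply
fails and the errno is reported back to the caller.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef __NR_mseal
#define __NR_mseal 462 /* assumption: number used when mseal was merged */
#endif

/*
 * Hypothetical helper: create an anonymous private mapping, then try to
 * seal it as a second step. Returns the mapping (or MAP_FAILED if mmap
 * itself failed). *seal_errno is 0 if sealing worked, otherwise the errno
 * from the mseal attempt (for example ENOSYS on kernels without it).
 */
static void *map_then_seal(size_t len, int prot, int *seal_errno)
{
    assert(seal_errno != NULL);

    void *p = mmap(NULL, len, prot, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return MAP_FAILED;

    *seal_errno = 0;
    /* flags argument is 0: no flags are assumed to be defined */
    if (syscall(__NR_mseal, p, len, 0UL) != 0)
        *seal_errno = errno;

    return p;
}
```

A caller would check the return value against MAP_FAILED and then look at
seal_errno to decide whether running unsealed is acceptable. The cost over
a flag on mmap() is one extra syscall per mapping.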

The question is the damage you create by adding an API exposed to
userland (since this is Linux: forever).

I should be the first person thrilled to see Linux make API/ABI mistakes
they have to support forever, but I can't be that person.