Re: Sealed memfd & no-fault mmap

From: Linus Torvalds
Date: Sat May 29 2021 - 11:44:30 EST


On Fri, May 28, 2021 at 9:31 PM Lin, Ming <minggr@xxxxxxxxx> wrote:
>
> I should check the vma is not writable.
>
> -	if (!IS_NOFAULT(inode))
> +	if (!IS_NOFAULT(inode) || (vma->vm_flags & VM_WRITE))
>  		return -EINVAL;

That might be enough, yes.
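A minimal userspace model of the quoted check shows which mappings it accepts and which it rejects. The structs `model_inode` and `model_vma` and the function `model_nofault_mmap()` are hypothetical stand-ins, not kernel definitions. VM_READ/VM_WRITE are local definitions that mirror the kernel's bit values. Only the shape of the condition comes from the patch above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Local definitions mirroring the kernel's vma permission bits. */
#define VM_READ  0x00000001UL
#define VM_WRITE 0x00000002UL

/* Hypothetical, stripped-down models of the kernel structures. */
struct model_inode { bool nofault; };
struct model_vma   { unsigned long vm_flags; };

/*
 * Same condition as the quoted patch: refuse the mapping unless the
 * inode is marked no-fault AND the mapping is not writable.
 */
static int model_nofault_mmap(const struct model_inode *inode,
                              const struct model_vma *vma)
{
    if (!inode->nofault || (vma->vm_flags & VM_WRITE))
        return -EINVAL;
    return 0;
}
```

If the rule really is "read-only only", a real version would probably also need to stop a later mprotect() from adding write access. In the kernel that is normally done by clearing VM_MAYWRITE on the vma, which this model leaves out.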

But if this is sufficient for the compositor needs, and the rule is
that this only works for read-only mappings, then I think the flag in
the inode becomes the wrong thing to do.

Because if it's a read-only mapping, and we only ever care about
inserting zero pages into the page tables - and they never become part
of the shared memory region itself, then it really is purely about
that mmap, not about the shm inode.

So then it really does become purely about one particular mmap, and it
really should be a "madvise()" issue, not a "mark inode as no-fault".
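The argument for a per-mapping property can be sketched as a model in which the shared object carries no no-fault state at all. Instead, the flag lives on each mapping and is set by a madvise()-style call. Everything named `model_*`, including the `MODEL_ADV_NOFAULT` advice value, is hypothetical; no real MADV_* constant is implied. The model also assumes that "no-fault" means a fault that would otherwise raise SIGBUS (for example, past the end of the object) gets a zero page instead.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define VM_WRITE 0x00000002UL  /* mirrors the kernel's VM_WRITE bit */

/* Hypothetical advice values for a madvise()-like call. */
enum model_advice { MODEL_ADV_NORMAL, MODEL_ADV_NOFAULT };

/* The shared object itself: note there is no no-fault flag here. */
struct model_shm_object { size_t size; };

/* One mapping of the object; the no-fault property is per mapping. */
struct model_mapping {
    struct model_shm_object *obj;
    unsigned long vm_flags;
    bool nofault;
};

enum model_fault_result { MODEL_FAULT_SIGBUS, MODEL_FAULT_ZERO_PAGE };

static int model_madvise(struct model_mapping *m, enum model_advice advice)
{
    switch (advice) {
    case MODEL_ADV_NORMAL:
        m->nofault = false;
        return 0;
    case MODEL_ADV_NOFAULT:
        if (m->vm_flags & VM_WRITE)  /* only allowed on read-only mappings */
            return -EINVAL;
        m->nofault = true;
        return 0;
    }
    return -EINVAL;
}

/* Outcome of a fault on a page the object does not back. */
static enum model_fault_result model_fault_missing(const struct model_mapping *m)
{
    return m->nofault ? MODEL_FAULT_ZERO_PAGE : MODEL_FAULT_SIGBUS;
}
```

Two mappings of the same object can then behave differently, which a flag on the inode could not express.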

I'd almost be inclined to just add a new "flags" field to the vma.
We've been running out of vma flags for a long time, to the point that
some of them are only available on 64-bit architectures.

I get the feeling that we should just bite the bullet and make
"vm_flags" be a u64. Or possibly make it two explicitly 32-bit
entities (vm_flags and vm_extra). Get rid of the silly 64-bit-only
"high" flags, and get rid of our artificial "we don't have enough
bits".

Because we already in practice *do* have enough bits, we've just
artificially limited ourselves to "on 32-bit architectures we only
have 32 bits in that field".
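The two layouts floated here can be written down as plain C types. Today vm_flags is an unsigned long, so it is only 32 bits wide on 32-bit architectures. Both sketches below give every architecture 64 bits. All `model_*` names and the bit positions chosen are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Option 1: a single 64-bit flags word on every architecture. */
typedef uint64_t model_vm_flags_t;

/* Bits 32..63 now exist everywhere, so a "high" flag no longer needs
   to be 64-bit-only. The bit number is an arbitrary example. */
#define MODEL_VM_HIGH_EXAMPLE ((model_vm_flags_t)1 << 40)

struct model_vma_u64 {
    model_vm_flags_t vm_flags;
};

/* Option 2: two explicit 32-bit words. */
struct model_vma_split {
    uint32_t vm_flags;  /* the existing flags */
    uint32_t vm_extra;  /* room for new per-mapping flags */
};

#define MODEL_VM_EXTRA_NOFAULT ((uint32_t)1 << 0)  /* hypothetical bit */

_Static_assert(sizeof(model_vm_flags_t) == 8, "64 bits on every arch");
_Static_assert(sizeof(struct model_vma_split) == 8, "two 32-bit words");

static int model_has_high(const struct model_vma_u64 *v)
{
    return (v->vm_flags & MODEL_VM_HIGH_EXAMPLE) != 0;
}
```

The split layout keeps the existing 32-bit word untouched, while the single u64 lets flag tests stay one mask operation. Either way, the "only on 64-bit" restriction disappears.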

But all of this is very much dependent on that "this NOFAULT case
really only works for reads, not for writes".

(Alternatively, we could allow the *mapping* itself to be writable,
but always fault on writes, and only insert a read-only zero page.)
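The alternative in the parenthetical can be modelled as a per-page policy for a page with no backing store (assumed here: a range a no-fault mapping would otherwise SIGBUS on). Reads install a read-only zero page. Writes always fault, so the zero page never becomes writable shared memory. The enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state of one page-table entry for an unbacked page. */
enum model_pte { MODEL_PTE_NONE, MODEL_PTE_ZERO_RO };

enum model_access_result { MODEL_ACCESS_OK, MODEL_ACCESS_SIGBUS };

static enum model_access_result
model_access(enum model_pte *pte, bool is_write)
{
    /* Any write is reported as a fault, including a write to a page
       that already holds the read-only zero page. */
    if (is_write)
        return MODEL_ACCESS_SIGBUS;
    /* First read: install the zero page, mapped read-only. */
    if (*pte == MODEL_PTE_NONE)
        *pte = MODEL_PTE_ZERO_RO;
    return MODEL_ACCESS_OK;
}
```

Because the zero page is only ever mapped read-only, allowing a writable mapping does not let anything leak into the shared object.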

Linus