Re: Sealed memfd & no-fault mmap

From: Kirill A. Shutemov
Date: Thu Apr 29 2021 - 11:48:17 EST


On Tue, Apr 27, 2021 at 09:51:58AM -0700, Linus Torvalds wrote:
> On Tue, Apr 27, 2021 at 1:25 AM Simon Ser <contact@xxxxxxxxxxx> wrote:
> >
> > Rather than requiring changes in all compositors *and* clients, can we
> > maybe only require changes in compositors? For instance, OpenBSD has a
> > __MAP_NOFAULT flag. When passed to mmap, it means that out-of-bound
> > accesses will read as zeroes instead of triggering SIGBUS. Such a flag
> > would be very helpful to unblock the annoying SIGBUS situation.
> >
> > Would something along these lines be welcome in the Linux kernel?
>
> Hmm. It doesn't look too hard to do. The biggest problem is actually
> that we've run out of flags in the vma (on 32-bit architectures), but
> you could try this UNTESTED patch that just does the MAP_NOFAULT thing
> unconditionally.
>
> NOTE! Not only is it untested, not only is this a "for your testing
> only" (because it does it unconditionally rather than only for
> __MAP_NOFAULT), but it might be bogus for other reasons. In
> particular, this patch depends on "vmf->address" not being changed by
> the ->fault() infrastructure, so that we can just re-use the vmf for
> the anonymous case if we get a SIGBUS.
>
> I think that's all ok these days, because Kirill and Peter Xu cleaned
> up those paths, but I didn't actually check. So I'm cc'ing Kirill,
> Peter and Will, who have been working in this area for other reasons
> fairly recently.
>
> Side note: this will only ever work for non-shared mappings.

I think it's a show-stopper for the use case, no? IIUC, the mapping is used
for communication between a compositor and a client and has to be shared.
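
For reference, a minimal userspace sketch of the failure mode in question,
assuming a glibc that exposes memfd_create(). The helper name
shared_read_past_eof_faults() and the "shm-demo" name are made up for
illustration. One side maps a memfd MAP_SHARED, the other side shrinks the
file, and the next access to the now out-of-bounds page raises SIGBUS:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf sigbus_jmp;

/* Jump back to the faulting function instead of dying on SIGBUS. */
static void on_sigbus(int sig)
{
	(void)sig;
	siglongjmp(sigbus_jmp, 1);
}

/*
 * Hypothetical helper (not from the thread): returns 1 if reading a
 * MAP_SHARED memfd mapping past the new EOF raises SIGBUS, 0 if the
 * read succeeds, -1 on setup failure.
 */
static int shared_read_past_eof_faults(void)
{
	long pagesz = sysconf(_SC_PAGESIZE);
	volatile unsigned char *p;
	volatile int result = -1;
	struct sigaction sa, old;
	int fd;

	assert(pagesz > 0);

	fd = memfd_create("shm-demo", MFD_CLOEXEC);
	if (fd < 0)
		return -1;
	if (ftruncate(fd, 2 * pagesz) < 0) {
		close(fd);
		return -1;
	}

	/* "Compositor" side: shared mapping of the client's buffer. */
	p = mmap(NULL, (size_t)(2 * pagesz), PROT_READ, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED) {
		close(fd);
		return -1;
	}

	/* "Client" side: shrink the file behind the compositor's back. */
	if (ftruncate(fd, pagesz) < 0) {
		munmap((void *)p, (size_t)(2 * pagesz));
		close(fd);
		return -1;
	}

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_sigbus;
	sigemptyset(&sa.sa_mask);
	sigaction(SIGBUS, &sa, &old);

	if (sigsetjmp(sigbus_jmp, 1) == 0) {
		(void)p[pagesz];	/* second page is now beyond i_size */
		result = 0;
	} else {
		result = 1;		/* SIGBUS was delivered */
	}

	sigaction(SIGBUS, &old, NULL);
	munmap((void *)p, (size_t)(2 * pagesz));
	close(fd);
	return result;
}
```

On a current kernel I'd expect this to return 1. A MAP_PRIVATE-only
__MAP_NOFAULT would not change the outcome for this mapping.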

> That's fundamental. We won't add an anonymous page to a shared mapping,
> and do_anonymous_page() does verify that. So a MAP_SHARED mapping will
> still return SIGBUS even with this patch (although it's not obvious from
> the patch - the VM_FAULT_SIGBUS will just be re-created by
> do_anonymous_page()).
>
> So if you want a _shared_ mapping to honor __MAP_NOFAULT and insert
> random anonymous pages into it, I think the answer is "no, that's not
> going to be viable".

+ Matthew, Dan.

DAX uses zero pages in the page cache to avoid allocating backing storage
for read accesses to holes. Maybe we can generalize it beyond DAX to any
page cache and add a (per-inode?) flag to do the same for accesses beyond
i_size?
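
To illustrate the semantics I have in mind, here is a small userspace
sketch, again assuming glibc's memfd_create(), with a made-up helper name
hole_reads_as_zero(). It only shows what userspace already observes today
for a hole *inside* i_size on a shared mapping: reads return zeroes. It
says nothing about how the kernel backs those reads. The idea would be to
give accesses beyond i_size the same behaviour when the flag is set:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the thread): returns 1 if a page that
 * was never written, but lies within i_size, reads back as all zeroes
 * through a MAP_SHARED mapping; 0 if not; -1 on setup failure.
 */
static int hole_reads_as_zero(void)
{
	long pagesz = sysconf(_SC_PAGESIZE);
	const unsigned char *p;
	int fd, ok = 1;
	long i;

	assert(pagesz > 0);

	fd = memfd_create("hole-demo", MFD_CLOEXEC);
	if (fd < 0)
		return -1;

	/* Extend i_size without writing anything: the page is a hole. */
	if (ftruncate(fd, pagesz) < 0) {
		close(fd);
		return -1;
	}

	p = mmap(NULL, (size_t)pagesz, PROT_READ, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED) {
		close(fd);
		return -1;
	}

	/* Read faults on the hole succeed and yield zeroes. */
	for (i = 0; i < pagesz; i++) {
		if (p[i] != 0)
			ok = 0;
	}

	munmap((void *)p, (size_t)pagesz);
	close(fd);
	return ok;
}
```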

> So _if_ this works for you, and if it's ok that only MAP_PRIVATE can
> have __MAP_NOFAULT, and if Kirill/Peter/Will don't say "Oh, Linus,
> you're completely off your rocker and clearly need to be taking your
> meds", something like this - if we figure out the conditional bit -
> might be doable.
>
> That's a fair number of "ifs".
>
> Ok, back to the merge window for me, I'll be throwing away this crazy
> untested patch immediately after hitting "send". This is very much a
> "throw the idea over to other people" patch, in other words.
>
> Linus



--
Kirill A. Shutemov