Re: [PATCH v3 0/7] File Sealing & memfd_create()

From: Hugh Dickins
Date: Tue Jun 17 2014 - 16:32:44 EST


On Tue, 17 Jun 2014, Andy Lutomirski wrote:
> On Tue, Jun 17, 2014 at 9:51 AM, David Herrmann <dh.herrmann@xxxxxxxxx> wrote:
> > On Tue, Jun 17, 2014 at 6:41 PM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
> >> On Tue, Jun 17, 2014 at 9:36 AM, David Herrmann <dh.herrmann@xxxxxxxxx> wrote:
> >>> On Tue, Jun 17, 2014 at 6:20 PM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
> >>>> Can you summarize why holes can't be reliably backed by the zero page?
> >>>
> >>> To answer this, I will quote Hugh from "PATCH v2 1/3":
> >>>
> >>>> We do already use the ZERO_PAGE instead of allocating when it's a
> >>>> simple read; and on the face of it, we could extend that to mmap
> >>>> once the file is sealed. But I am rather afraid to do so - for
> >>>> many years there was an mmap /dev/zero case which did that, but
> >>>> it was an easily forgotten case which caught us out at least
> >>>> once, so I'm reluctant to reintroduce it now for sealing.
> >>>>
> >>>> Anyway, I don't expect you to resolve the issue of sealed holes:
> >>>> that's very much my territory, to give you support on.
> >>>
> >>> Holes can be avoided with a simple fallocate(). I don't understand why
> >>> I should make SEAL_WRITE do the fallocate for the caller. During the
> >>> discussion of memfd_create() I was told to drop the "size" parameter,
> >>> because it is redundant. I don't see how this implicit fallocate()
> >>> does not fall into the same category?
> >>>
> >>
> >> I'm really confused now.
> >>
> >> If I SEAL_WRITE a file, and then I mmap it PROT_READ, and then I read
> >> it, is that a "simple read"? If so, doesn't that mean that there's no
> >> problem?
> >
> > I assumed Hugh was talking about read(). So no, this is not about
> > memory-reads on mmap()ed regions.
> >
> > Looking at shmem_file_read_iter() I can see a ZERO_PAGE(0) call in
> > case shmem_getpage_gfp(SGP_READ) tells us there's a hole. I cannot see
> > anything like that in the mmap_region() and shmem_fault() paths.
>
> Would it be easy to fix this just for SEAL_WRITE files? Hugh?
>
> This would make the interface much nicer, IMO.

I do agree with you, Andy.

I agree with David that a fallocate (of the fill-in-holes variety)
does not have to be prohibited on a sealed file, that detection of
holes is not an issue with respect to sealing, and that fallocate
by the recipient could be used to "post-seal" the object to safety.
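
For concreteness, here is a minimal userspace sketch of that "post-seal"
fallocate. It assumes the interface as it was eventually merged in
Linux 3.17 (memfd_create() with MFD_ALLOW_SEALING, fcntl() F_ADD_SEALS,
F_SEAL_WRITE and friends), which may differ from the names used in v3.
The helpers make_sealed_sparse_memfd() and fill_holes() are hypothetical
names invented for illustration; they are not part of the patch series.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical sender-side helper: create a memfd of `size` bytes that
 * is entirely hole, then seal it against resizing and writing.
 * Returns the fd, or -1 on error.
 */
static int make_sealed_sparse_memfd(off_t size)
{
	int fd = memfd_create("sealed-sparse", MFD_CLOEXEC | MFD_ALLOW_SEALING);

	if (fd < 0)
		return -1;
	if (ftruncate(fd, size) < 0 ||
	    fcntl(fd, F_ADD_SEALS,
		  F_SEAL_SHRINK | F_SEAL_GROW | F_SEAL_WRITE) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/*
 * Hypothetical recipient-side helper: ask the kernel to back every hole
 * in [0, st_size) with real pages.  Mode 0 is plain preallocation: it
 * does not change file contents or size.
 * Returns 0 on success, -1 on error.
 */
static int fill_holes(int fd)
{
	struct stat st;

	if (fstat(fd, &st) < 0)
		return -1;
	if (st.st_size == 0)
		return 0;	/* nothing to allocate */
	return fallocate(fd, 0, 0, st.st_size);
}
```

A recipient would call fill_holes(fd) once after checking the seals and
before mmap()ing the object. Whether the fallocate() succeeds on an
already sealed file depends on the kernel permitting fill-in allocation
there, as described above.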

But it doesn't feel right, and we shall be re-explaining and apologizing
for it for months to come, until we just fix it. I suspect David didn't
want to add a dependency upon me to fix it, and I didn't want to be
rushed into fixing it (nor is it a job I'd be comfortable to delegate).

I'll give it more thought. The problem is that there may be a variety
of codepaths, in mm/shmem.c but more seriously outside it, which expect
an appropriate page->mapping and page->index on any page of a shared
mapping, and will be buggily surprised to find a ZERO_PAGE instead.
I'll have to go through carefully. Splice may be more difficult to
audit than fault: I don't very often have to think about it.

And though I'd prefer to do the same for non-sealed as for sealed, it
may make more sense in the short term just to address the sealed case,
as you suggest. In the unsealed case, first write to a page entails
locating all the places where the ZERO_PAGE had previously been mapped,
and replacing it there by the newly allocated page; might depend on
VM_NONLINEAR removal, and might entail page_mkwrite(). Doing just
the sealed case is easier, though the half-complete job will annoy me.

I did refresh my memory of the /dev/zero case that had particularly
worried me: it was stranger than I'd thought, in that reading from
/dev/zero could insert ZERO_PAGEs into mappings of other files.
Nick put an end to that in 2.6.24, but perhaps its prior existence
helps give assurance that ZERO_PAGE in surprising places is less
trouble than I fear (it did force XIP into having its own zero_page,
but I don't remember other complications).

Hugh