Re: [PATCH RFC 4/4] mm: guest_memfd: Add ability for mmap'ing pages
From: Fuad Tabba
Date: Fri Aug 16 2024 - 07:20:10 EST
On Fri, 16 Aug 2024 at 10:48, David Hildenbrand <david@xxxxxxxxxx> wrote:
>
> On 15.08.24 09:24, Fuad Tabba wrote:
> > Hi David,
>
> Hi!
>
> >
> > On Tue, 6 Aug 2024 at 14:51, David Hildenbrand <david@xxxxxxxxxx> wrote:
> >>
> >>>
> >>> -	if (gmem_flags & GUEST_MEMFD_FLAG_NO_DIRECT_MAP) {
> >>> +	if (!ops->accessible && (gmem_flags & GUEST_MEMFD_FLAG_NO_DIRECT_MAP)) {
> >>>  		r = guest_memfd_folio_private(folio);
> >>>  		if (r)
> >>>  			goto out_err;
> >>> @@ -107,6 +109,82 @@ struct folio *guest_memfd_grab_folio(struct file *file, pgoff_t index, u32 flags
> >>> }
> >>> EXPORT_SYMBOL_GPL(guest_memfd_grab_folio);
> >>>
> >>> +int guest_memfd_make_inaccessible(struct file *file, struct folio *folio)
> >>> +{
> >>> +	unsigned long gmem_flags = (unsigned long)file->private_data;
> >>> +	unsigned long i;
> >>> +	int r;
> >>> +
> >>> +	unmap_mapping_folio(folio);
> >>> +
> >>> +	/**
> >>> +	 * We can't use the refcount. It might be elevated due to
> >>> +	 * guest/vcpu trying to access same folio as another vcpu
> >>> +	 * or because userspace is trying to access folio for same reason
> >>
> >> As discussed, that's insufficient. We really have to drive the refcount
> >> to 1 -- the single reference we expect.
> >>
> >> What is the exact problem you are running into here? Who can just grab a
> >> reference and maybe do nasty things with it?
> >
> > I was wondering, why do we need to check the refcount? Isn't it enough
> > to check for page_mapped() || page_maybe_dma_pinned(), while holding
> > the folio lock?
>
> (folio_mapped() + folio_maybe_dma_pinned())
>
> Not everything goes through FOLL_PIN. vmsplice() is an example, or just
> some very simple read/write through /proc/pid/mem. Further, some
> O_DIRECT implementations still don't use FOLL_PIN.
>
> So if you see an additional folio reference, as soon as you mapped that
> thing to user space, you have to assume that it could be someone
> reading/writing that memory in possibly sane context. (vmsplice() should
> be using FOLL_PIN|FOLL_LONGTERM, but that's a longer discussion)
>
> (noting that also folio_maybe_dma_pinned() can have false positives in
> some cases due to speculative references or *many* references).
Thanks for the clarification!
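
To check my understanding, here is a rough userspace toy model of the
point above. It is not kernel code: every toy_* name is made up for
illustration, and TOY_PIN_BIAS only mimics the idea behind
GUP_PIN_COUNTING_BIAS (a pin adds a large bias to the refcount, a plain
get adds 1). Under those assumptions, a reference taken without FOLL_PIN
is invisible to the mapped/maybe-pinned check, while comparing the
refcount against the single expected reference catches it.

```c
#include <stdbool.h>

/* Userspace toy model only; none of the toy_* names exist in the kernel. */
#define TOY_PIN_BIAS 1024	/* assumption: mimics GUP_PIN_COUNTING_BIAS */

struct toy_folio {
	int refcount;	/* all references, pins included */
	int mapcount;	/* number of page table mappings */
};

/* Plain reference, e.g. a GUP user that does not use FOLL_PIN. */
static void toy_get(struct toy_folio *f)
{
	f->refcount += 1;
}

/* FOLL_PIN-style reference: adds the bias instead of 1. */
static void toy_pin(struct toy_folio *f)
{
	f->refcount += TOY_PIN_BIAS;
}

static bool toy_mapped(const struct toy_folio *f)
{
	return f->mapcount > 0;
}

/* Heuristic: true for real pins, but also for *many* plain references. */
static bool toy_maybe_pinned(const struct toy_folio *f)
{
	return f->refcount >= TOY_PIN_BIAS;
}

/* The check I asked about: not mapped and not (maybe) pinned. */
static bool toy_safe_by_map_and_pin(const struct toy_folio *f)
{
	return !toy_mapped(f) && !toy_maybe_pinned(f);
}

/* The stricter check: not mapped and exactly the expected references. */
static bool toy_safe_by_refcount(const struct toy_folio *f, int expected)
{
	return !toy_mapped(f) && f->refcount == expected;
}
```

With one extra toy_get() on an unmapped folio, toy_safe_by_map_and_pin()
still says "safe" while toy_safe_by_refcount(f, 1) does not, which I take
to be the case you are describing.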
/fuad
> --
> Cheers,
>
> David / dhildenb
>