Re: [PATCH v2 4/7] iov_iter: new iov_iter_pin_pages*() routines

From: Jan Kara
Date: Thu Sep 15 2022 - 04:16:41 EST


On Wed 14-09-22 17:42:40, Al Viro wrote:
> On Wed, Sep 14, 2022 at 04:52:33PM +0200, Jan Kara wrote:
> > > =================================================================================
> > > CASE 5: Pinning in order to write to the data within the page
> > > -------------------------------------------------------------
> > > Even though neither DMA nor Direct IO is involved, just a simple case of "pin,
> > > write to a page's data, unpin" can cause a problem. Case 5 may be considered a
> > > superset of Case 1, plus Case 2, plus anything that invokes that pattern. In
> > > other words, if the code is neither Case 1 nor Case 2, it may still require
> > > FOLL_PIN, for patterns like this:
> > >
> > > Correct (uses FOLL_PIN calls):
> > > pin_user_pages()
> > > write to the data within the pages
> > > unpin_user_pages()
> > >
> > > INCORRECT (uses FOLL_GET calls):
> > > get_user_pages()
> > > write to the data within the pages
> > > put_page()
> > > =================================================================================
> >
> > Yes, that was my point.
>
> The thing is, at which point do we pin those pages? pin_user_pages() works by
> userland address; by the time we get to any of those we have struct page
> references and no idea whether they are still mapped anywhere.

Yes, pin_user_pages() currently works by page address but there's nothing
fundamental about that. Technically, pin is currently just another type of
page reference so we can as well just pin the page when given struct page.
In fact John Hubbart has added such helper in this series.

> How would that work? What protects the area where you want to avoid running
> into pinned pages from previously acceptable page getting pinned? If "they
> must have been successfully unmapped" is a part of what you are planning, we
> really do have a problem...

But this is a very good question. So far the idea was that we lock the
page, unmap (or writeprotect) the page, and then check pincount == 0 and
that is a reliable method for making sure page data is stable (until we
unlock the page & release other locks blocking page faults and writes). But
once suddently ordinary page references can be used to create pins this
does not work anymore. Hrm.

Just brainstorming ideas now: So we'd either need to obtain the pins early
when we still have the virtual address (but I guess that is often not
practical but should work e.g. for normal direct IO path) or we need some
way to "simulate" the page fault when pinning the page, just don't map it
into page tables in the end. This simulated page fault could be perhaps
avoided if rmap walk shows that the page is already mapped somewhere with
suitable permissions.

Honza
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR