Re: [RFCv2 08/16] KVM: Use GUP instead of copy_from/to_user() to access guest memory

From: Matthew Wilcox
Date: Mon Oct 26 2020 - 00:22:41 EST


On Thu, Oct 22, 2020 at 12:58:14PM -0700, John Hubbard wrote:
> On 10/22/20 4:49 AM, Matthew Wilcox wrote:
> > On Tue, Oct 20, 2020 at 01:25:59AM -0700, John Hubbard wrote:
> > > Should copy_to_guest() use pin_user_pages_unlocked() instead of gup_unlocked?
> > > We wrote a "Case 5" in Documentation/core-api/pin_user_pages.rst, just for this
> > > situation, I think:
> > >
> > >
> > > CASE 5: Pinning in order to write to the data within the page
> > > -------------------------------------------------------------
> > > Even though neither DMA nor Direct IO is involved, just a simple case of "pin,
> > > write to a page's data, unpin" can cause a problem. Case 5 may be considered a
> > > superset of Case 1, plus Case 2, plus anything that invokes that pattern. In
> > > other words, if the code is neither Case 1 nor Case 2, it may still require
> > > FOLL_PIN, for patterns like this:
> > >
> > > Correct (uses FOLL_PIN calls):
> > > pin_user_pages()
> > > write to the data within the pages
> > > unpin_user_pages()
> >
> > Case 5 is crap though. That bug should have been fixed by getting
> > the locking right. ie:
> >
> > get_user_pages_fast();
> > lock_page();
> > kmap();
> > set_bit();
> > kunmap();
> > set_page_dirty()
> > unlock_page();
> >
> > I should have vetoed that patch at the time, but I was busy with other things.
>
> It does seem like lock_page() is better, for now at least, because it
> forces the kind of synchronization with file system writeback that is
> still yet to be implemented for pin_user_pages().
>
> Long term though, Case 5 provides an alternative way to do this
> pattern--without using lock_page(). Also, note that Case 5, *in
> general*, need not be done page-at-a-time, unlike the lock_page()
> approach. Therefore, Case 5 might potentially help at some call sites,
> either for deadlock avoidance or performance improvements.
>
> In other words, once the other half of the pin_user_pages() plan is
> implemented, either of these approaches should work.
>
> Or, are you thinking that there is never a situation in which Case 5 is
> valid?

I don't think the page pinning approach is ever valid. For file
mappings, the infiniband registration should take a lease on the inode,
blocking truncation. DAX shouldn't be using struct pages at all, so
there shouldn't be anything there to pin.

It's been five years since DAX was merged, and page pinning still
doesn't work. How much longer before the people who are pushing it
realise that it's fundamentally flawed?