Re: [PATCH v3 0/1] mm: introduce put_user_page*(), placeholder versions

From: Jerome Glisse
Date: Tue Mar 12 2019 - 11:35:37 EST

On Tue, Mar 12, 2019 at 04:52:07AM +0000, Christopher Lameter wrote:
> On Fri, 8 Mar 2019, Jerome Glisse wrote:
> > >
> > > It would be good if that understanding were enforced somehow given the problems
> > > that we see.
> >
> > This has been discussed extensively already. GUP usage is now widespread in
> > multiple drivers; removing that would regress userspace, i.e. break existing
> > applications. We all know what the rules for that are.
> The applications that work are using anonymous memory and memory
> filesystems. I have never seen use cases with a real filesystem and would
> have objected if someone tried something crazy like that.
> Because someone was able to get away with weird ways of abusing the system
> it is not an argument that we should continue to allow such things. In fact
> we have repeatedly ensured that the kernel works reliably by improving the
> kernel so that a proper failure is occurring.

Drivers doing GUP on an mmap of a regular file seem to already have
widespread users (with RDMA devices at least). So there are active
users, and they were never told that what they are doing is illegal.

Note that I am personally fine with breaking device drivers that cannot
abide by mmu notifiers, but the consensus seems to be that it is not
fine to do so.

> > > > In fact, the GUP documentation even recommends that pattern.
> > >
> > > Isn't that pattern safe for anonymous memory and memory filesystems like
> > > hugetlbfs etc? Which is the common use case.
> >
> > Still an issue with respect to swapout, i.e. if an anon/shmem page was
> > mapped read only in preparation for swapout and we do not report the
> > page as dirty, what ends up in swap might lack what was last written
> > through GUP.
> Well swapout cannot occur if the page is pinned and those pages are also
> often mlocked.

I would need to check the swapout code, but I believe the write to disk
can happen before the pin check happens. I believe the event flow is:
map read only, allocate swap, write to disk, try to free the page (which
checks for pins). So you could write stale data to disk, with the GUP
pin going away before the pin check is performed.
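
To make the suspected ordering concrete, here is a toy userspace model in
C. It is only a sketch of the event flow described above, not kernel code:
every name in it (toy_page, toy_swapout(), toy_unpin(), and so on) is
hypothetical, and it assumes the race window sits between the write to
disk and the pin check, which is exactly the part that still needs to be
confirmed against the real swapout path.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_PAGE_SIZE 16

/* Hypothetical stand-ins for a page and a swap slot; not kernel types. */
struct toy_page {
    char data[TOY_PAGE_SIZE]; /* current contents in RAM */
    bool writable;            /* CPU mapping is writable */
    bool dirty;               /* page has been reported dirty */
    int pin_count;            /* pins held by a GUP-like user */
};

struct toy_swap_slot {
    char data[TOY_PAGE_SIZE]; /* what was written to "disk" */
    bool in_use;
};

enum toy_outcome {
    TOY_PAGE_KEPT,          /* free attempt failed, page stays in RAM */
    TOY_FREED_SWAP_CURRENT, /* page freed, swap matches last contents */
    TOY_FREED_SWAP_STALE,   /* page freed, swap lacks the last write */
};

static void toy_fill(char *dst, const char *src)
{
    memset(dst, 0, TOY_PAGE_SIZE);
    strncpy(dst, src, TOY_PAGE_SIZE - 1);
}

static void toy_pin(struct toy_page *p)
{
    p->pin_count++;
}

/* Drop a pin, optionally reporting the page as dirty first. */
static void toy_unpin(struct toy_page *p, bool mark_dirty)
{
    assert(p->pin_count > 0);
    if (mark_dirty)
        p->dirty = true;
    p->pin_count--;
}

/* Device write through the pin; it does not go through the CPU mapping. */
static void toy_dma_write(struct toy_page *p, const char *src)
{
    assert(p->pin_count > 0);
    toy_fill(p->data, src);
}

static enum toy_outcome toy_swapout(struct toy_page *p,
                                    struct toy_swap_slot *slot,
                                    const char *late_write,
                                    bool mark_dirty)
{
    p->writable = false;                        /* 1. map read only */
    slot->in_use = true;                        /* 2. allocate swap */
    memcpy(slot->data, p->data, TOY_PAGE_SIZE); /* 3. write to disk */
    p->dirty = false;

    /* Assumed race window: the device writes, then the pin goes away. */
    if (p->pin_count > 0) {
        toy_dma_write(p, late_write);
        toy_unpin(p, mark_dirty);
    }

    /* 4. try to free the page: only now are pin and dirty state checked */
    if (p->pin_count > 0 || p->dirty)
        return TOY_PAGE_KEPT;

    return memcmp(slot->data, p->data, TOY_PAGE_SIZE) == 0
        ? TOY_FREED_SWAP_CURRENT
        : TOY_FREED_SWAP_STALE;
}
```

In this model, a pinned page whose pin is dropped without dirtying is
freed with stale data in swap, while dropping the pin with mark_dirty set
makes the free attempt fail so the page can be written out again.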

There are other things to take into account that need proper page
dirtying, like soft dirtiness for instance.

> > >
> > > Yes you now have the filesystem as well as the GUP pinner claiming
> > > authority over the contents of a single memory segment. Maybe better not
> > > allow that?
> >
> > This goes back to regressing existing drivers with existing users.
> There is no regression if that behavior never really worked.

Well, the RDMA driver maintainers seem to report that this has been a
valid and working workload for their users.

> > > Two filesystems trying to sync one memory segment both believing to have
> > > exclusive access and we want to sort this out. Why? Don't allow this.
> >
> > This is allowed, it always was; forbidding that case now would regress
> > existing applications and it would also mean that we are modifying the
> > API we expose to userspace. So again this is not something we can block
> > without regressing existing users.
> We have always stopped the user from doing obviously stupid and risky
> things. It would be logical to do it here as well.

While I would rather only allow devices that can handle mmu notifiers,
it is just not acceptable to regress existing users, and they do seem
to exist and to have had working setups for a while.