Re: [LSF/MM TOPIC] Discuss least bad options for resolving longterm-GUP usage by RDMA

From: Matthew Wilcox
Date: Wed Feb 06 2019 - 12:52:39 EST


On Wed, Feb 06, 2019 at 10:31:14AM -0700, Jason Gunthorpe wrote:
> On Wed, Feb 06, 2019 at 10:50:00AM +0100, Jan Kara wrote:
>
> > MM/FS asks for lease to be revoked. The revoke handler agrees with the
> > other side on cancelling RDMA or whatever and drops the page pins.
>
> This takes a trip through userspace since the communication protocol
> is entirely managed in userspace.
>
> Most existing communication protocols don't have a 'cancel operation'.
>
> > Now I understand there can be HW / communication failures etc. in
> > which case the driver could either block waiting or make sure future
> > IO will fail and drop the pins.
>
> We can always rip things away from userspace... However...
>
> > But under normal conditions there should be a way to revoke the
> > access. And if the HW/driver cannot support this, then don't let it
> > anywhere near DAX filesystem.
>
> I think the general observation is that people who want to do DAX &
> RDMA want it to actually work, without data corruption, random process
> kills or random communication failures.
>
> Really, few users would actually want to run on a system where revoke
> can be triggered.
>
> So... how can the FS/MM side provide a guarantee to the user that
> revoke won't happen under a certain system design?

Most of the cases we want revoke for are things like truncate().
That shouldn't happen on a sane system, but we're trying to prevent
users from doing awful things, like being able to DMA to pages that
have since become part of a different file.