[RFC PATCH 00/10] RDMA/FS DAX "LONGTERM" lease proposal

From: ira . weiny
Date: Mon Apr 29 2019 - 00:54:08 EST


From: Ira Weiny <ira.weiny@xxxxxxxxx>

To support RDMA to file system pages[*] without On Demand Paging, a
number of things need to be done.

1) GUP "longterm" users need to inform the other subsystems that they have
taken a pin on a page which may remain pinned for a very "long time".[1]

2) Any page which is "controlled" by a file system needs to have special
handling. The details of the handling depend on whether the page is page
cache backed or not.

2a) A page cache backed page which has been pinned by GUP Longterm can use a
bounce buffer to allow the file system to write back snapshots of the page.
This is handled by the FS recognizing the GUP longterm pin and making a copy
of the page to be written back.
NOTE: this patch set does not address this path.

2b) A FS "controlled" page which is not page cache backed is either easier
to deal with or harder depending on the operation the filesystem is trying
to do.

2ba) [Hard case] If the FS operation _is_ a truncate or hole punch the
FS can no longer use the pages in question until the pin has been
removed. This patch set presents a solution to this by introducing
some reasonable restrictions on user space applications.

2bb) [Easy case] If the FS operation is _not_ a truncate or hole punch
then there is nothing which need be done. Data is read or written
directly to the page. This is an easy case which would currently work
if not for GUP longterm pins being disabled. Therefore this patch set
need not change access to the file data but does allow for GUP pins
after 2ba above is dealt with.
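
The cases above can be summarized as a small decision function. This is only
an illustrative sketch: the enum and function names are hypothetical and do
not appear anywhere in the series. It assumes the only inputs that matter are
whether the pinned page is page cache backed and which operation the file
system is attempting.

```c
#include <stdbool.h>

/* Hypothetical names, for illustration only; none exist in the series. */
enum fs_op {
    FS_OP_READ_WRITE,   /* ordinary data access */
    FS_OP_TRUNCATE,
    FS_OP_HOLE_PUNCH,
};

enum longterm_case {
    CASE_2A,    /* page cache backed: bounce buffer (not addressed here) */
    CASE_2BA,   /* hard case: FS cannot reuse pages until the pin is gone */
    CASE_2BB,   /* easy case: nothing to do, data goes directly to the page */
};

/* Classify a GUP longterm pinned page according to the cases listed above. */
enum longterm_case classify_longterm_pin(bool page_cache_backed, enum fs_op op)
{
    if (page_cache_backed)
        return CASE_2A;

    if (op == FS_OP_TRUNCATE || op == FS_OP_HOLE_PUNCH)
        return CASE_2BA;

    return CASE_2BB;
}
```

Only CASE_2BA needs new machinery from this series; CASE_2BB works as soon as
GUP longterm pins are allowed again.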


The architecture of this series is to introduce an F_LONGTERM file lease
mechanism which applications use in one of 2 ways.

1) Applications which may require hole punch or truncation operations on files
they intend to mmap and pin for long periods can take an F_LONGTERM
lease on the file. When a file system operation needs truncate
access to this file the lease is broken and the application gets a SIGIO.
Upon catching SIGIO the application can un-pin (note munmap is not required)
the memory associated with that file. At that point the truncating user can
proceed. Re-pinning the memory is entirely left up to the application. In
some cases a new mmap will be required (as with a truncation) or a SIGBUS
would be experienced anyway.

Failure to respond to a SIGIO lease break within the system configured
lease-break-time will result in a SIGBUS.

WIP: SIGBUS could be caught and ignored... what danger does this present...
should this be SIGKILL or should we wait another lease-break-time and then
send SIGKILL?

2) Applications which don't require hole punch or truncate operations can use
pinning without taking an F_LONGTERM lease. However, applications such as
this are expected to have considered the access to the files they are
mmapping and are expected to be controlling them in a way that other users on
a system can't truncate a file and cause a DoS on the application. These
applications will be sent a SIGBUS if someone attempts to truncate or hole
punch a file.

ALTERNATIVE WIP patch in series: If the F_LONGTERM lease is not taken
fail the GUP.
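
For case 1 above, the application side might look roughly like the following
sketch. It is not taken from the series. F_LONGTERM is the proposed lease type
and does not exist in released headers, so the sketch assumes F_RDLCK as a
stand-in argument to the existing fcntl(F_SETLEASE) call. The function names
are hypothetical, and the un-pin step itself is left to the application.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE         /* exposes F_SETLEASE in <fcntl.h> on glibc */
#endif
#include <fcntl.h>
#include <signal.h>
#include <string.h>

#ifndef F_SETLEASE
#define F_SETLEASE 1024     /* Linux value; fallback if the macro above came too late */
#endif

/* Set from the SIGIO handler; the handler does nothing else. */
static volatile sig_atomic_t lease_break_pending;

static void on_lease_break(int signo)
{
    (void)signo;
    lease_break_pending = 1;
}

/* Install the SIGIO handler. Returns 0 on success, -1 on error. */
int install_lease_break_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_lease_break;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGIO, &sa, NULL);
}

/*
 * Take a lease on fd. ASSUMPTION: F_RDLCK stands in for the proposed
 * F_LONGTERM. A read lease requires fd to be opened O_RDONLY.
 */
int take_lease(int fd)
{
    return fcntl(fd, F_SETLEASE, F_RDLCK);
}

/* Returns 1 once per pending lease break, 0 otherwise. */
int lease_break_requested(void)
{
    if (!lease_break_pending)
        return 0;
    lease_break_pending = 0;
    return 1;
}

/* Call after un-pinning so that the truncating user can proceed. */
int release_lease(int fd)
{
    return fcntl(fd, F_SETLEASE, F_UNLCK);
}
```

An application would call install_lease_break_handler() and take_lease()
before pinning, poll lease_break_requested() from its main loop, and on a
break un-pin the memory and then call release_lease(), all within the
lease-break-time.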

The patches compile and have been tested to a first degree.

NOTES:
Can we deal with the lease/pin at the VFS layer, or for all FSs?
LONGTERM seems like a bad name. Suggestions?

[1] The definition of "long time" is debatable, but it has been established
that RDMA's use of pages minutes or hours after the pin is the extreme case
which makes this problem most severe.

[*] Not all file system pages are Page Cache pages. FS DAX bypasses the page
cache.


Ira Weiny (10):
fs/locks: Add trace_leases_conflict
fs/locks: Introduce FL_LONGTERM file lease
mm/gup: Pass flags down to __gup_device_huge* calls
WIP: mm/gup: Ensure F_LONGTERM lease is held on GUP pages
mm/gup: Take FL_LONGTERM lease if not set by user
fs/locks: Add longterm lease traces
fs/dax: Create function dax_mapping_is_dax()
mm/gup: fs: Send SIGBUS on truncate of active file
fs/locks: Add tracepoint for SIGBUS on LONGTERM expiration
mm/gup: Remove FOLL_LONGTERM DAX exclusion

fs/dax.c | 23 ++-
fs/ext4/inode.c | 4 +
fs/locks.c | 301 +++++++++++++++++++++++++++++--
fs/xfs/xfs_file.c | 4 +
include/linux/dax.h | 6 +
include/linux/fs.h | 18 ++
include/linux/mm.h | 2 +
include/trace/events/filelock.h | 74 +++++++-
include/uapi/asm-generic/fcntl.h | 2 +
mm/gup.c | 107 ++++-------
mm/huge_memory.c | 18 ++
11 files changed, 468 insertions(+), 91 deletions(-)

--
2.20.1