[PATCH 3.16 096/114] xfs: lock out page faults from extent swap operations

From: Ben Hutchings
Date: Mon Jun 13 2016 - 14:42:33 EST


3.16.36-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Dave Chinner <dchinner@xxxxxxxxxx>

commit 723cac48473358939759885a18e8df113ea96138 upstream.

Extent swap is another extent manipulation operation that must not
race against mmap page faults. The current code fails with EBUSY if
the file is already mapped before the swap starts, but new page faults
can still race with the swap while it is in progress. Hence we should
take the XFS_MMAPLOCK_EXCL for this operation, too.
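
As a rough illustration of the race described above, here is a toy,
single-threaded model. It is not XFS code: every toy_* name is
hypothetical, and the "new mapping plus fault" is simulated inline
rather than arriving from another thread. It only contrasts an up-front
mapped check with holding an exclusive lock for the whole swap.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a file; none of this is real XFS state. */
struct toy_file {
	bool mapped;			/* some process has the file mmap'd */
	bool mmaplock_excl;		/* the swap holds the mmap lock exclusively */
	bool swapping;			/* an extent swap is in progress */
	int  faults_during_swap;	/* faults serviced while swapping */
};

/* Returns true if the fault was serviced, false if it would have to wait. */
static bool toy_page_fault(struct toy_file *f)
{
	if (f->mmaplock_excl)
		return false;		/* blocked until the swap drops the lock */
	if (f->swapping)
		f->faults_during_swap++;	/* raced with the swap */
	return true;
}

/* Old scheme: reject mapped files up front, then swap without a fault lock. */
static int toy_swap_check_only(struct toy_file *f)
{
	assert(!f->swapping);
	if (f->mapped)
		return -1;		/* models the EBUSY return */
	f->swapping = true;
	f->mapped = true;		/* a new mmap appears after the check... */
	(void)toy_page_fault(f);	/* ...and faults while the swap runs */
	f->swapping = false;
	return f->faults_during_swap;
}

/* New scheme: hold the exclusive lock across the whole swap. */
static int toy_swap_locked(struct toy_file *f)
{
	assert(!f->swapping);
	f->mmaplock_excl = true;
	f->swapping = true;
	f->mapped = true;		/* the same late mmap... */
	(void)toy_page_fault(f);	/* ...but its fault has to wait */
	f->swapping = false;
	f->mmaplock_excl = false;
	return f->faults_during_swap;
}
```

In this model the check-only variant lets one fault through mid-swap,
while the locked variant lets none through.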

While there, fix the error path handling that can result in double
unlocks of the inodes when cancelling the swapext transaction.
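
The double unlock can be sketched with a similar toy model. Again, all
toy_* names are hypothetical. The model assumes only what the comment
added by this patch states: once an inode is joined to the transaction,
commit or cancel unlocks it. Unlocking again after the cancel is then
one unlock too many per inode.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical inode lock that records unlocks done while not held. */
struct toy_inode {
	int held;		/* 1 while locked, 0 when free */
	int bad_unlocks;	/* unlocks issued while the lock was not held */
};

static void toy_lock(struct toy_inode *ip)
{
	ip->held = 1;
}

static void toy_unlock(struct toy_inode *ip)
{
	if (!ip->held)
		ip->bad_unlocks++;	/* a double unlock */
	ip->held = 0;
}

/* Hypothetical transaction: cancel unlocks every joined inode. */
struct toy_trans {
	struct toy_inode *joined[2];
	int njoined;
};

static void toy_join(struct toy_trans *tp, struct toy_inode *ip)
{
	assert(tp->njoined < 2);
	tp->joined[tp->njoined++] = ip;
}

static void toy_cancel(struct toy_trans *tp)
{
	for (int i = 0; i < tp->njoined; i++)
		toy_unlock(tp->joined[i]);
	tp->njoined = 0;
}

/*
 * Error path after both inodes are locked and joined.
 * unlock_after_cancel == true models jumping to an unlock label after
 * the cancel; false models skipping it. Returns the number of double
 * unlocks observed.
 */
static int toy_swap_error_path(bool unlock_after_cancel)
{
	struct toy_inode a = { 0, 0 }, b = { 0, 0 };
	struct toy_trans tp = { { 0 }, 0 };

	toy_lock(&a);
	toy_lock(&b);
	toy_join(&tp, &a);
	toy_join(&tp, &b);

	toy_cancel(&tp);		/* already unlocks a and b */
	if (unlock_after_cancel) {
		toy_unlock(&a);		/* second unlock of a */
		toy_unlock(&b);		/* second unlock of b */
	}
	return a.bad_unlocks + b.bad_unlocks;
}
```

In this model, unlocking after the cancel yields two double unlocks, and
skipping the extra unlock yields none.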

Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
Reviewed-by: Brian Foster <bfoster@xxxxxxxxxx>
Signed-off-by: Dave Chinner <david@xxxxxxxxxxxxx>
[bwh: Backported to 3.16:
- The obsoleted check for mmap'd files was directly in xfs_swap_extents()
and used VN_MAPPED
- Adjust context]
Signed-off-by: Ben Hutchings <ben@xxxxxxxxxxxxxxx>
---
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -1642,13 +1642,14 @@ xfs_swap_extents(
}

/*
- * Lock up the inodes against other IO and truncate to begin with.
- * Then we can ensure the inodes are flushed and have no page cache
- * safely. Once we have done this we can take the ilocks and do the rest
- * of the checks.
+ * Lock the inodes against other IO, page faults and truncate to
+ * begin with. Then we can ensure the inodes are flushed and have no
+ * page cache safely. Once we have done this we can take the ilocks and
+ * do the rest of the checks.
*/
- lock_flags = XFS_IOLOCK_EXCL;
+ lock_flags = XFS_IOLOCK_EXCL | XFS_MMAPLOCK_EXCL;
xfs_lock_two_inodes(ip, tip, XFS_IOLOCK_EXCL);
+ xfs_lock_two_inodes(ip, tip, XFS_MMAPLOCK_EXCL);

/* Verify that both files have the same format */
if ((ip->i_d.di_mode & S_IFMT) != (tip->i_d.di_mode & S_IFMT)) {
@@ -1711,17 +1712,6 @@ xfs_swap_extents(
goto out_unlock;
}

- /* We need to fail if the file is memory mapped. Once we have tossed
- * all existing pages, the page fault will have no option
- * but to go to the filesystem for pages. By making the page fault call
- * vop_read (or write in the case of autogrow) they block on the iolock
- * until we have switched the extents.
- */
- if (VN_MAPPED(VFS_I(ip))) {
- error = XFS_ERROR(EBUSY);
- goto out_unlock;
- }
-
xfs_iunlock(ip, XFS_ILOCK_EXCL);
xfs_iunlock(tip, XFS_ILOCK_EXCL);
lock_flags &= ~XFS_ILOCK_EXCL;
@@ -1740,8 +1730,15 @@ xfs_swap_extents(
if (error)
goto out_trans_cancel;

+ /*
+ * Lock and join the inodes to the transaction so that transaction commit
+ * or cancel will unlock the inodes from this point onwards.
+ */
xfs_lock_two_inodes(ip, tip, XFS_ILOCK_EXCL);
lock_flags |= XFS_ILOCK_EXCL;
+ xfs_trans_ijoin(tp, ip, lock_flags);
+ xfs_trans_ijoin(tp, tip, lock_flags);
+

/*
* Count the number of extended attribute blocks
@@ -1760,9 +1757,6 @@ xfs_swap_extents(
goto out_trans_cancel;
}

- xfs_trans_ijoin(tp, ip, lock_flags);
- xfs_trans_ijoin(tp, tip, lock_flags);
-
/*
* Before we've swapped the forks, lets set the owners of the forks
* appropriately. We have to do this as we are demand paging the btree
@@ -1896,5 +1890,5 @@ out_unlock:

out_trans_cancel:
xfs_trans_cancel(tp, 0);
- goto out_unlock;
+ goto out;
}