[PATCH 3.16 095/114] xfs: xfs_setattr_size no longer races with page faults

From: Ben Hutchings
Date: Mon Jun 13 2016 - 15:01:37 EST


3.16.36-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Dave Chinner <dchinner@xxxxxxxxxx>

commit 0f9160b444e4de33b65dfcd3b901358a3129461a upstream.

Now that truncate locks out new page faults, we no longer need to do
special writeback hacks in truncate to work around potential races
between page faults, page cache truncation and file size updates to
ensure we get write page faults for extending truncates on sub-page
block size filesystems. Hence we can remove the code in
xfs_setattr_size() that handles this and update the comments around
the code tha thandles page cache truncate and size updates to
reflect the new reality.

Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
Reviewed-by: Brian Foster <bfoster@xxxxxxxxxx>
Signed-off-by: Dave Chinner <david@xxxxxxxxxxxxx>
[bwh: Backported to 3.16: we never had the previous hack, so just update the
comment]
Signed-off-by: Ben Hutchings <ben@xxxxxxxxxxxxxxx>
---
fs/xfs/xfs_iops.c | 56 ++++++++++++++-----------------------------------------
1 file changed, 14 insertions(+), 42 deletions(-)

--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -823,19 +823,21 @@ xfs_setattr_size(
inode_dio_wait(inode);

/*
- * Do all the page cache truncate work outside the transaction context
- * as the "lock" order is page lock->log space reservation. i.e.
- * locking pages inside the transaction can ABBA deadlock with
- * writeback. We have to do the VFS inode size update before we truncate
- * the pagecache, however, to avoid racing with page faults beyond the
- * new EOF they are not serialised against truncate operations except by
- * page locks and size updates.
+ * We've already locked out new page faults, so now we can safely remove
+ * pages from the page cache knowing they won't get refaulted until we
+ * drop the XFS_MMAP_EXCL lock after the extent manipulations are
+ * complete. The truncate_setsize() call also cleans partial EOF page
+ * PTEs on extending truncates and hence ensures sub-page block size
+ * filesystems are correctly handled, too.
*
- * Hence we are in a situation where a truncate can fail with ENOMEM
- * from xfs_trans_reserve(), but having already truncated the in-memory
- * version of the file (i.e. made user visible changes). There's not
- * much we can do about this, except to hope that the caller sees ENOMEM
- * and retries the truncate operation.
+ * We have to do all the page cache truncate work outside the
+ * transaction context as the "lock" order is page lock->log space
+ * reservation as defined by extent allocation in the writeback path.
+ * Hence a truncate can fail with ENOMEM from xfs_trans_reserve(), but
+ * having already truncated the in-memory version of the file (i.e. made
+ * user visible changes). There's not much we can do about this, except
+ * to hope that the caller sees ENOMEM and retries the truncate
+ * operation.
*/
error = -block_truncate_page(inode->i_mapping, newsize, xfs_get_blocks);
if (error)