Re: [PATCH v2 2/5] fs, xfs: introduce FALLOC_FL_SEAL_BLOCK_MAP
From: Dan Williams
Date: Fri Aug 04 2017 - 15:52:58 EST
On Fri, Aug 4, 2017 at 12:46 PM, Darrick J. Wong
<darrick.wong@xxxxxxxxxx> wrote:
> On Thu, Aug 03, 2017 at 07:28:17PM -0700, Dan Williams wrote:
>> >From falloc.h:
>>
>> FALLOC_FL_SEAL_BLOCK_MAP is used to seal (make immutable) all of the
>> file logical-to-physical extent offset mappings in the file. The
>> purpose is to allow an application to assume that there are no holes
>> or shared extents in the file and that the metadata needed to find
>> all the physical extents of the file is stable and can never be
>> dirtied.
>>
>> For now this patch only permits setting the in-memory state of
>> S_IOMAP_IMMMUTABLE. Support for clearing and persisting the state is
>> saved for later patches.
>>
>> The implementation is careful to not allow the immutable state to change
>> while any process might have any established mappings. It reuses the
>> existing xfs_reflink_unshare() and xfs_alloc_file_space() to unshare
>> extents and fill all holes in the file. It then holds XFS_ILOCK_EXCL
>> while it validates the file is in the proper state and sets
>> S_IOMAP_IMMUTABLE.
>>
>> Cc: Jan Kara <jack@xxxxxxx>
>> Cc: Jeff Moyer <jmoyer@xxxxxxxxxx>
>> Cc: Christoph Hellwig <hch@xxxxxx>
>> Cc: Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx>
>> Cc: Alexander Viro <viro@xxxxxxxxxxxxxxxxxx>
>> Suggested-by: Dave Chinner <david@xxxxxxxxxxxxx>
>> Suggested-by: "Darrick J. Wong" <darrick.wong@xxxxxxxxxx>
>> Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>
>> ---
>> fs/open.c | 11 +++++
>> fs/xfs/xfs_bmap_util.c | 101 +++++++++++++++++++++++++++++++++++++++++++
>> fs/xfs/xfs_bmap_util.h | 2 +
>> fs/xfs/xfs_file.c | 14 ++++--
>> include/linux/falloc.h | 3 +
>> include/uapi/linux/falloc.h | 19 ++++++++
>> 6 files changed, 145 insertions(+), 5 deletions(-)
>>
>> diff --git a/fs/open.c b/fs/open.c
>> index 7395860d7164..e3aae59785ae 100644
>> --- a/fs/open.c
>> +++ b/fs/open.c
>> @@ -273,6 +273,17 @@ int vfs_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
>> (mode & ~(FALLOC_FL_UNSHARE_RANGE | FALLOC_FL_KEEP_SIZE)))
>> return -EINVAL;
>>
>> + /*
>> + * Seal block map operation should only be used exclusively, and
>> + * with the IMMUTABLE capability.
>> + */
>> + if (mode & FALLOC_FL_SEAL_BLOCK_MAP) {
>> + if (!capable(CAP_LINUX_IMMUTABLE))
>> + return -EPERM;
>> + if (mode & ~FALLOC_FL_SEAL_BLOCK_MAP)
>> + return -EINVAL;
>> + }
>> +
>> if (!(file->f_mode & FMODE_WRITE))
>> return -EBADF;
>>
>> diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
>> index fe0f8f7f4bb7..46d8eb9e19fc 100644
>> --- a/fs/xfs/xfs_bmap_util.c
>> +++ b/fs/xfs/xfs_bmap_util.c
>> @@ -1393,6 +1393,107 @@ xfs_zero_file_space(
>>
>> }
>>
>> +/* Return 1 if hole detected, 0 if not, and < 0 if fail to determine */
>> +STATIC int
>> +xfs_file_has_holes(
>> + struct xfs_inode *ip)
>> +{
>> + struct xfs_mount *mp = ip->i_mount;
>> + struct xfs_bmbt_irec *map;
>> + const int map_size = 10; /* constrain memory overhead */
>> + int i, nmaps;
>> + int error = 0;
>> + xfs_fileoff_t lblkno = 0;
>> + xfs_filblks_t maxlblkcnt;
>> +
>> + map = kmem_alloc(map_size * sizeof(*map), KM_SLEEP);
>
> Sleeping with an inode fully locked and (eventually) a running
> transaction? Yikes.
>
> Just allocate one xfs_bmbt_irec on the stack and pass in nmaps=1.
>
> This method might fit better in libxfs/xfs_bmap.c where it'll be
> able to scan the extent list more quickly with the iext helpers.
Ok, I'll take a look.
>
>> +
>> + maxlblkcnt = XFS_B_TO_FSB(mp, i_size_read(VFS_I(ip)));
>> + do {
>> + nmaps = map_size;
>> + error = xfs_bmapi_read(ip, lblkno, maxlblkcnt - lblkno,
>> + map, &nmaps, 0);
>> + if (error)
>> + break;
>> +
>> + ASSERT(nmaps <= map_size);
>> + for (i = 0; i < nmaps; i++) {
>> + lblkno += map[i].br_blockcount;
>> + if (map[i].br_startblock == HOLESTARTBLOCK) {
>
> I think we also need to check for unwritten extents here, because a
> write to an unwritten block requires some zeroing and a mapping metdata
> update.
Will do.
>
>> + error = 1;
>> + break;
>> + }
>> + }
>> + } while (nmaps > 0 && error == 0);
>> +
>> + kmem_free(map);
>> + return error;
>> +}
>> +
>> +int
>> +xfs_seal_file_space(
>> + struct xfs_inode *ip,
>> + xfs_off_t offset,
>> + xfs_off_t len)
>> +{
>> + struct inode *inode = VFS_I(ip);
>> + struct address_space *mapping = inode->i_mapping;
>> + int error;
>> +
>> + ASSERT(xfs_isilocked(ip, XFS_MMAPLOCK_EXCL));
>
> The IOLOCK must be held here too.
Ok, I can add that. I had this here for the mapping_mapped() check.
>
>> +
>> + if (offset)
>> + return -EINVAL;
>> +
>> + error = xfs_reflink_unshare(ip, offset, len);
>> + if (error)
>> + return error;
>> +
>> + error = xfs_alloc_file_space(ip, offset, len,
>> + XFS_BMAPI_CONVERT | XFS_BMAPI_ZERO);
>> + if (error)
>> + return error;
>> +
>> + xfs_ilock(ip, XFS_ILOCK_EXCL);
>> + /*
>> + * Either the size changed after we performed allocation /
>> + * unsharing, or the request was too small to begin with.
>> + */
>> + error = -EINVAL;
>> + if (len < i_size_read(inode))
>> + goto out_unlock;
>> +
>> + /*
>> + * Allow DAX path to assume that the state of S_IOMAP_IMMUTABLE
>> + * will never change while any mapping is established.
>> + */
>> + error = -EBUSY;
>> + if (mapping_mapped(mapping))
>> + goto out_unlock;
>> +
>> + /* Did we race someone attempting to share extents? */
>> + if (xfs_is_reflink_inode(ip))
>> + goto out_unlock;
>> +
>> + /* Did we race a hole punch? */
>> + error = xfs_file_has_holes(ip);
>> + if (error == 1) {
>> + error = -EBUSY;
>> + goto out_unlock;
>> + }
>> +
>> + /* Abort on an error reading the block map */
>> + if (error < 0)
>> + goto out_unlock;
>> +
>> + inode->i_flags |= S_IOMAP_IMMUTABLE;
>> +
>> +out_unlock:
>> + xfs_iunlock(ip, XFS_ILOCK_EXCL);
>> +
>> + return error;
>> +}
>> +
>> /*
>> * @next_fsb will keep track of the extent currently undergoing shift.
>> * @stop_fsb will keep track of the extent at which we have to stop.
>> diff --git a/fs/xfs/xfs_bmap_util.h b/fs/xfs/xfs_bmap_util.h
>> index 0cede1043571..5115a32a2483 100644
>> --- a/fs/xfs/xfs_bmap_util.h
>> +++ b/fs/xfs/xfs_bmap_util.h
>> @@ -60,6 +60,8 @@ int xfs_collapse_file_space(struct xfs_inode *, xfs_off_t offset,
>> xfs_off_t len);
>> int xfs_insert_file_space(struct xfs_inode *, xfs_off_t offset,
>> xfs_off_t len);
>> +int xfs_seal_file_space(struct xfs_inode *, xfs_off_t offset,
>> + xfs_off_t len);
>>
>> /* EOF block manipulation functions */
>> bool xfs_can_free_eofblocks(struct xfs_inode *ip, bool force);
>> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
>> index c4893e226fd8..e21121530a90 100644
>> --- a/fs/xfs/xfs_file.c
>> +++ b/fs/xfs/xfs_file.c
>> @@ -739,7 +739,8 @@ xfs_file_write_iter(
>> #define XFS_FALLOC_FL_SUPPORTED \
>> (FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE | \
>> FALLOC_FL_COLLAPSE_RANGE | FALLOC_FL_ZERO_RANGE | \
>> - FALLOC_FL_INSERT_RANGE | FALLOC_FL_UNSHARE_RANGE)
>> + FALLOC_FL_INSERT_RANGE | FALLOC_FL_UNSHARE_RANGE | \
>> + FALLOC_FL_SEAL_BLOCK_MAP)
>>
>> STATIC long
>> xfs_file_fallocate(
>> @@ -834,9 +835,14 @@ xfs_file_fallocate(
>> error = xfs_reflink_unshare(ip, offset, len);
>> if (error)
>> goto out_unlock;
>> - }
>> - error = xfs_alloc_file_space(ip, offset, len,
>> - XFS_BMAPI_PREALLOC);
>> +
>> + error = xfs_alloc_file_space(ip, offset, len,
>> + XFS_BMAPI_PREALLOC);
>> + } else if (mode & FALLOC_FL_SEAL_BLOCK_MAP) {
>> + error = xfs_seal_file_space(ip, offset, len);
>> + } else
>> + error = xfs_alloc_file_space(ip, offset, len,
>> + XFS_BMAPI_PREALLOC);
>> }
>> if (error)
>> goto out_unlock;
>> diff --git a/include/linux/falloc.h b/include/linux/falloc.h
>> index 7494dc67c66f..48546c6fbec7 100644
>> --- a/include/linux/falloc.h
>> +++ b/include/linux/falloc.h
>> @@ -26,6 +26,7 @@ struct space_resv {
>> FALLOC_FL_COLLAPSE_RANGE | \
>> FALLOC_FL_ZERO_RANGE | \
>> FALLOC_FL_INSERT_RANGE | \
>> - FALLOC_FL_UNSHARE_RANGE)
>> + FALLOC_FL_UNSHARE_RANGE | \
>> + FALLOC_FL_SEAL_BLOCK_MAP)
>>
>> #endif /* _FALLOC_H_ */
>> diff --git a/include/uapi/linux/falloc.h b/include/uapi/linux/falloc.h
>> index b075f601919b..39076975bf6f 100644
>> --- a/include/uapi/linux/falloc.h
>> +++ b/include/uapi/linux/falloc.h
>> @@ -76,4 +76,23 @@
>> */
>> #define FALLOC_FL_UNSHARE_RANGE 0x40
>>
>> +/*
>> + * FALLOC_FL_SEAL_BLOCK_MAP is used to seal (make immutable) all of the
>> + * file logical-to-physical extent offset mappings in the file. The
>> + * purpose is to allow an application to assume that there are no holes
>> + * or shared extents in the file and that the metadata needed to find
>> + * all the physical extents of the file is stable and can never be
>> + * dirtied.
>> + *
>> + * The immutable property is in effect for the entire inode, so the
>> + * range for this operation must start at offset 0 and len must be
>> + * greater than or equal to the current size of the file. If greater,
>> + * this operation allocates, unshares, hole fills, and seals in one
>
> 'allocates' is the same as 'hole fills', I think.
>
> This converts unwritten extents to zeroed written extents too, correct?
Yes, I'll add that and also include that in the man page patch that
I'm working on...
>
>> + * atomic step. If len is zero then the immutable state is cleared for
>> + * the inode.
>
> It's cleared if len == 0? I thought that was what FL_UNSEAL is for?
Whoops, stale holdover from v1.
>> + * This flag implies FALLOC_FL_UNSHARE_RANGE and as such cannot be used
>> + * with the punch, zero, collapse, or insert range modes.
>> + */
>> +#define FALLOC_FL_SEAL_BLOCK_MAP 0x080
>> #endif /* _UAPI_FALLOC_H_ */
Thanks Darrick!