Re: [PATCH -next v5 7/8] xfs: speed up truncating down a big realtime inode

From: Zhang Yi
Date: Fri Jun 14 2024 - 03:18:33 EST


On 2024/6/14 14:08, Christoph Hellwig wrote:
> On Thu, Jun 13, 2024 at 05:00:32PM +0800, Zhang Yi wrote:
>> From: Zhang Yi <yi.zhang@xxxxxxxxxx>
>>
>> If we truncate down a big realtime inode, zero out the entire aligned
>> EOF extent could gets slow down as the rtextsize increases. Fortunately,
>> __xfs_bunmapi() would align the unmapped range to rtextsize, split and
>> convert the blocks beyond EOF to unwritten. So speed up this by
>> adjusting the unitsize to the filesystem blocksize when truncating down
>> a large realtime inode, let __xfs_bunmapi() convert the tail blocks to
>> unwritten, this could improve the performance significantly.
>>
>> # mkfs.xfs -f -rrtdev=/dev/pmem1s -f -m reflink=0,rmapbt=0, \
>> -d rtinherit=1 -r extsize=$rtextsize /dev/pmem2s
>> # mount -ortdev=/dev/pmem1s /dev/pmem2s /mnt/scratch
>> # for i in {1..1000}; \
>> do dd if=/dev/zero of=/mnt/scratch/$i bs=$rtextsize count=1024; done
>> # sync
>> # time for i in {1..1000}; \
>> do xfs_io -c "truncate 4k" /mnt/scratch/$i; done
>>
>> rtextsize 8k 16k 32k 64k 256k 1024k
>> before: 9.601s 10.229s 11.153s 12.086s 12.259s 20.141s
>> after: 9.710s 9.642s 9.958s 9.441s 10.021s 10.526s
>>
>> Signed-off-by: Zhang Yi <yi.zhang@xxxxxxxxxx>
>> ---
>> fs/xfs/xfs_inode.c | 10 ++++++++--
>> fs/xfs/xfs_iops.c | 9 +++++++++
>> 2 files changed, 17 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
>> index 92daa2279053..5e837ed093b0 100644
>> --- a/fs/xfs/xfs_inode.c
>> +++ b/fs/xfs/xfs_inode.c
>> @@ -1487,6 +1487,7 @@ xfs_itruncate_extents_flags(
>> struct xfs_trans *tp = *tpp;
>> xfs_fileoff_t first_unmap_block;
>> int error = 0;
>> + unsigned int unitsize = xfs_inode_alloc_unitsize(ip);
>>
>> xfs_assert_ilocked(ip, XFS_ILOCK_EXCL);
>> if (atomic_read(&VFS_I(ip)->i_count))
>> @@ -1510,9 +1511,14 @@ xfs_itruncate_extents_flags(
>> *
>> * We have to free all the blocks to the bmbt maximum offset, even if
>> * the page cache can't scale that far.
>> + *
>> + * For big realtime inode, don't aligned to allocation unitsize,
>> + * it'll split the extent and convert the tail blocks to unwritten.
>> */
>> + if (xfs_inode_has_bigrtalloc(ip))
>> + unitsize = i_blocksize(VFS_I(ip));
>> + first_unmap_block = XFS_B_TO_FSB(mp, roundup_64(new_size, unitsize));
>
> If you expand what xfs_inode_alloc_unitsize and xfs_inode_has_bigrtalloc
> this is looking a bit silly:
>
> unsigned int blocks = 1;
>
> if (XFS_IS_REALTIME_INODE(ip))
> blocks = ip->i_mount->m_sb.sb_rextsize;
>
> unitsize = XFS_FSB_TO_B(ip->i_mount, blocks);
> if (XFS_IS_REALTIME_INODE(ip) && ip->i_mount->m_sb.sb_rextsize > 1)
> unsitsize = i_blocksize(inode);
>
> So I think we can simply drop this part now that the variant that zeroes
> the entire rtextent is gone.
>
Thanks for your suggestion.

Yeah, we could fix the realtime inode problem by just drop this part, but
for the upcoming forcealign feature and atomic feature by John, IIUC, we
couldn't split and convert the tail extent like RT inode does, we should
zero out the entire tail force aligned extent, if not, atomic write could
be broken by submitting unaligned bios.

Jone had already expand the xfs_inode_alloc_unitsize() [1], so I think
we should keep this part for forcealign feature and deal with realtime
inode separately, is that right?

[1]
https://lore.kernel.org/linux-xfs/20240607143919.2622319-1-john.g.garry@xxxxxxxxxx/T/#m73ccaa7b6fec9988f24b881e013fc367429405d6
https://lore.kernel.org/linux-xfs/20240607143919.2622319-1-john.g.garry@xxxxxxxxxx/T/#m1a6312428e4addc4d0d260fbf33ad7bcffb98e0d

Thanks,
Yi.

>> @@ -862,6 +862,15 @@ xfs_setattr_truncate_data(
>> /* Truncate down */
>> blocksize = xfs_inode_alloc_unitsize(ip);
>>
>> + /*
>> + * If it's a big realtime inode, zero out the entire EOF extent could
>> + * get slow down as the rtextsize increases, speed it up by adjusting
>> + * the blocksize to the filesystem blocksize, let __xfs_bunmapi() to
>> + * split the extent and convert the tail blocks to unwritten.
>> + */
>> + if (xfs_inode_has_bigrtalloc(ip))
>> + blocksize = i_blocksize(inode);
>
> Same here. And with that probably also the passing of the block size
> to the truncate_page helpers.
>