Re: [PATCH] xfs: Remove i_rwsem lock in buffered read
From: Amir Goldstein
Date: Thu Jan 09 2025 - 05:29:42 EST
> > One more thing I should mention.
> > You do not need to wait for atomic large writes patches to land.
> > There is nothing stopping you from implementing the suggested
> > solution based on the xfs code already in master (v6.13-rc1),
> > which has support for the RWF_ATOMIC flag for writes.
Only I missed the fact that there is not yet a plan to support
atomic buffered writes :-/
> >
> > It just means that the API will not be usable for applications that
> > want to do IO larger than block size, but concurrent read/write
> ^
> To be precise, this is the page size, not the block size, right?
>
fs block size:
if (iocb->ki_flags & IOCB_ATOMIC) {
/*
* Currently only atomic writing of a single FS block is
* supported. It would be possible to atomic write smaller than
* a FS block, but there is no requirement to support this.
* Note that iomap also does not support this yet.
*/
if (ocount != ip->i_mount->m_sb.sb_blocksize)
return -EINVAL;
ret = generic_atomic_write_valid(iocb, from);
if (ret)
return ret;
}
> > performance of 4K IO could be improved already.
>
> Great, which means that IO operations aligned within a single page
> can be executed concurrently, because the folio lock already
> provides atomicity guarantees.
>
> If the write does not exceed the boundary of a page, we can
> downgrade the iolock to XFS_IOLOCK_SHARED. It seems to be safe
> and will not change the current behavior.
>
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -454,6 +454,11 @@ xfs_file_write_checks(
> if (error)
> return error;
>
> + if ( iocb->ki_pos >> PAGE_SHIFT == (iocb->ki_pos + count) >>
> PAGE_SHIFT) {
> + *iolock = XFS_IOLOCK_SHARED;
> + }
> +
> /*
> * For changing security info in file_remove_privs() we need
> i_rwsem
> * exclusively.
>
I think that may be possible, but you should do it in the buffered write
code as the patch below.
xfs_file_write_checks() is called from code paths like
xfs_file_dio_write_unaligned() where you should not demote to shared lock.
> >
> > It's possible that all you need to do is:
> >
> > diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> > index c488ae26b23d0..2542f15496488 100644
> > --- a/fs/xfs/xfs_file.c
> > +++ b/fs/xfs/xfs_file.c
> > @@ -777,9 +777,10 @@ xfs_file_buffered_write(
> > ssize_t ret;
> > bool cleared_space = false;
> > unsigned int iolock;
> > + bool atomic_write = iocb->ki_flags & IOCB_ATOMIC;
> >
> > write_retry:
> > - iolock = XFS_IOLOCK_EXCL;
> > + iolock = atomic_write ? XFS_IOLOCK_SHARED : XFS_IOLOCK_EXCL;
> > ret = xfs_ilock_iocb(iocb, iolock);
> > --
> >
> > xfs_file_write_checks() afterwards already takes care of promoting
> > XFS_IOLOCK_SHARED to XFS_IOLOCK_EXCL for extending writes.
>
> Yeah, for writes that exceed the PAGE boundary, we can also promote
> the lock to XFS_IOLOCK_EXCL. Otherwise, I am concerned that it may
> lead to old data being retained in the file.
>
> For example, two processes writing four pages of data to the same area.
>
> process A process B
> --------------------------------
> write AA--
> <sleep>
> new write BBBB
> write --AA
>
> The final data is BBAA.
>
What is the use case for which you are trying to fix performance?
Is it a use case with single block IO? if not then it does not help to implement
a partial solution for single block size IO.
Thanks,
Amir.