One more thing I should mention.
You do not need to wait for atomic large writes patches to land.
There is nothing stopping you from implementing the suggested
solution based on the xfs code already in master (v6.13-rc1),
which has support for the RWF_ATOMIC flag for writes.
Only I missed the fact that there is not yet a plan to support
atomic buffered writes :-/
^
It just means that the API will not be usable for applications that
want to do IO larger than block size, but concurrent read/write
To be precise, this is the page size, not the block size, right?
fs block size:
if (iocb->ki_flags & IOCB_ATOMIC) {
/*
* Currently only atomic writing of a single FS block is
* supported. It would be possible to atomic write smaller than
* a FS block, but there is no requirement to support this.
* Note that iomap also does not support this yet.
*/
if (ocount != ip->i_mount->m_sb.sb_blocksize)
return -EINVAL;
ret = generic_atomic_write_valid(iocb, from);
if (ret)
return ret;
}
performance of 4K IO could be improved already.
Great, which means that IO operations aligned within a single page
can be executed concurrently, because the folio lock already
provides atomicity guarantees.
If the write does not exceed the boundary of a page, we can
downgrade the iolock to XFS_IOLOCK_SHARED. It seems to be safe
and will not change the current behavior.
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -454,6 +454,11 @@ xfs_file_write_checks(
if (error)
return error;
+ if ( iocb->ki_pos >> PAGE_SHIFT == (iocb->ki_pos + count) >>
PAGE_SHIFT) {
+ *iolock = XFS_IOLOCK_SHARED;
+ }
+
/*
* For changing security info in file_remove_privs() we need
i_rwsem
* exclusively.
I think that may be possible, but you should do it in the buffered write
code as the patch below.
xfs_file_write_checks() is called from code paths like
xfs_file_dio_write_unaligned() where you should not demote to shared lock.
It's possible that all you need to do is:
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index c488ae26b23d0..2542f15496488 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -777,9 +777,10 @@ xfs_file_buffered_write(
ssize_t ret;
bool cleared_space = false;
unsigned int iolock;
+ bool atomic_write = iocb->ki_flags & IOCB_ATOMIC;
write_retry:
- iolock = XFS_IOLOCK_EXCL;
+ iolock = atomic_write ? XFS_IOLOCK_SHARED : XFS_IOLOCK_EXCL;
ret = xfs_ilock_iocb(iocb, iolock);
--
xfs_file_write_checks() afterwards already takes care of promoting
XFS_IOLOCK_SHARED to XFS_IOLOCK_EXCL for extending writes.
Yeah, for writes that exceed the PAGE boundary, we can also promote
the lock to XFS_IOLOCK_EXCL. Otherwise, I am concerned that it may
lead to old data being retained in the file.
For example, two processes writing four pages of data to the same area.
process A process B
--------------------------------
write AA--
<sleep>
new write BBBB
write --AA
The final data is BBAA.
What is the use case for which you are trying to fix performance?
Is it a use case with single block IO? if not then it does not help to implement
a partial solution for single block size IO.