Re: [PATCH] xfs: Remove i_rwsem lock in buffered read

From: Chi Zhiling
Date: Thu Jan 09 2025 - 07:11:31 EST

Next message: Divya Koppera: "[PATCH net] net: phy: Kconfig: Fix error related to undefined reference"
Previous message: Dmitry V. Levin: "Re: [PATCH 2/6] syscall.h: add syscall_set_arguments() on remaining HAVE_ARCH_TRACEHOOK arches"
In reply to: Amir Goldstein: "Re: [PATCH] xfs: Remove i_rwsem lock in buffered read"
Next in thread: John Garry: "Re: [PATCH] xfs: Remove i_rwsem lock in buffered read"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 2025/1/9 18:25, Amir Goldstein wrote:

One more thing I should mention.
You do not need to wait for atomic large writes patches to land.
There is nothing stopping you from implementing the suggested
solution based on the xfs code already in master (v6.13-rc1),
which has support for the RWF_ATOMIC flag for writes.

Only I missed the fact that there is not yet a plan to support
atomic buffered writes :-/

I think it's necessary to support atomic buffered writes.

It just means that the API will not be usable for applications that
want to do IO larger than block size, but concurrent read/write

^
To be precise, this is the page size, not the block size, right?

fs block size:

if (iocb->ki_flags & IOCB_ATOMIC) {
/*
* Currently only atomic writing of a single FS block is
* supported. It would be possible to atomic write smaller than
* a FS block, but there is no requirement to support this.
* Note that iomap also does not support this yet.
*/
if (ocount != ip->i_mount->m_sb.sb_blocksize)
return -EINVAL;
ret = generic_atomic_write_valid(iocb, from);
if (ret)
return ret;
}

Uh, okay, maybe I didn't understand it very accurately. I thought
you were talking about buffered IO. I was quite curious as to why
you mentioned block size in the context of buffered IO.

performance of 4K IO could be improved already.

Great, which means that IO operations aligned within a single page
can be executed concurrently, because the folio lock already
provides atomicity guarantees.

If the write does not exceed the boundary of a page, we can
downgrade the iolock to XFS_IOLOCK_SHARED. It seems to be safe
and will not change the current behavior.

--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -454,6 +454,11 @@ xfs_file_write_checks(
if (error)
return error;

+ if ( iocb->ki_pos >> PAGE_SHIFT == (iocb->ki_pos + count) >>
PAGE_SHIFT) {
+ *iolock = XFS_IOLOCK_SHARED;
+ }
+
/*
* For changing security info in file_remove_privs() we need
i_rwsem
* exclusively.

I think that may be possible, but you should do it in the buffered write
code as the patch below.
xfs_file_write_checks() is called from code paths like
xfs_file_dio_write_unaligned() where you should not demote to shared lock.

Wow, thank you for the reminder. This is the prototype of the patch.
I might need to consider more scenarios and conduct testing before
sending the patch.

It's possible that all you need to do is:

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index c488ae26b23d0..2542f15496488 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -777,9 +777,10 @@ xfs_file_buffered_write(
ssize_t ret;
bool cleared_space = false;
unsigned int iolock;
+ bool atomic_write = iocb->ki_flags & IOCB_ATOMIC;

write_retry:
- iolock = XFS_IOLOCK_EXCL;
+ iolock = atomic_write ? XFS_IOLOCK_SHARED : XFS_IOLOCK_EXCL;
ret = xfs_ilock_iocb(iocb, iolock);
--

xfs_file_write_checks() afterwards already takes care of promoting
XFS_IOLOCK_SHARED to XFS_IOLOCK_EXCL for extending writes.

Yeah, for writes that exceed the PAGE boundary, we can also promote
the lock to XFS_IOLOCK_EXCL. Otherwise, I am concerned that it may
lead to old data being retained in the file.

For example, two processes writing four pages of data to the same area.

process A process B
--------------------------------
write AA--
<sleep>
new write BBBB
write --AA

The final data is BBAA.

What is the use case for which you are trying to fix performance?
Is it a use case with single block IO? if not then it does not help to implement
a partial solution for single block size IO.

I want to improve the UnixBench score for XFS. UnixBench uses buffered
IO for testing, including both single-threaded and multi-threaded tests.
Additionally, the IO size tested in UnixBench is below 4K, so this will
be very helpful.

Thanks,
Chi Zhiling

Next message: Divya Koppera: "[PATCH net] net: phy: Kconfig: Fix error related to undefined reference"
Previous message: Dmitry V. Levin: "Re: [PATCH 2/6] syscall.h: add syscall_set_arguments() on remaining HAVE_ARCH_TRACEHOOK arches"
In reply to: Amir Goldstein: "Re: [PATCH] xfs: Remove i_rwsem lock in buffered read"
Next in thread: John Garry: "Re: [PATCH] xfs: Remove i_rwsem lock in buffered read"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]