Re: [PATCH v6 11/13] xfs: add xfs_file_dio_write_atomic()

From: John Garry
Date: Mon Mar 17 2025 - 05:37:00 EST


On 17/03/2025 06:41, Christoph Hellwig wrote:
On Thu, Mar 13, 2025 at 05:13:08PM +0000, John Garry wrote:
+ * REQ_ATOMIC-based is the preferred method, and is attempted first. If this
+ * method fails due to REQ_ATOMIC-related constraints, then we retry with the
+ * COW-based method. The REQ_ATOMIC-based method typically will fail if the
+ * write spans multiple extents or the disk blocks are misaligned.

It is only preferred if actually supported by the underlying hardware.
If it isn't it really shouldn't even be tried, as that is just a waste
of cycles.

We should not even call this function if atomics are not supported by HW - please see IOCB_ATOMIC checks in xfs_file_write_iter(). So maybe I will mention that the caller must ensure atomics are supported for the write size.


Also a lot of comment should probably be near the code not on top
of the function as that's where people would look for them.

sure, if you prefer


+static noinline ssize_t
+xfs_file_dio_write_atomic(
+ struct xfs_inode *ip,
+ struct kiocb *iocb,
+ struct iov_iter *from)
+{
+ unsigned int iolock = XFS_IOLOCK_SHARED;
+ unsigned int dio_flags = 0;
+ const struct iomap_ops *dops = &xfs_direct_write_iomap_ops;
+ ssize_t ret;
+
+retry:
+ ret = xfs_ilock_iocb_for_write(iocb, &iolock);
+ if (ret)
+ return ret;
+
+ ret = xfs_file_write_checks(iocb, from, &iolock, NULL);
+ if (ret)
+ goto out_unlock;
+
+ if (dio_flags & IOMAP_DIO_FORCE_WAIT)
+ inode_dio_wait(VFS_I(ip));
+
+ trace_xfs_file_direct_write(iocb, from);
+ ret = iomap_dio_rw(iocb, from, dops, &xfs_dio_write_ops,
+ dio_flags, NULL, 0);

The normal direct I/O path downgrades the iolock to shared before
doing the I/O here. Why isn't that done here?

OK, I can do that. But we still require exclusive lock always for the CoW-based method.


+ if (ret == -EAGAIN && !(iocb->ki_flags & IOCB_NOWAIT) &&
+ dops == &xfs_direct_write_iomap_ops) {

This should probably explain the unusual use of EGAIN. Although I
still feel that picking a different error code for the fallback would
be much more maintainable.

I could try another error code - can you suggest one? Is it going to be something unrelated to storage stack, like EREMOTEIO?


+ xfs_iunlock(ip, iolock);
+ dio_flags = IOMAP_DIO_FORCE_WAIT;

I notice the top of function comment mentions the IOMAP_DIO_FORCE_WAIT
flag. Maybe use the chance to write a full sentence here or where
it is checked to explain the logic a bit better?

ok, fine


* Handle block unaligned direct I/O writes
*
@@ -840,6 +909,10 @@ xfs_file_dio_write(
return xfs_file_dio_write_unaligned(ip, iocb, from);
if (xfs_is_zoned_inode(ip))
return xfs_file_dio_write_zoned(ip, iocb, from);
+
+ if (iocb->ki_flags & IOCB_ATOMIC)
+ return xfs_file_dio_write_atomic(ip, iocb, from);
+

Either keep space between all the conditional calls or none. I doubt
just stick to the existing style.

Sure