Re: [PATCH 4/6] fs: xfs: Support atomic write for statx

From: John Garry
Date: Fri Feb 09 2024 - 12:32:18 EST

Next message: Conor Dooley: "Re: [PATCH V2] dt-bindings: media: convert Mediatek consumer IR to the json-schema"
Previous message: Borislav Petkov: "Re: [PATCH 2/2] x86/random: Issue a warning if RDRAND or RDSEED fails"
In reply to: Ojaswin Mujoo: "Re: [PATCH 4/6] fs: xfs: Support atomic write for statx"
Next in thread: Ojaswin Mujoo: "Re: [PATCH 4/6] fs: xfs: Support atomic write for statx"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

+void xfs_get_atomic_write_attr(
+ struct xfs_inode *ip,
+ unsigned int *unit_min,
+ unsigned int *unit_max)
+{
+ xfs_extlen_t extsz = xfs_get_extsz(ip);
+ struct xfs_buftarg *target = xfs_inode_buftarg(ip);
+ struct block_device *bdev = target->bt_bdev;
+ unsigned int awu_min, awu_max, align;
+ struct request_queue *q = bdev->bd_queue;
+ struct xfs_mount *mp = ip->i_mount;
+
+ /*
+ * Convert to multiples of the BLOCKSIZE (as we support a minimum
+ * atomic write unit of BLOCKSIZE).
+ */
+ awu_min = queue_atomic_write_unit_min_bytes(q);
+ awu_max = queue_atomic_write_unit_max_bytes(q);
+
+ awu_min &= ~mp->m_blockmask;
+ awu_max &= ~mp->m_blockmask;

I don't understand why we try to round down the awu_max to blocks size
here and not just have an explicit check of (awu_max < blocksize).

We have later check for !awu_max:

if (!awu_max || !xfs_inode_atomicwrites(ip) || !align ||
..

So what we are doing is ensuring that the awu_max which the device reports is at least FS block size. If it is not, then we cannot support atomic writes.

Indeed, my NVMe drive reports awu_max = 4K. So for XFS configured for 64K block size, we will say that we don't support atomic writes.

I think the issue with changing the awu_max is that we are using awu_max
to also indirectly reflect the alignment so as to ensure we don't cross
atomic boundaries set by the hw (eg we check uint_max % atomic alignment
== 0 in scsi). So once we change the awu_max, there's a chance that even
if an atomic write aligns to the new awu_max it still doesn't have the
right alignment and fails.

All these values should be powers-of-2, so rounding down should not affect whether we would not cross an atomic write boundary.

It works right now since eveything is power of 2 but it should cause
issues incase we decide to remove that limitation.

Sure, but that is a fundamental principle of this atomic write support. Not having powers-of-2 requirement for atomic writes will affect many things.

Anyways, I think
this implicit behavior of things working since eveything is a power of 2
should atleast be documented in a comment, so these things are
immediately clear.

+
+ align = XFS_FSB_TO_B(mp, extsz);
+
+ if (!awu_max || !xfs_inode_atomicwrites(ip) || !align ||
+ !is_power_of_2(align)) {

Correct me if I'm wrong but here as well, the is_power_of_2(align) is
esentially checking if the align % uinit_max == 0 (or vice versa if
unit_max is greater)

yes

so that an allocation of extsize will always align
nicely as needed by the device.

I'm trying to keep things simple now.

In theory we could allow, say, align == 12 FSB, and then could say awu_max = 4.

The same goes for atomic write boundary in NVMe. Currently we say that it needs to be a power-of-2. However, it really just needs to be a multiple of awu_max. So if some HW did report a !power-of-2 atomic write boundary, we could reduce awu_max reported until to fits the power-of-2 rule and also is cleanly divisible into atomic write boundary. But that is just not what HW will report (I expect). We live in a power-of-2 data granularity world.

So maybe we should use the % expression explicitly so that the intention
is immediately clear.

As mentioned, I wanted to keep it simple. In addition, it's a bit of a mess for the FS block allocator to work with odd sizes, like 12. And it does not suit RAID stripe alignment, which works in powers-of-2.

+ *unit_min = 0;
+ *unit_max = 0;
+ } else {
+ if (awu_min)
+ *unit_min = min(awu_min, align);

How will the min() here work? If awu_min is the minumum set by the
device, how can statx be allowed to advertise something smaller than
that?

The idea is that is that if the awu_min reported by the device is less than the FS block size, then we report awu_min = FS block size. We already know that awu_max => FS block size, since we got this far, so saying that awu_min = FS block size is ok.

Otherwise it is the minimum of alignment and awu_min. I suppose that does not make much sense, and we should just always require awu_min = FS block size.

If I understand correctly, right now the way we set awu_min in scsi and
nvme, the follwoing should usually be true for a sane device:

awu_min <= blocks size of fs <= align

so the min() anyways becomes redundant, but if we do assume that there
might be some weird devices with awu_min absurdly large (SCSI with
high atomic granularity) we still can't actually advertise a min
smaller than that of the device, or am I missing something here?

As above, I might just ensure that we can do awu_min = FS block size and not deal with crazy devices.

Thanks,
John

Next message: Conor Dooley: "Re: [PATCH V2] dt-bindings: media: convert Mediatek consumer IR to the json-schema"
Previous message: Borislav Petkov: "Re: [PATCH 2/2] x86/random: Issue a warning if RDRAND or RDSEED fails"
In reply to: Ojaswin Mujoo: "Re: [PATCH 4/6] fs: xfs: Support atomic write for statx"
Next in thread: Ojaswin Mujoo: "Re: [PATCH 4/6] fs: xfs: Support atomic write for statx"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]