Re: [PATCH 15/21] fs: xfs: Support atomic write for statx

From: John Garry
Date: Tue Oct 03 2023 - 06:57:42 EST


On 03/10/2023 04:32, Dave Chinner wrote:
On Fri, Sep 29, 2023 at 10:27:20AM +0000, John Garry wrote:
Support providing info on atomic write unit min and max for an inode.

For simplicity, currently we limit the min at the FS block size, but a
lower limit could be supported in future.

The atomic write unit min and max is limited by the guaranteed extent
alignment for the inode.

Signed-off-by: John Garry <john.g.garry@xxxxxxxxxx>
---
fs/xfs/xfs_iops.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++
fs/xfs/xfs_iops.h | 4 ++++
2 files changed, 55 insertions(+)

diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 1c1e6171209d..5bff80748223 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -546,6 +546,46 @@ xfs_stat_blksize(
return PAGE_SIZE;
}
+void xfs_ip_atomic_write_attr(struct xfs_inode *ip,
+ xfs_filblks_t *unit_min_fsb,
+ xfs_filblks_t *unit_max_fsb)

Formatting.

Change args to 1x tab indent, right?


Also, we don't use variable name shorthand for function names -
xfs_get_atomic_write_hint(ip) to match xfs_get_extsz_hint(ip)
would be appropriate, right?

Changing the name format would be ok. However we are not returning a hint, but rather the inode atomic write unit min and max values in FS blocks. Anyway, I'll look to rework the name.




+{
+ xfs_extlen_t extsz_hint = xfs_get_extsz_hint(ip);
+ struct xfs_buftarg *target = xfs_inode_buftarg(ip);
+ struct block_device *bdev = target->bt_bdev;
+ struct xfs_mount *mp = ip->i_mount;
+ xfs_filblks_t atomic_write_unit_min,
+ atomic_write_unit_max,
+ align;
+
+ atomic_write_unit_min = XFS_B_TO_FSB(mp,
+ queue_atomic_write_unit_min_bytes(bdev->bd_queue));
+ atomic_write_unit_max = XFS_B_TO_FSB(mp,
+ queue_atomic_write_unit_max_bytes(bdev->bd_queue));

These should be set in the buftarg at mount time, like we do with
sector size masks. Then we don't need to convert them to fsbs on
every single lookup.

ok, fine. However I still have doubts about whether these values should be changeable - please see my (small) comment about atomic_write_max_sectors in patch 7/21


+ /* for RT, unset extsize gives hint of 1 */
+ /* for !RT, unset extsize gives hint of 0 */
+ if (extsz_hint && (XFS_IS_REALTIME_INODE(ip) ||
+ (ip->i_diflags2 & XFS_DIFLAG2_FORCEALIGN)))

Logic is non-obvious. The compound is (rt || force), not
(extsz && rt), so it took me a while to actually realise I read this
incorrectly.

if (extsz_hint &&
(XFS_IS_REALTIME_INODE(ip) ||
(ip->i_diflags2 & XFS_DIFLAG2_FORCEALIGN))) {

+ align = extsz_hint;
+ else
+ align = 1;

And now the logic looks wrong to me. We don't want to use extsz hint
for RT inodes if force align is not set, this will always use it
regardless of the fact it has nothing to do with force alignment.

extsz_hint comes from xfs_get_extsz_hint(), which gives us the SB extsize for the RT inode and this alignment is guaranteed, no?


Indeed, if XFS_DIFLAG2_FORCEALIGN is not set, then shouldn't this
always return min/max = 0 because atomic alignments are not in use on
this inode?

As above, for RT I thought that extsize alignment was guaranteed and we don't need to bother with XFS_DIFLAG2_FORCEALIGN there.


i.e. the first thing this code should do is:

*unit_min_fsb = 0;
*unit_max_fsb = 0;
if (!(ip->i_diflags2 & XFS_DIFLAG2_FORCEALIGN))
return;

Then we can check device support:

if (!buftarg->bt_atomic_write_max)
return;

Then we can check for extent size hints. If that's not set:

align = xfs_get_extsz_hint(ip);
if (align <= 1) {
*unit_min_fsb = 1;
*unit_max_fsb = 1;
return;
}

And finally, if there is an extent size hint, we can return that.
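
For reference, the ordering of checks suggested above can be sketched as a small userspace mock. This is only an illustration: struct mock_inode and mock_atomic_write_units() are hypothetical stand-ins for the inode flag, extent size hint and buftarg field, not kernel code. The review only says the hint is returned in the last step, so reporting one block as the minimum and the hint as the maximum is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the state the review refers to. */
struct mock_inode {
	bool     forcealign;          /* models XFS_DIFLAG2_FORCEALIGN being set */
	uint32_t extsz_hint_fsb;      /* models xfs_get_extsz_hint(ip), in FS blocks */
	uint64_t bt_atomic_write_max; /* models buftarg->bt_atomic_write_max */
};

static void mock_atomic_write_units(const struct mock_inode *ip,
				    uint64_t *unit_min_fsb,
				    uint64_t *unit_max_fsb)
{
	assert(ip && unit_min_fsb && unit_max_fsb);

	/* Default: atomic writes not supported on this inode. */
	*unit_min_fsb = 0;
	*unit_max_fsb = 0;

	/* 1. No forced alignment means no atomic write guarantee. */
	if (!ip->forcealign)
		return;

	/* 2. The device must support atomic writes at all. */
	if (!ip->bt_atomic_write_max)
		return;

	/* 3. No usable extent size hint: a single FS block. */
	if (ip->extsz_hint_fsb <= 1) {
		*unit_min_fsb = 1;
		*unit_max_fsb = 1;
		return;
	}

	/* 4. Assumption: min is one block, max is the hint. */
	*unit_min_fsb = 1;
	*unit_max_fsb = ip->extsz_hint_fsb;
}
```

With forcealign clear, or with bt_atomic_write_max == 0, both outputs stay 0 regardless of the hint.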

+ if (atomic_write_unit_max == 0) {
+ *unit_min_fsb = 0;
+ *unit_max_fsb = 0;
+ } else if (atomic_write_unit_min == 0) {
+ *unit_min_fsb = 1;
+ *unit_max_fsb = min_t(xfs_filblks_t, atomic_write_unit_max,
+ align);

Why is it valid for a device to have a zero minimum size?

It's not valid. The local variables atomic_write_unit_max and atomic_write_unit_min are in units of FS blocks here - maybe I should change the names.

The idea is that for simplicity we won't support atomic writes for XFS of size less than 1x FS block initially. So if the bdev has - for example - queue_atomic_write_unit_min_bytes() == 2K and queue_atomic_write_unit_max_bytes() == 64K, then (ignoring alignment) we say that unit_min_fsb = 1 and unit_max_fsb = 16 (for 4K FS blocks).
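
To make the arithmetic in that example concrete, here is a rough userspace sketch. The helper awu_bytes_to_fsb() and struct awu_fsb are hypothetical names for illustration. The sketch rounds down and clamps the minimum to one FS block, which is an assumption and not necessarily how the conversion macro in the patch rounds.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: atomic write unit limits in FS blocks. */
struct awu_fsb {
	uint64_t unit_min_fsb;
	uint64_t unit_max_fsb;
};

/*
 * Convert device atomic write unit limits (bytes) to FS blocks,
 * ignoring extent alignment. Assumes round-down conversion.
 */
static struct awu_fsb awu_bytes_to_fsb(uint64_t min_bytes,
				       uint64_t max_bytes,
				       uint64_t fs_block_bytes)
{
	struct awu_fsb r = { 0, 0 };

	assert(fs_block_bytes != 0);

	/* Device max below one FS block: report unsupported (0/0). */
	r.unit_max_fsb = max_bytes / fs_block_bytes;
	if (r.unit_max_fsb == 0)
		return r;

	/* Sub-block minimums are raised to one FS block. */
	r.unit_min_fsb = min_bytes / fs_block_bytes;
	if (r.unit_min_fsb == 0)
		r.unit_min_fsb = 1;

	return r;
}
```

Calling awu_bytes_to_fsb(2048, 65536, 4096) gives unit_min_fsb = 1 and unit_max_fsb = 16, matching the numbers above.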

If it can
set a maximum, it should -always- set a minimum size as logical
sector size is a valid lower bound, yes?

+ } else {
+ *unit_min_fsb = min_t(xfs_filblks_t, atomic_write_unit_min,
+ align);
+ *unit_max_fsb = min_t(xfs_filblks_t, atomic_write_unit_max,
+ align);
+ }

Nothing here guarantees the power-of-2 sizes that the RWF_ATOMIC
user interface requires....

atomic_write_unit_min and atomic_write_unit_max will be powers-of-2 (or 0).

But, you are right, we don't check align is a power-of-2 - that can be added.


It also doesn't check that the extent size hint is aligned with
atomic write units.

If we add a check for align being a power-of-2 and atomic_write_unit_min and atomic_write_unit_max are already powers-of-2, then this can be relied on, right?
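
A quick userspace sketch of why that should hold: when both values are powers of 2, the smaller one always divides the larger, so min(unit, align) is itself a power of 2 and divides align. The helpers is_pow2() and clamp_unit_to_align() are hypothetical names for this illustration; returning 0 for a non-power-of-2 input is an assumption about how such a case might be rejected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-in for a power-of-2 test; 0 is not a power of 2. */
static bool is_pow2(uint64_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/*
 * Clamp an atomic write unit to the extent alignment.
 * Returns 0 (unsupported) if either value is not a power of 2.
 */
static uint64_t clamp_unit_to_align(uint64_t unit, uint64_t align)
{
	uint64_t result;

	if (!is_pow2(unit) || !is_pow2(align))
		return 0;

	result = unit < align ? unit : align;

	/* The smaller power of 2 always divides the larger one. */
	assert(align % result == 0);
	return result;
}
```

For example, clamp_unit_to_align(16, 4) yields 4, while an align of 6 is rejected because it is not a power of 2.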


It also doesn't check either against stripe unit alignment....

As mentioned in earlier response, this could be enforced.


+}
+
STATIC int
xfs_vn_getattr(
struct mnt_idmap *idmap,
@@ -614,6 +654,17 @@ xfs_vn_getattr(
stat->dio_mem_align = bdev_dma_alignment(bdev) + 1;
stat->dio_offset_align = bdev_logical_block_size(bdev);
}
+ if (request_mask & STATX_WRITE_ATOMIC) {
+ xfs_filblks_t unit_min_fsb, unit_max_fsb;
+
+ xfs_ip_atomic_write_attr(ip, &unit_min_fsb,
+ &unit_max_fsb);
+ stat->atomic_write_unit_min = XFS_FSB_TO_B(mp, unit_min_fsb);
+ stat->atomic_write_unit_max = XFS_FSB_TO_B(mp, unit_max_fsb);

That's just nasty. We pull byte units from the bdev, convert them to
fsb to round them, then convert them back to byte counts. We should
be doing all the work in one set of units....

ok, agreed. bytes is probably best.


+ stat->attributes |= STATX_ATTR_WRITE_ATOMIC;
+ stat->attributes_mask |= STATX_ATTR_WRITE_ATOMIC;
+ stat->result_mask |= STATX_WRITE_ATOMIC;

If the min/max are zero, then atomic writes are not supported on
this inode, right? Why would we set any of the attributes or result
mask to say it is supported on this file?

ok, we won't set STATX_ATTR_WRITE_ATOMIC when min/max are zero
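
Something along these lines, as a userspace mock working in bytes throughout. struct mock_kstat, the MOCK_* flag values and mock_fill_atomic_write() are hypothetical and only model the shape of the change. Leaving the result mask untouched as well in the unsupported case is one reading of the comment above, not a settled decision.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag values for illustration only; not the kernel's. */
#define MOCK_STATX_WRITE_ATOMIC      (1u << 0)
#define MOCK_STATX_ATTR_WRITE_ATOMIC (1u << 1)

/* Hypothetical, trimmed-down model of the stat structure being filled. */
struct mock_kstat {
	uint32_t result_mask;
	uint64_t attributes;
	uint64_t attributes_mask;
	uint32_t atomic_write_unit_min; /* bytes */
	uint32_t atomic_write_unit_max; /* bytes */
};

static void mock_fill_atomic_write(struct mock_kstat *stat,
				   uint32_t unit_min_bytes,
				   uint32_t unit_max_bytes)
{
	assert(stat);

	/* Zero min or max: atomic writes unsupported, report nothing. */
	if (unit_min_bytes == 0 || unit_max_bytes == 0)
		return;

	stat->atomic_write_unit_min = unit_min_bytes;
	stat->atomic_write_unit_max = unit_max_bytes;
	stat->attributes |= MOCK_STATX_ATTR_WRITE_ATOMIC;
	stat->attributes_mask |= MOCK_STATX_ATTR_WRITE_ATOMIC;
	stat->result_mask |= MOCK_STATX_WRITE_ATOMIC;
}
```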

Thanks,
John