Re: [PATCH v3 08/21] xfs: Introduce FORCEALIGN inode flag

From: Dave Chinner
Date: Tue Apr 30 2024 - 19:23:01 EST


On Mon, Apr 29, 2024 at 05:47:33PM +0000, John Garry wrote:
> From: "Darrick J. Wong" <djwong@xxxxxxxxxx>
>
> Add a new inode flag to require that all file data extent mappings must
> be aligned (both the file offset range and the allocated space itself)
> to the extent size hint. Having a separate COW extent size hint is no
> longer allowed.
>
> The goal here is to enable sysadmins and users to mandate that all space
> mappings in a file must have a startoff/blockcount that are aligned to
> (say) a 2MB alignment and that the startblock/blockcount will follow the
> same alignment.
>
> jpg: Enforce extsize is a power-of-2 and aligned with afgsize + stripe
> alignment for forcealign
> Signed-off-by: "Darrick J. Wong" <djwong@xxxxxxxxxx>
> Co-developed-by: John Garry <john.g.garry@xxxxxxxxxx>
> Signed-off-by: John Garry <john.g.garry@xxxxxxxxxx>
> ---

...

> @@ -783,3 +791,45 @@ xfs_inode_validate_cowextsize(
>
> return NULL;
> }
> +
> +/* Validate the forcealign inode flag */
> +xfs_failaddr_t
> +xfs_inode_validate_forcealign(
> + struct xfs_mount *mp,
> + uint16_t mode,

umode_t mode,

> + uint16_t flags,
> + uint32_t extsize,
> + uint32_t cowextsize)

extent sizes are xfs_extlen_t types.

> +{
> + /* superblock rocompat feature flag */
> + if (!xfs_has_forcealign(mp))
> + return __this_address;
> +
> + /* Only regular files and directories */
> + if (!S_ISDIR(mode) && !S_ISREG(mode))
> + return __this_address;
> +
> + /* Doesn't apply to realtime files */
> + if (flags & XFS_DIFLAG_REALTIME)
> + return __this_address;

Why not? A rt device with an extsize of 1 fsb could make use of
forced alignment just like the data device to allow larger atomic
writes to be done. I mean, just because we haven't written the code
to do this yet doesn't mean it is an illegal on-disk format state.

> + /* Requires a non-zero power-of-2 extent size hint */
> + if (extsize == 0 || !is_power_of_2(extsize) ||
> + (mp->m_sb.sb_agblocks % extsize))
> + return __this_address;

Please do these as indiviual checks with their own fail address.
That way we can tell which check failed from the console output.
Also, the agblocks check is already split out below, so it's being
checked twice...

Also, why does force-align require a power-of-2 extent size? Why
does it require the extent size to be an exact divisor of the AG
size? Aren't these atomic write alignment restrictions? i.e.
shouldn't these only be enforced when the atomic writes inode flag
is set?

> + /* Requires agsize be a multiple of extsize */
> + if (mp->m_sb.sb_agblocks % extsize)
> + return __this_address;
> +
> + /* Requires stripe unit+width (if set) be a multiple of extsize */
> + if ((mp->m_dalign && (mp->m_dalign % extsize)) ||
> + (mp->m_swidth && (mp->m_swidth % extsize)))
> + return __this_address;

Again, this is an atomic write constraint, isn't it?

> + /* Requires no cow extent size hint */
> + if (cowextsize != 0)
> + return __this_address;

What if it's a reflinked file?

....

> diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> index d0e2cec6210d..d1126509ceb9 100644
> --- a/fs/xfs/xfs_ioctl.c
> +++ b/fs/xfs/xfs_ioctl.c
> @@ -1110,6 +1110,8 @@ xfs_flags2diflags2(
> di_flags2 |= XFS_DIFLAG2_DAX;
> if (xflags & FS_XFLAG_COWEXTSIZE)
> di_flags2 |= XFS_DIFLAG2_COWEXTSIZE;
> + if (xflags & FS_XFLAG_FORCEALIGN)
> + di_flags2 |= XFS_DIFLAG2_FORCEALIGN;
>
> return di_flags2;
> }
> @@ -1146,6 +1148,22 @@ xfs_ioctl_setattr_xflags(
> if (i_flags2 && !xfs_has_v3inodes(mp))
> return -EINVAL;
>
> + /*
> + * Force-align requires a nonzero extent size hint and a zero cow
> + * extent size hint. It doesn't apply to realtime files.
> + */
> + if (fa->fsx_xflags & FS_XFLAG_FORCEALIGN) {
> + if (!xfs_has_forcealign(mp))
> + return -EINVAL;
> + if (fa->fsx_xflags & FS_XFLAG_COWEXTSIZE)
> + return -EINVAL;
> + if (!(fa->fsx_xflags & (FS_XFLAG_EXTSIZE |
> + FS_XFLAG_EXTSZINHERIT)))
> + return -EINVAL;
> + if (fa->fsx_xflags & FS_XFLAG_REALTIME)
> + return -EINVAL;
> + }

What about if the file already has shared extents on it (i.e.
reflinked or deduped?)

Also, why is this getting checked here instead of in
xfs_ioctl_setattr_check_extsize()?


> @@ -1263,7 +1283,19 @@ xfs_ioctl_setattr_check_extsize(
> failaddr = xfs_inode_validate_extsize(ip->i_mount,
> XFS_B_TO_FSB(mp, fa->fsx_extsize),
> VFS_I(ip)->i_mode, new_diflags);
> - return failaddr != NULL ? -EINVAL : 0;
> + if (failaddr)
> + return -EINVAL;
> +
> + if (new_diflags2 & XFS_DIFLAG2_FORCEALIGN) {
> + failaddr = xfs_inode_validate_forcealign(ip->i_mount,
> + VFS_I(ip)->i_mode, new_diflags,
> + XFS_B_TO_FSB(mp, fa->fsx_extsize),
> + XFS_B_TO_FSB(mp, fa->fsx_cowextsize));
> + if (failaddr)
> + return -EINVAL;
> + }

Oh, it's because you're trying to use on-disk format validation
routines for user API validation. That, IMO, is a bad idea because
the on-disk format and kernel/user APIs should not be tied
together as they have different constraints and error conditions.

That also explains why xfs_inode_validate_forcealign() doesn't just
get passed the inode to validate - it's because you want to pass
information from the user API to it. This results in sub-optimal
code for both on-disk format validation and user API validation.

Can you please separate these and put all the force align user API
validation checks in the one function?

-Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx