Re: [RFC 0/5] ext4: Implement support for extsize hints
From: IBM
Date: Fri Sep 13 2024 - 07:52:16 EST
John Garry <john.g.garry@xxxxxxxxxx> writes:
> On 11/09/2024 10:01, Ojaswin Mujoo wrote:
>> This patchset implements extsize hint feature for ext4. Posting this RFC to get
>> some early review comments on the design and implementation bits. This feature
>> is similar to what we have in XFS too with some differences.
>>
>> extsize on ext4 is a hint to mballoc (multi-block allocator) and extent
>> handling layer to do aligned allocations. We use allocation criteria 0
>> (CR_POWER2_ALIGNED) for doing aligned power-of-2 allocations. With extsize hint
>> we try to align the logical start (m_lblk) and length (m_len) of the allocation
>> to be extsize aligned. The CR_POWER2_ALIGNED criterion in mballoc automatically makes
>> sure that we get the aligned physical start (m_pblk) as well. So in this way
>> extsize can make sure that lblk, len and pblk all are aligned for the allocated
>> extent w.r.t extsize.
>>
>> Note that extsize feature is just a hinting mechanism to ext4 multi-block
>> allocator. That means that if we are unable to get an aligned allocation for
>> some reason, then we drop this flag and continue with unaligned allocation to
>> serve the request. However, when we add atomic/untorn writes support, then
>> we will enforce the aligned allocation and can return -ENOSPC if aligned
>> allocation was not successful.
>
> A few questions/confirmations:
> - You have no intention of adding an equivalent of forcealign, right?
extsize is just a hinting mechanism, and only for the __allocation__
path. But for atomic writes we do require some form of forcealign (like
XFS has). So we could either fold this directly into the atomic write
feature, or call it a forcealign feature and make atomic writes depend
on it, as XFS does.
I still haven't understood whether there is/will be a user specifically
for forcealign other than atomic writes.
Since you asked, I am curious to know if there is some more context
to your question?
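To make the "aligned" part of the cover letter concrete, here is a rough
userspace sketch of rounding a request out to extsize boundaries. This is
not code from the series: extsize_align() and struct aligned_req are
hypothetical names, and extsize is assumed to be expressed in filesystem
blocks and to be a power of 2.

```c
#include <stdint.h>

/* Hypothetical result type, not from the patch series. */
struct aligned_req {
	uint64_t lblk;	/* aligned logical start, in blocks */
	uint64_t len;	/* aligned length, in blocks */
};

/*
 * Round the logical range [lblk, lblk + len) out to extsize boundaries.
 * Assumes extsize_blks is a non-zero power of 2 (ext4 only accepts
 * power-of-2 extsize values per the cover letter).
 * Returns 0 on success, -1 on invalid input.
 */
static int extsize_align(uint64_t lblk, uint64_t len, uint64_t extsize_blks,
			 struct aligned_req *out)
{
	uint64_t mask, start, end;

	if (len == 0 || extsize_blks == 0 ||
	    (extsize_blks & (extsize_blks - 1)))
		return -1;

	mask = extsize_blks - 1;
	start = lblk & ~mask;			/* round start down */
	end = (lblk + len + mask) & ~mask;	/* round end up */

	out->lblk = start;
	out->len = end - start;
	return 0;
}
```

With a hint of 4 blocks, a request for blocks 5..6 becomes the aligned
range 4..7. In hint mode the allocator may still fall back to an unaligned
allocation; a forcealign-style mode would fail instead.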
>
> - Would you also plan on using FS_IOC_FS(GET/SET)XATTR interface for
> enabling atomic writes on a per-inode basis?
Yes, that interface should indeed be kept the same for ext4 too.
>
> - Can extsize be set at mkfs time?
Good point. For now in this series, extsize can only be set on a
per-inode basis, using the same ioctl.
IIUC, XFS supports both, right? It can be set on a per-inode basis via
the ioctl, and it can also be set at mkfs.xfs time.
(maybe xfsprogs only allows setting this at mkfs time for rtvolumes for now)
So if this is set at mkfs.xfs time, then by default all inodes will
have this extsize attribute value set, right?
BTW, this brings me to another question that I had asked here too [1].
1. For XFS, atomic writes can only be enabled with a fresh mkfs.xfs -d
atomic-writes=1, right?
2. For atomic writes to be enabled, we need all 3 features to be
enabled at mkfs.xfs time itself, right?
i.e.
"mkfs.xfs -i forcealign=1 -d extsize=16384 -d atomic-writes=1"
[1]: https://lore.kernel.org/linux-xfs/20240817094800.776408-1-john.g.garry@xxxxxxxxxx/
>
> - Is there any userspace support for this series available?
Makes sense to provide a link to userspace support too.
For now, a quick hack would be to just allow setting the extsize hint on
other filesystems as well in xfs_io.
diff --git a/io/open.c b/io/open.c
index 15850b55..6407b7e8 100644
--- a/io/open.c
+++ b/io/open.c
@@ -980,7 +980,7 @@ open_init(void)
 	extsize_cmd.args = _("[-D | -R] [extsize]");
 	extsize_cmd.argmin = 0;
 	extsize_cmd.argmax = -1;
-	extsize_cmd.flags = CMD_NOMAP_OK;
+	extsize_cmd.flags = CMD_NOMAP_OK | CMD_FOREIGN_OK;
 	extsize_cmd.oneline =
 		_("get/set preferred extent size (in bytes) for the open file");
 	extsize_cmd.help = extsize_help;
e.g.:
/dev/loop6 on /mnt1/test type ext4 (rw,relatime)
root@qemu:~/xt/xfsprogs-dev# ./io/xfs_io -fc "extsize" /mnt1/test/f1
[0] /mnt1/test/f1
root@qemu:~/xt/xfsprogs-dev# ./io/xfs_io -c "extsize 16384" /mnt1/test/f1
root@qemu:~/xt/xfsprogs-dev# ./io/xfs_io -c "extsize" /mnt1/test/f1
[16384] /mnt1/test/f1
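For anyone who wants to do the same without xfs_io, a minimal sketch using
the generic FS_IOC_FSGETXATTR/FS_IOC_FSSETXATTR ioctls from <linux/fs.h>
could look like the following. The helper names are hypothetical, and it
is an assumption here that ext4 with this RFC applied expects
FS_XFLAG_EXTSIZE to be set alongside fsx_extsize, the way XFS does for
regular files.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/fs.h>

/*
 * Hypothetical helper: set the extsize hint (in bytes) on an open file.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int set_extsize_hint(int fd, uint32_t extsize_bytes)
{
	struct fsxattr fsx;

	/* Read the current attributes so other flags are preserved. */
	if (ioctl(fd, FS_IOC_FSGETXATTR, &fsx) < 0)
		return -1;

	/* Assumption: same flag convention as XFS for regular files. */
	fsx.fsx_xflags |= FS_XFLAG_EXTSIZE;
	fsx.fsx_extsize = extsize_bytes;

	if (ioctl(fd, FS_IOC_FSSETXATTR, &fsx) < 0)
		return -1;
	return 0;
}

/*
 * Hypothetical helper: read back the extsize hint (in bytes).
 * Returns 0 on success, -1 with errno set on failure.
 */
static int get_extsize_hint(int fd, uint32_t *extsize_bytes)
{
	struct fsxattr fsx;

	if (ioctl(fd, FS_IOC_FSGETXATTR, &fsx) < 0)
		return -1;
	*extsize_bytes = fsx.fsx_extsize;
	return 0;
}
```

Per the cover letter, the hint should be set on a file before its i_size
changes, so a caller would open (or create) the file, call
set_extsize_hint(fd, 16384), and only then start writing.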
>
> - how would/could extsize interact with bigalloc?
>
As of now it is kept disabled with bigalloc.
+	if (sbi->s_cluster_ratio > 1) {
+		msg = "Can't use extsize hint with bigalloc";
+		err = -EINVAL;
+		goto error;
+	}
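Put together with the power-of-2 restriction from the cover letter, the
acceptance rule can be modelled in userspace roughly as below. This is
only a sketch, not the kernel code: extsize_hint_allowed() is a
hypothetical name, and rejecting a zero value is an assumption made for
the sake of the example.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical model of the checks discussed in this thread.
 * cluster_ratio: blocks per cluster (> 1 means bigalloc is enabled).
 * extsize_blks:  requested extsize hint, in filesystem blocks.
 * Returns 0 if the hint would be accepted, -EINVAL otherwise.
 */
static int extsize_hint_allowed(uint32_t cluster_ratio, uint32_t extsize_blks)
{
	/* extsize is kept disabled with bigalloc for now. */
	if (cluster_ratio > 1)
		return -EINVAL;

	/* ext4 only allows power-of-2 values (zero rejected by assumption). */
	if (extsize_blks == 0 || (extsize_blks & (extsize_blks - 1)))
		return -EINVAL;

	return 0;
}
```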
>>
>> Comparison with XFS extsize feature -
>> =====================================
>> 1. extsize in XFS is a hint for aligning only the logical start and the length
>> of the allocation, whereas extsize on ext4 makes sure the physical start of the
>> extent gets aligned as well.
>
> note that forcealign with extsize aligns AG block also
Can you expand on that a bit? Do you mean that at mkfs.xfs time we ensure
agblock boundaries are extsize aligned?
>
> only for atomic writes do we enforce the AG block is aligned to physical
> block
>
Could you expand on that a bit, please? Do you mean that at mkfs.xfs time,
for atomic writes, we ensure AG block start boundaries are extsize aligned?
>>
>> 2. eof allocation on XFS trims the blocks allocated beyond eof with extsize
>> hint. That means on XFS for eof allocations (with extsize hint) only logical
>> start gets aligned. However extsize hint in ext4 for eof allocation is not
>> supported in this version of the series.
>>
>> 3. XFS allows extsize to be set on file with no extents but delayed data.
>> However, ext4 doesn't allow that, for simplicity. The user is expected to set
>> it on a file before changing its i_size.
>>
>> 4. XFS allows non-power-of-2 values for extsize but ext4 does not, since we
>> primarily would like to support atomic writes with extsize.
>>
>> 5. In ext4 we chose to store the extsize value in SYSTEM_XATTR rather than an
>> inode field as it was simple and most flexible, since there might be more
>> features like atomic/untorn writes coming in future.
>>
>> 6. In buffered-io path XFS switches to non-delalloc allocations for extsize hint.
>> The same has been kept for EXT4 as well.
>>
>> Some TODOs:
>> ===========
>> 1. EOF allocations support can be added and can be kept similar to XFS
>
> Note that EOF alignment for forcealign may change - it needs to be
> discussed further.
Sure, thanks for pointing that out.
I guess you are mainly referring to the truncate-related EOF alignment change
required with forcealign for XFS.
-ritesh