On Thu, Sep 26, 2024 at 10:50 AM Baokun Li <libaokun1@xxxxxxxxxx> wrote:Okay, I'll send a patch later.
On 2024/9/26 0:17, Aleksandr Mikhalitsyn wrote:Hi Baokun,
On Wed, Sep 25, 2024 at 5:57 PM Jan Kara <jack@xxxxxxx> wrote:Hi Alex,
On Wed 25-09-24 16:33:24, Alexander Mikhalitsyn wrote:Hi Jan,
[ 33.882936] EXT4-fs (dm-5): mounted filesystem 8aaf41b2-6ac0-4fa8-b92b-77d10e1d16ca r/w with ordered data mode. Quota mode: none.Ah, I was staring at this for a while before I understood what's going on
[ 33.888365] EXT4-fs (dm-5): resizing filesystem from 7168 to 786432 blocks
[ 33.888740] ------------[ cut here ]------------
[ 33.888742] kernel BUG at fs/ext4/resize.c:324!
(it would be great to explain this in the changelog BTW). As far as I
understand commit 665d3e0af4d3 ("ext4: reduce unnecessary memory allocation
in alloc_flex_gd()") can actually make flex_gd->resize_bg larger than
flexbg_size (for example when ogroup = flexbg_size, ngroup = 2*flexbg_size
- 1) which then confuses things. I think that was not really intended and
First of all, thanks for your reaction/review on this one ;-)
You are absolutely right, have just checked with our reproducer and
this modification:
diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
index e04eb08b9060..530a918f0cab 100644
--- a/fs/ext4/resize.c
+++ b/fs/ext4/resize.c
@@ -258,6 +258,8 @@ static struct ext4_new_flex_group_data
*alloc_flex_gd(unsigned int flexbg_size,
flex_gd->resize_bg = 1 << max(fls(last_group - o_group + 1),
fls(n_group - last_group));
+ BUG_ON(flex_gd->resize_bg > flexbg_size);
+
flex_gd->groups = kmalloc_array(flex_gd->resize_bg,
sizeof(struct ext4_new_group_data),
GFP_NOFS);
and yes, it crashes on this BUG_ON. So it looks like instead of making
flex_gd->resize_bg to be smaller
than flexbg_size in most cases we can actually have an opposite effect
here. I guess we really need to fix alloc_flex_gd() too.
instead of fixing up ext4_alloc_group_tables() we should really changeAt the same time, if I understand the code right, as we can have
the logic in alloc_flex_gd() to make sure flex_gd->resize_bg never exceeds
flexbg size. Baokun?
flex_gd->resize_bg != flexbg_size after
5d1935ac02ca5a ("ext4: avoid online resizing failures due to oversized
flex bg") and
665d3e0af4d3 ("ext4: reduce unnecessary memory allocation in alloc_flex_gd()")
we should always refer to flex_gd->resize_bg value which means that
ext4_alloc_group_tables() fix is needed too.
Am I correct in my understanding?
These two are not exactly equivalent.Huge thanks for explaining this!
The flex_gd->resize_bg is only used to determine how many block groups we
allocate memory to, i.e., the maximum number of block groups per resize.
And the flexbg_size is used to make some judgement on flexible block
groups, for example, the BUG_ON triggered in the issue is to make sure
src_group and last_group must be in the same flexible block group.
Then I guess it's better if you send a patch with your fix.
Feel free to add my Tested-by tag.
I think it makes sense, and it's good to have more use cases to look
Question to you and Jan. Do you guys think that it makes sense to try
to create a minimal reproducer for this problem without Incus/LXD involved?
(only e2fsprogs, lvm tools, etc)
I guess this test can be put in the xfstests test suite, right?
Kind regards,
Alex