Re: [PATCH v1] ext4: fix a assertion failure due to ungranted bh dirting

From: liubaolin
Date: Wed Oct 16 2024 - 09:38:52 EST


Hello,
I reviewed the patch attached in your email. The issue you mentioned about clearing buffer_new(bh) in write_end_fn() is indeed a bug.
However, this patch does not resolve the crash issue we encountered.

Let me explain my analysis in detail below.
The crash occurs in the function jbd2_journal_dirty_metadata().

ext4_block_write_begin() -> ext4_journalled_zero_new_buffers() -> write_end_fn()
-> ext4_dirty_journalled_data() -> ext4_handle_dirty_metadata() -> __ext4_handle_dirty_metadata()
-> jbd2_journal_dirty_metadata()

In the function jbd2_journal_dirty_metadata(), there is the following condition:
—---------------------------------------------------------------------------------------------------
if (data_race(jh->b_transaction != transaction &&
jh->b_next_transaction != transaction)) {
spin_lock(&jh->b_state_lock);
J_ASSERT_JH(jh, jh->b_transaction == transaction ||
jh->b_next_transaction == transaction);
spin_unlock(&jh->b_state_lock);
}
----------------------------------------------------------------------------------------------------
By analyzing the vmcore, I found that both jh->b_transaction and jh->b_next_transaction are NULL.
Through code analysis, I discovered that the __jbd2_journal_file_buffer() function adds the corresponding transaction of bh to jh->b_transaction.
Normally, this is accessed through do_journal_get_write_access(), which can call __jbd2_journal_file_buffer().
The detailed function call process is as follows:
do_journal_get_write_access() -> ext4_journal_get_write_access() -> __ext4_journal_get_write_access()
-> jbd2_journal_get_write_access() -> do_get_write_access() -> __jbd2_journal_file_buffer()

Therefore, resolving the crash issue requires obtaining write access before calling the jbd2_journal_dirty_metadata() function.
The comment at the definition of the jbd2_journal_dirty_metadata() function also states: 'The buffer must have previously had jbd2_journal_get_write_access().'

In the ext4_block_write_begin() function, if get_block() encounters an error, then neither bh->b_this_page nor the subsequent bh calls do_journal_get_write_access().
If bh->b_this_page and the subsequent bh are in the new state, it will lead to a crash when reaching the jbd2_journal_dirty_metadata() function.

So, there are two ways to resolve this crash issue:
1、Call do_journal_get_write_access() on bh that is not handled due to get_block() error.
The patch modification is in the attachment 0001-ext4-fix-a-assertion-failure-due-to-ungranted-bh-dir.patch.

2、Call clear_buffer_new() on bh that is not handled due to get_block() error.
The patch modification is in the attachment 0001-ext4-fix-a-assertion-failure-due-to-bh-not-clear-new.patch.

Additionally, I have found a method to quickly reproduce this crash issue.
For details, please refer to the email I previously sent you: “https://lore.kernel.org/all/bd41c24b-7325-4584-a965-392a32e32c74@xxxxxxx/”;.
I have verified that this quick reproduction method works for both solutions to resolve the issue.

Please continue to consider which method is better to resolve this issue. If you think that using clear_buffer_new() is a better solution, I can resend the patch via git send-mail.



在 2024/10/16 18:33, Jan Kara 写道:
Hello,

On Fri 11-10-24 12:08:58, Baolin Liu wrote:
Greetings,

This problem is reproduced by our customer using their own testing tool
“run_bug”. When I consulted with a client, the testing tool “run_bug”
used a variety of background programs to benchmark (including memory
pressure, cpu pressure, file cycle manipulation, fsstress Stress testing
tool, postmark program,and so on).

The recurrence probability is relatively low.

OK, thanks for asking!

In response to your query, in ext4_block_write_begin, the new state will
be clear before get block, and the bh that failed get_block will not be
set to new. However, when the page size is greater than the block size, a
page will contain multiple bh.

True. I wanted to argue that the buffer_new bit should be either cleared in
ext4_block_write_begin() (in case of error) or in
ext4_journalled_write_end() (in case of success) but actually
ext4_journalled_write_end() misses the clearing. So I think the better
solution is like the attached patch. I'll submit it once testing finishes
but it would be great if you could test that it fixes your problems as
well. Thanks!

Honza
From cbed522ccb695681d94ec02940e958fcf77e58cd Mon Sep 17 00:00:00 2001
From: Baolin Liu <liubaolin@xxxxxxxxxx>
Date: Wed, 9 Oct 2024 09:30:34 +0800
Subject: [PATCH] ext4: fix a assertion failure due to bh not clear new

Since the merge of commit 3910b513fcdf ("ext4: persist the new uptodate
buffers in ext4_journalled_zero_new_buffers"), a new assertion failure
occurred under a old kernel(ext3, data=journal, pagesize=64k) with
corresponding ported patches:
================================================================
Call trace:
__ext4_handle_dirty_metadata+0x320/0x7e8
write_end_fn+0x78/0x178
ext4_journalled_zero_new_buffers+0xd0/0x2c8
ext4_block_write_begin+0x850/0xc00
ext4_write_begin+0x334/0xc68
generic_perform_write+0x1a4/0x380
ext4_buffered_write_iter+0x180/0x370
ext4_file_write_iter+0x194/0xfc0
new_sync_write+0x338/0x4b8
__vfs_write+0xc4/0xe8
vfs_write+0x12c/0x3d0
ksys_write+0xf4/0x230
sys_write+0x34/0x48
el0_svc_naked+0x44/0x48
================================================================

which was caused by bh dirting without calling clear_buffer_new().

In the loop for all bhs of a page in ext4_block_write_begin(),
when a err occurred, it will jump out of loop.
But that will leaves some bhs being processed and some not,
which will lead to the asserion failure in calling write_end_fn().

To fixed that, clear_buffer_new for the rest unprocessed bhs, just
as what write_end_fn do.

Fixes: 3910b513fcdf ("ext4: persist the new uptodate buffers in ext4_journalled_zero_new_buffers")
Reported-and-tested-by: Zhi Long <longzhi@xxxxxxxxxxxxxx>
Suggested-by: Shida Zhang <zhangshida@xxxxxxxxxx>
Signed-off-by: Baolin Liu <liubaolin@xxxxxxxxxx>
---
fs/ext4/inode.c | 16 +++++++++++++++-
1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 54bdd4884fe6..26c107e083c6 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1102,9 +1102,23 @@ int ext4_block_write_begin(handle_t *handle, struct folio *folio,
err = -EIO;
}
if (unlikely(err)) {
- if (should_journal_data)
+ if (should_journal_data) {
+ if (bh != head || !block_start) {
+ do {
+ block_end = block_start + bh->b_size;
+
+ if (buffer_new(bh))
+ if (block_end > from && block_start < to)
+ clear_buffer_new(bh);
+
+ block_start = block_end;
+ bh = bh->b_this_page;
+ } while (bh != head);
+ }
+
ext4_journalled_zero_new_buffers(handle, inode, folio,
from, to);
+ }
else
folio_zero_new_buffers(folio, from, to);
} else if (fscrypt_inode_uses_fs_layer_crypto(inode)) {
--
2.39.2

From e99bce558ddcc98e2495280e1af0dee372fae619 Mon Sep 17 00:00:00 2001
From: Baolin Liu <liubaolin@xxxxxxxxxx>
Date: Wed, 9 Oct 2024 09:30:34 +0800
Subject: [PATCH v1] ext4: fix a assertion failure due to ungranted bh dirting

Since the merge of commit 3910b513fcdf ("ext4: persist the new uptodate
buffers in ext4_journalled_zero_new_buffers"), a new assertion failure
occurred under a old kernel(ext3, data=journal, pagesize=64k) with
corresponding ported patches:
================================================================
Call trace:
__ext4_handle_dirty_metadata+0x320/0x7e8
write_end_fn+0x78/0x178
ext4_journalled_zero_new_buffers+0xd0/0x2c8
ext4_block_write_begin+0x850/0xc00
ext4_write_begin+0x334/0xc68
generic_perform_write+0x1a4/0x380
ext4_buffered_write_iter+0x180/0x370
ext4_file_write_iter+0x194/0xfc0
new_sync_write+0x338/0x4b8
__vfs_write+0xc4/0xe8
vfs_write+0x12c/0x3d0
ksys_write+0xf4/0x230
sys_write+0x34/0x48
el0_svc_naked+0x44/0x48
================================================================

which was caused by bh dirting without calling
do_journal_get_write_access().

In the loop for all bhs of a page in ext4_block_write_begin(),
when a err occurred, it will jump out of loop.
But that will leaves some bhs being processed and some not,
which will lead to the asserion failure in calling write_end_fn().

To fixed that, get write access for the rest unprocessed bhs, just
as what write_end_fn do.

Fixes: 3910b513fcdf ("ext4: persist the new uptodate buffers in ext4_journalled_zero_new_buffers")
Reported-and-tested-by: Zhi Long <longzhi@xxxxxxxxxxxxxx>
Suggested-by: Shida Zhang <zhangshida@xxxxxxxxxx>
Signed-off-by: Baolin Liu <liubaolin@xxxxxxxxxx>
---
fs/ext4/inode.c | 17 ++++++++++++++++-
1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 54bdd4884fe6..a72f951288e4 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1102,9 +1102,24 @@ int ext4_block_write_begin(handle_t *handle, struct folio *folio,
err = -EIO;
}
if (unlikely(err)) {
- if (should_journal_data)
+ if (should_journal_data) {
+ if (bh != head || !block_start) {
+ do {
+ block_end = block_start + bh->b_size;
+
+ if (buffer_new(bh))
+ if (block_end > from && block_start < to)
+ do_journal_get_write_access(handle,
+ inode, bh);
+
+ block_start = block_end;
+ bh = bh->b_this_page;
+ } while (bh != head);
+ }
+
ext4_journalled_zero_new_buffers(handle, inode, folio,
from, to);
+ }
else
folio_zero_new_buffers(folio, from, to);
} else if (fscrypt_inode_uses_fs_layer_crypto(inode)) {
--
2.39.2