On 2025/3/13 1:15, Jan Kara wrote:If we consider the possibility that there might be calls to ext4_error()
On Wed 12-03-25 19:56:36, Ojaswin Mujoo wrote:Ha, right! This is a case where kjournald triggers an ext4 error but does
On Wed, Mar 12, 2025 at 11:51:03AM +0100, Jan Kara wrote:Yeah. The reason why I'm a bit concerned about it is mostly the case of
On Mon 10-03-25 10:13:36, Ritesh Harjani wrote:Oh, right :)
Ojaswin Mujoo <ojaswin@xxxxxxxxxxxxx> writes:So I was thinking about this. It is not a problem to determine we are
On Sun, Mar 09, 2025 at 12:11:22AM +0530, Ritesh Harjani wrote:
Ojaswin Mujoo <ojaswin@xxxxxxxxxxxxx> writes:
On Sat, Mar 08, 2025 at 06:56:23PM +0530, Ritesh Harjani wrote:
Ojaswin Mujoo <ojaswin@xxxxxxxxxxxxx> writes:Hmm, ideally yes that should not happen, but how can we achieve that?
On Sat, Mar 08, 2025 at 03:25:04PM +0530, Ritesh Harjani (IBM) wrote:Ohk. right.
Ojaswin Mujoo <ojaswin@xxxxxxxxxxxxx> writes:Hey Ritesh,
Presently we always BUG_ON if trying to start a transaction on a journal markedThis is not correct right. I think what we decided to set this flag
with JBD2_UNMOUNT, since this should never happen. However, while ltp running
stress tests, it was observed that in case of some error handling paths, it is
possible for update_super_work to start a transaction after the journal is
destroyed eg:
(umount)
ext4_kill_sb
kill_block_super
generic_shutdown_super
sync_filesystem /* commits all txns */
evict_inodes
/* might start a new txn */
ext4_put_super
flush_work(&sbi->s_sb_upd_work) /* flush the workqueue */
jbd2_journal_destroy
journal_kill_thread
journal->j_flags |= JBD2_UNMOUNT;
jbd2_journal_commit_transaction
jbd2_journal_get_descriptor_buffer
jbd2_journal_bmap
ext4_journal_bmap
ext4_map_blocks
...
ext4_inode_error
ext4_handle_error
schedule_work(&sbi->s_sb_upd_work)
/* work queue kicks in */
update_super_work
jbd2_journal_start
start_this_handle
BUG_ON(journal->j_flags &
JBD2_UNMOUNT)
Hence, introduce a new sbi flag s_journal_destroying to indicate journal is
destroying only do a journaled (and deferred) update of sb if this flag is not
set. Otherwise, just fallback to an un-journaled commit.
We set sbi->s_journal_destroying = true only after all the FS updates are done
during ext4_put_super() (except a running transaction that will get commited
during jbd2_journal_destroy()). After this point, it is safe to commit the sb
outside the journal as it won't race with a journaled update (refer
2d01ddc86606).
Also, we don't need a similar check in ext4_grp_locked_error since it is only
called from mballoc and AFAICT it would be always valid to schedule work here.
Fixes: 2d01ddc86606 ("ext4: save error info to sb through journal if available")
Reported-by: Mahesh Kumar <maheshkumar657g@xxxxxxxxx>
Suggested-by: Jan Kara <jack@xxxxxxx>
Signed-off-by: Ojaswin Mujoo <ojaswin@xxxxxxxxxxxxx>
---
fs/ext4/ext4.h | 2 ++
fs/ext4/ext4_jbd2.h | 8 ++++++++
fs/ext4/super.c | 4 +++-
3 files changed, 13 insertions(+), 1 deletion(-)
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 2b7d781bfcad..d48e93bd5690 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1728,6 +1728,8 @@ struct ext4_sb_info {
*/
struct work_struct s_sb_upd_work;
+ bool s_journal_destorying;
+
/* Atomic write unit values in bytes */
unsigned int s_awu_min;
unsigned int s_awu_max;
diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h
index 9b3c9df02a39..6bd3ca84410d 100644
--- a/fs/ext4/ext4_jbd2.h
+++ b/fs/ext4/ext4_jbd2.h
@@ -437,6 +437,14 @@ static inline int ext4_journal_destroy(struct ext4_sb_info *sbi, journal_t *jour
{
int err = 0;
+ /*
+ * At this point all pending FS updates should be done except a possible
+ * running transaction (which will commit in jbd2_journal_destroy). It
+ * is now safe for any new errors to directly commit superblock rather
+ * than going via journal.
+ */
+ sbi->s_journal_destorying = true;
before we flush the workqueue. So that we don't schedule any new
work after this flag has been set. At least that is what I understood.
[1]: https://lore.kernel.org/all/87eczc6rlt.fsf@xxxxxxxxx/
-ritesh
Yes that is not correct, I missed that in my patch however we realised
that adding it before flush_work() also has issues [1]. More
specifically:
**kjournald2**Then maybe we should not schedule another work to update the superblock
jbd2_journal_commit_transaction()
...
ext4_handle_error()
/* s_journal_destorying is not set */
if (journal && !s_journal_destorying)
via journalling, it the error itself occurred while were trying to
commit the journal txn?
-ritesh
For example with the trace we saw:
**kjournald2**
jbd2_journal_commit_transaction()
jbd2_journal_get_descriptor_buffer
jbd2_journal_bmap
ext4_journal_bmap
ext4_map_blocks
...
ext4_inode_error
ext4_handle_error
schedule_work(&sbi->s_sb_upd_work)
How do we tell ext4_handle_error that it is in the context of a
committing txn.
running in kjournald context - it is enough to check
current == EXT4_SB(sb)->s_journal->j_task
But I'm not sure checking this in ext4_handle_error() and doing direct sbOkay so IIUC your concern is there might be some codepaths, now or in
update instead of scheduling a journalled one is always correct. For
example kjournald does also writeback of ordered data and if that hits an
error, we do not necessarily abort the journal (well, currently we do as
far as I'm checking but it seems a bit fragile to rely on this).
the future, where kjournald might call the FS layer, hit an error and
still decide to not abort. In which case we would still want to update
the sb via journal.
kjournald also handling ordered data and situations like
!(journal->j_flags & JBD2_ABORT_ON_SYNCDATA_ERR) where people want to
continue although ordered data had issues. Or situations where something in
j_commit_callback or another jbd2 hook ends up calling ext4_error()...
not abort the journal for now, I forgot this one, and there may be more.
Thanks for pointing it out. I would also prefer to use this solution of
adding ext4_journal_destory().