Re: [PATCH v2] ext4: fix fast commit inode enqueueing during a full journal commit

From: Jan Kara
Date: Fri May 24 2024 - 12:22:50 EST


On Thu 23-05-24 12:16:18, Luis Henriques (SUSE) wrote:
> When a full journal commit is on-going, any fast commit has to be enqueued
> into a different queue: FC_Q_STAGING instead of FC_Q_MAIN. This enqueueing
> is done only once, i.e. if an inode is already queued in a previous fast
> commit entry it won't be enqueued again. However, if a full commit starts
> _after_ the inode is enqueued into FC_Q_MAIN, the next fast commit needs to
> be done into FC_Q_STAGING. And this is not being done in function
> ext4_fc_track_template().
>
> This patch fixes the issue by flagging an inode that is already enqueued in
> either queue. Later, during the fast commit clean-up callback, if the
> inode has a tid that is bigger than the one being handled, that inode is
> re-enqueued into STAGING and then spliced back into MAIN.
>
> This bug was found using fstest generic/047. This test creates several
> 32k-byte files, sync'ing each of them after its creation, and then shuts
> down the filesystem. Some data may be lost in this operation; for example a
> file may have its size truncated to zero.
>
> Signed-off-by: Luis Henriques (SUSE) <luis.henriques@xxxxxxxxx>

Thanks for the fix. Some comments below:

> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index 983dad8c07ec..4c308c18c3da 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -1062,9 +1062,18 @@ struct ext4_inode_info {
> /* Fast commit wait queue for this inode */
> wait_queue_head_t i_fc_wait;
>
> - /* Protect concurrent accesses on i_fc_lblk_start, i_fc_lblk_len */
> + /*
> + * Protect concurrent accesses on i_fc_lblk_start, i_fc_lblk_len,
> + * i_fc_next
> + */
> struct mutex i_fc_lock;
>
> + /*
> + * Used to flag an inode as part of the next fast commit; will be
> + * reset during fast commit clean-up
> + */
> + tid_t i_fc_next;
> +

Do we really need a new tid in the inode? I was hoping we could use
EXT4_I(inode)->i_sync_tid for this - I can see we already set it in
ext4_fc_track_template() and use it for similar comparisons in the fast
commit code.
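
To make the idea concrete, here is a rough user-space model of the check I
have in mind. It is only a sketch under assumptions: struct model_inode and
model_needs_requeue() are made-up names, tid_gt() is re-implemented from
memory rather than taken from jbd2.h, and I have not verified that
i_sync_tid is updated on every path where the patch sets i_fc_next.

```c
#include <stdint.h>

/* Assumption: tid_t is a 32-bit unsigned counter, as in jbd2. */
typedef uint32_t tid_t;

/* Hypothetical stand-in for the one ext4_inode_info field we care about. */
struct model_inode {
	tid_t i_sync_tid;	/* tid of the last transaction that touched the inode */
};

/* Wrap-safe "x is newer than y", modelled on the jbd2 helper of the same name. */
static inline int tid_gt(tid_t x, tid_t y)
{
	int32_t difference = (int32_t)(x - y);

	return difference > 0;
}

/*
 * Hypothetical check for the clean-up path: nonzero if the inode was
 * modified by a transaction newer than the one being cleaned up, so it
 * should stay queued instead of having its fast commit state reset.
 */
static inline int model_needs_requeue(const struct model_inode *ei,
				      tid_t committed_tid)
{
	return tid_gt(ei->i_sync_tid, committed_tid);
}
```

If something along these lines works, the clean-up loop could decide from
i_sync_tid alone and the new i_fc_next field would not be needed.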

> diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
> index 87c009e0c59a..bfdf249f0783 100644
> --- a/fs/ext4/fast_commit.c
> +++ b/fs/ext4/fast_commit.c
> @@ -402,6 +402,8 @@ static int ext4_fc_track_template(
> sbi->s_journal->j_flags & JBD2_FAST_COMMIT_ONGOING) ?
> &sbi->s_fc_q[FC_Q_STAGING] :
> &sbi->s_fc_q[FC_Q_MAIN]);
> + else
> + ei->i_fc_next = tid;
> spin_unlock(&sbi->s_fc_lock);
>
> return ret;
> @@ -1280,6 +1282,15 @@ static void ext4_fc_cleanup(journal_t *journal, int full, tid_t tid)
> list_for_each_entry_safe(iter, iter_n, &sbi->s_fc_q[FC_Q_MAIN],
> i_fc_list) {
> list_del_init(&iter->i_fc_list);
> + if (iter->i_fc_next == tid)
> + iter->i_fc_next = 0;
> + else if (iter->i_fc_next > tid)
^^^ careful here, TIDs do wrap, so you need to use
tid_gt() (or tid_geq()) for the comparison.

> + /*
> + * re-enqueue inode into STAGING, which will later be
> + * spliced back into MAIN
> + */
> + list_add_tail(&EXT4_I(&iter->vfs_inode)->i_fc_list,
> + &sbi->s_fc_q[FC_Q_STAGING]);
> ext4_clear_inode_state(&iter->vfs_inode,
> EXT4_STATE_FC_COMMITTING);
> if (iter->i_sync_tid <= tid)
^^^ and I can see this is buggy as
well and needs tid_geq() (not your fault obviously).
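
For reference, a small user-space sketch of why the plain comparisons break
once the tid counter wraps. The two helpers are written from memory of the
ones in include/linux/jbd2.h, so treat them as an approximation;
tid_wrap_selfcheck() is a hypothetical name used only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: tid_t is a 32-bit unsigned counter, as in jbd2. */
typedef uint32_t tid_t;

/* Wrap-safe "x is newer than y": the sign of the 32-bit difference decides. */
static inline int tid_gt(tid_t x, tid_t y)
{
	int32_t difference = (int32_t)(x - y);

	return difference > 0;
}

/* Wrap-safe "x is newer than or equal to y". */
static inline int tid_geq(tid_t x, tid_t y)
{
	int32_t difference = (int32_t)(x - y);

	return difference >= 0;
}

/* Hypothetical self-check around the wrap point. */
static inline void tid_wrap_selfcheck(void)
{
	tid_t old_tid = 0xfffffffeu;	/* just before the counter wraps */
	tid_t new_tid = 2u;		/* four transactions later */

	assert(!(new_tid > old_tid));		/* plain '>' gets it wrong */
	assert(tid_gt(new_tid, old_tid));	/* tid_gt() gets it right */

	assert(!(old_tid <= new_tid));		/* plain '<=' gets it wrong */
	assert(tid_geq(new_tid, old_tid));	/* same test, arguments swapped */
}
```

With these, the new check would read tid_gt(iter->i_fc_next, tid) and the
existing one tid_geq(tid, iter->i_sync_tid).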

Honza
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR