Re: [PATCH v2] ext4: fix fast commit inode enqueueing during a full journal commit
From: Luis Henriques
Date: Mon May 27 2024 - 11:48:38 EST
On Mon 27 May 2024 09:29:40 AM +01, Luis Henriques wrote:
<snip>
>>> + /*
>>> + * Used to flag an inode as part of the next fast commit; will be
>>> + * reset during fast commit clean-up
>>> + */
>>> + tid_t i_fc_next;
>>> +
>>
>> Do we really need new tid in the inode? I'd be kind of hoping we could use
>> EXT4_I(inode)->i_sync_tid for this - I can see we even already set it in
>> ext4_fc_track_template() and used for similar comparisons in fast commit
>> code.
>
> Ah, true. It looks like it could be used indeed. We'll still need a flag
> here, but a simple bool should be enough for that.
After looking again at the code, I'm not 100% sure that this is actually
doable. For example, if I replace the above by
bool i_fc_next;
and set it to 'true' below:
>>> diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
>>> index 87c009e0c59a..bfdf249f0783 100644
>>> --- a/fs/ext4/fast_commit.c
>>> +++ b/fs/ext4/fast_commit.c
>>> @@ -402,6 +402,8 @@ static int ext4_fc_track_template(
>>> sbi->s_journal->j_flags & JBD2_FAST_COMMIT_ONGOING) ?
>>> &sbi->s_fc_q[FC_Q_STAGING] :
>>> &sbi->s_fc_q[FC_Q_MAIN]);
>>> + else
>>> + ei->i_fc_next = tid;
ei->i_fc_next = true;
Then, when we get to the ext4_fc_cleanup(), the value of iter->i_sync_tid
may have changed in the meantime from, e.g., ext4_do_update_inode() or
__ext4_iget(). This would cause the clean-up code to be bogus if it still
implements the logic below, by comparing the tid with i_sync_tid.
(Although, to be honest, I couldn't see any visible effect in the quick
testing I've done.) Or am I missing something, and this is *exactly* the
behaviour you'd expect?
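
To make the concern concrete, here is a minimal userspace sketch (not kernel
code). struct toy_inode, fc_next_tid, fc_next_flag, enum fc_action and the two
cleanup_* helpers are hypothetical names made up for illustration; tid_gt() is
a local copy modelled on the jbd2 helper of the same name. It assumes the
clean-up logic keeps the same shape as in the patch below, with the flag
variant comparing against i_sync_tid instead of a dedicated tid:

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int tid_t;	/* same underlying type as the kernel's tid_t */

/* Wraparound-safe "x is newer than y", modelled on jbd2's tid_gt(). */
bool tid_gt(tid_t x, tid_t y)
{
	int difference = (int)(x - y);

	return difference > 0;
}

/* Hypothetical stand-in for the few ext4_inode_info fields discussed here. */
struct toy_inode {
	tid_t i_sync_tid;	/* may be updated again after tracking */
	tid_t fc_next_tid;	/* v2 approach: dedicated tid in the inode */
	bool fc_next_flag;	/* alternative: plain flag + i_sync_tid */
};

enum fc_action { FC_NOTHING, FC_RESET, FC_REQUEUE };

/* Clean-up decision when the inode carries its own tid (as in v2). */
enum fc_action cleanup_with_tid(const struct toy_inode *in, tid_t tid)
{
	if (in->fc_next_tid == tid)
		return FC_RESET;
	if (tid_gt(in->fc_next_tid, tid))
		return FC_REQUEUE;
	return FC_NOTHING;
}

/* Same decision, but using a bool and comparing against i_sync_tid. */
enum fc_action cleanup_with_flag(const struct toy_inode *in, tid_t tid)
{
	if (!in->fc_next_flag)
		return FC_NOTHING;
	if (in->i_sync_tid == tid)
		return FC_RESET;
	if (tid_gt(in->i_sync_tid, tid))
		return FC_REQUEUE;
	return FC_NOTHING;
}

/* Inode tracked at tid 10; i_sync_tid is bumped to 11 before clean-up. */
void check_divergence(void)
{
	struct toy_inode in = {
		.i_sync_tid = 10, .fc_next_tid = 10, .fc_next_flag = true,
	};

	in.i_sync_tid = 11;	/* e.g. a later update of the inode */

	assert(cleanup_with_tid(&in, 10) == FC_RESET);
	assert(cleanup_with_flag(&in, 10) == FC_REQUEUE);
}
```

In this toy model the two variants agree as long as i_sync_tid is left alone
between tracking and clean-up, and diverge once it moves. Whether that
divergence is actually a problem in the real code is exactly my question.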
Cheers,
--
Luis
>>> spin_unlock(&sbi->s_fc_lock);
>>>
>>> return ret;
>>> @@ -1280,6 +1282,15 @@ static void ext4_fc_cleanup(journal_t *journal, int full, tid_t tid)
>>> list_for_each_entry_safe(iter, iter_n, &sbi->s_fc_q[FC_Q_MAIN],
>>> i_fc_list) {
>>> list_del_init(&iter->i_fc_list);
>>> + if (iter->i_fc_next == tid)
>>> + iter->i_fc_next = 0;
>>> + else if (iter->i_fc_next > tid)
>> ^^^ careful here, TIDs do wrap so you need to use
>> tid_geq() for comparison.
>>
>
> Yikes! Thanks, I'll update the code to do that.
>
>>> + /*
>>> + * re-enqueue inode into STAGING, which will later be
>>> + * spliced back into MAIN
>>> + */
>>> + list_add_tail(&EXT4_I(&iter->vfs_inode)->i_fc_list,
>>> + &sbi->s_fc_q[FC_Q_STAGING]);
>>> ext4_clear_inode_state(&iter->vfs_inode,
>>> EXT4_STATE_FC_COMMITTING);
>>> if (iter->i_sync_tid <= tid)
>> ^^^ and I can see this is buggy as
>> well and needs tid_geq() (not your fault obviously).
>
> Yeah, good point. I can fix that too in v3.
>
> Again, thanks a lot for your review!
>
> Cheers,
> --
> Luís
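
For reference, the wraparound issue raised in the review can be sketched in
userspace as follows. This is only an illustration: tid_gt() and tid_geq()
here are local copies modelled on the jbd2 helpers of the same names, and
check_wraparound() is a hypothetical name. Under that assumption, the '>'
comparison in the patch would presumably become tid_gt(iter->i_fc_next, tid),
and the existing '<=' check would become tid_geq(tid, iter->i_sync_tid).

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

typedef unsigned int tid_t;	/* same underlying type as the kernel's tid_t */

/*
 * Both helpers look at the signed difference, so the result stays correct
 * when the counter wraps, as long as the two tids are less than half the
 * tid space apart. The unsigned-to-int conversion relies on the usual
 * two's complement behaviour, as the kernel does.
 */
bool tid_gt(tid_t x, tid_t y)
{
	int difference = (int)(x - y);

	return difference > 0;
}

bool tid_geq(tid_t x, tid_t y)
{
	int difference = (int)(x - y);

	return difference >= 0;
}

/* A tid issued just before the wrap versus one issued just after it. */
void check_wraparound(void)
{
	tid_t older = UINT_MAX - 1;
	tid_t newer = 2;		/* four transactions later */

	assert(!(newer > older));	/* plain comparison gets it wrong */
	assert(tid_gt(newer, older));	/* wraparound-safe comparison is right */
	assert(!tid_gt(older, newer));
	assert(tid_geq(older, older));
	assert(tid_geq(newer, older));	/* i.e. "older <= newer" */
}
```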