Re: [External] Re: [PATCH 1/2] ext4: use ext4_ext_remove_space() for fast commit replay delete range
From: Jan Kara
Date: Fri Feb 04 2022 - 06:36:44 EST
On Fri 04-02-22 02:44:16, Ritesh Harjani wrote:
> Ok, so I now know why the inode->i_size is 0 during replay phase (for file foo).
> This is because inode->i_disksize is not really updated until after the
> ext4_writepages() kicks in, which in this case, won't happen (for file foo)
> when we are doing fsync on file bar. And hence fsync on file bar also won't
> ensure the delalloc blocks for file foo get written out.
>
> In fact, I had this all wrong. Earlier I was of the opinion that fast_commit
> still pushes _all_ the dirty pagecache data of other files to disk too (which
> is incorrect) and that the only performance gain comes from fewer writes to
> disk (since we write less metadata on disk).
>
> But I think what really happens is - In case of fast_commit when fsync is
> called on any file (say bar), apart from that file's (bar) dirty data, it
> only writes the required metadata for the blocks of other files (in this
> case file foo) which are already allocated (which in this case was due to
> the fzero operation). It does not actually allocate the delalloc blocks due
> to buffered writes of any other file (other than the file on which fsync
> is called).
Yes, but that is exactly what also happens for normal commit. I.e. even
without fastcommits, if we fsync(2), we will flush out data for that file
but for all the other files, buffered data still stays in delalloc state in
the page cache. The following journal commit will thus write all metadata (and
wait for data) of the fsynced files but not for any other file that has
only delalloc blocks. If writeback of some other file also happened before
we commit, then yes, we include all those changes in the commit as well.
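
To make the scenario concrete, here is a minimal userspace sketch of the
sequence being discussed: a buffered write to one file (foo) that is never
fsynced, followed by a write plus fsync(2) on another file (bar). The function
names and the 4 KiB write size are made up for illustration and are not taken
from any test in this thread. The program only issues the syscalls; whether
foo's blocks are still delalloc at commit time is decided inside the
filesystem and is not observable from this code.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: write one 4 KiB buffer to 'path' through the page
 * cache. The data is only forced to disk when 'do_fsync' is non-zero.
 */
static int write_block(const char *path, int do_fsync)
{
	char buf[4096];
	int fd, ret = 0;

	memset(buf, 'a', sizeof(buf));
	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		return -1;
	if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf))
		ret = -1;
	if (!ret && do_fsync && fsync(fd) < 0)
		ret = -1;
	if (close(fd) < 0)
		ret = -1;
	return ret;
}

/*
 * Hypothetical driver: foo gets a buffered write only (no fsync), bar gets
 * a write followed by fsync(2). Returns 0 on success, -1 on any error.
 */
int write_foo_then_fsync_bar(const char *foo_path, const char *bar_path)
{
	if (write_block(foo_path, 0) < 0)
		return -1;
	return write_block(bar_path, 1);
}
```

Calling write_foo_then_fsync_bar() with two paths on the filesystem under test
gives the "fsync on bar, dirty data in foo" state described above, with or
without fast commit enabled.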
> This happens in
> ext4_fc_perform_commit() -> ext4_fc_submit_inode_data_all() ->
> jbd2_submit_inode_data() -> jbd2_journal_submit_inode_data_buffers() ->
> generic_writepages() -> using writepage() which won't do block allocation for
> delalloc blocks.
>
> So the above is what should give the major performance boost with fast_commit
> when multiple files are being written and fsynced. :)
>
> @Jan/Harshad - could you please confirm if the above is correct?
What you describe is correct but not special to fastcommit. As I mentioned
on the call yesterday, fastcommit is currently beneficial only because the
logical logging it does ends up writing far fewer blocks to the journal.
Honza
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR