Re: [PATCH v4 04/23] ext4: add iomap address space operations for buffered I/O
From: Ojaswin Mujoo
Date: Tue May 19 2026 - 12:53:56 EST
On Tue, May 19, 2026 at 08:35:51PM +0800, Zhang Yi wrote:
> On 5/19/2026 5:28 PM, Ojaswin Mujoo wrote:
> > On Mon, May 11, 2026 at 03:23:24PM +0800, Zhang Yi wrote:
> >> From: Zhang Yi <yi.zhang@xxxxxxxxxx>
> >>
> >> Introduce initial support for iomap in the buffered I/O path for regular
> >> files on ext4.
> >>
> >> - Add a new inode state flag EXT4_STATE_BUFFERED_IOMAP to indicate the
> >> inode uses iomap instead of buffer_head for buffered I/O
> >> - Add helper ext4_inode_buffered_iomap() to check the flag
> >> - Add new address space operations ext4_iomap_aops with callbacks that
> >> will use generic iomap implementations
> >> - Add ext4_iomap_aops to ext4_set_aops() when the flag is set
> >>
> >> The following callbacks(read_folio(), readahead(), writepages()) are
> >> provided as placeholders and will be implemented in later patches.
> >>
> >> Signed-off-by: Zhang Yi <yi.zhang@xxxxxxxxxx>
> >> Reviewed-by: Jan Kara <jack@xxxxxxx>
> >
> > Hi Zhang, looks good to me. Just a questions below:
>
> Hi, Ojaswin! Thank you for the review of this series.
>
> >> ---
> >> fs/ext4/ext4.h | 7 +++++++
> >> fs/ext4/inode.c | 32 ++++++++++++++++++++++++++++++++
> >> 2 files changed, 39 insertions(+)
> >>
> >> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> >> index 94283a991e5c..1e27d73d7427 100644
> >> --- a/fs/ext4/ext4.h
> >> +++ b/fs/ext4/ext4.h
> >> @@ -1972,6 +1972,7 @@ enum {
> >> EXT4_STATE_FC_COMMITTING, /* Fast commit ongoing */
> >> EXT4_STATE_FC_FLUSHING_DATA, /* Fast commit flushing data */
> >> EXT4_STATE_ORPHAN_FILE, /* Inode orphaned in orphan file */
> >> + EXT4_STATE_BUFFERED_IOMAP, /* Inode use iomap for buffered IO */
> >> };
> >>
> >> #define EXT4_INODE_BIT_FNS(name, field, offset) \
> >> @@ -2040,6 +2041,12 @@ static inline bool ext4_inode_orphan_tracked(struct inode *inode)
> >> !list_empty(&EXT4_I(inode)->i_orphan);
> >> }
> >>
> >> +/* Whether the inode pass through the iomap infrastructure for buffered I/O */
> >> +static inline bool ext4_inode_buffered_iomap(struct inode *inode)
> >> +{
> >> + return ext4_test_inode_state(inode, EXT4_STATE_BUFFERED_IOMAP);
> >> +}
> >> +
> >> /*
> >> * Codes for operating systems
> >> */
> >> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> >> index b1ef706987c3..178ac2be37b7 100644
> >> --- a/fs/ext4/inode.c
> >> +++ b/fs/ext4/inode.c
> >> @@ -3908,6 +3908,22 @@ const struct iomap_ops ext4_iomap_report_ops = {
> >> .iomap_begin = ext4_iomap_begin_report,
> >> };
> >>
> >> +static int ext4_iomap_read_folio(struct file *file, struct folio *folio)
> >> +{
> >> + return 0;
> >> +}
> >> +
> >> +static void ext4_iomap_readahead(struct readahead_control *rac)
> >> +{
> >> +
> >> +}
> >> +
> >> +static int ext4_iomap_writepages(struct address_space *mapping,
> >> + struct writeback_control *wbc)
> >> +{
> >> + return 0;
> >> +}
> >> +
> >> /*
> >> * For data=journal mode, folio should be marked dirty only when it was
> >> * writeably mapped. When that happens, it was already attached to the
> >> @@ -3994,6 +4010,20 @@ static const struct address_space_operations ext4_da_aops = {
> >> .swap_activate = ext4_iomap_swap_activate,
> >> };
> >>
> >> +static const struct address_space_operations ext4_iomap_aops = {
> >> + .read_folio = ext4_iomap_read_folio,
> >> + .readahead = ext4_iomap_readahead,
> >> + .writepages = ext4_iomap_writepages,
> >> + .dirty_folio = iomap_dirty_folio,
> >> + .bmap = ext4_bmap,
> >> + .invalidate_folio = iomap_invalidate_folio,
> >> + .release_folio = iomap_release_folio,
> >> + .migrate_folio = filemap_migrate_folio,
> >> + .is_partially_uptodate = iomap_is_partially_uptodate,
> >> + .error_remove_folio = generic_error_remove_folio,
> >> + .swap_activate = ext4_iomap_swap_activate,
> >> +};
> >
> > So one question, for ->release_folio() we are using
> > iomap_release_folio() instead of ext4_release_folio() here which doesnt
> > make the jbd2_journal_try_to_free_bufferes() call. IIUC this function
> > seems to be trying to clean up already checkpointed buffers.
> >
> > I wanted to check if ->release_folio() can be called for folios with
> > ext4 metadata buffers? (from my limited understanding of
> > shrink_folio_list() -> filemap_release_folio() it seems we can) And if
> > it can be called, is it okay to skip the
> > jbd2_journal_try_to_free_buffers call?
>
> Here, in ->release_folio(), folio->mapping points to inode->i_data (the
> file's pagecache), not the block device's pagecache. ext4 metadata
> resides in the block device's pagecache, which is at a different layer
> than this release_folio callback. So we don't need to call
> jbd2_journal_try_to_free_buffers() in the iomap path here.
Hi Yi,
Thanks for clarify and yes, thats what I was missing! So this
->release_folio() is only for data folios. So I guess the
jbd2_journal_try_to_free_buffers() is mostly to handle data=journal
case?
Regardless, with that clarification feel free to add:
Reviewed-by: Ojaswin Mujoo <ojaswin@xxxxxxxxxxxxx>
Regards,
ojaswin
>
> Thanks,
> Yi.
>
> >
> > Regards,
> > ojaswin
> >
> >> +
> >> static const struct address_space_operations ext4_dax_aops = {
> >> .writepages = ext4_dax_writepages,
> >> .dirty_folio = noop_dirty_folio,
> >> @@ -4015,6 +4045,8 @@ void ext4_set_aops(struct inode *inode)
> >> }
> >> if (IS_DAX(inode))
> >> inode->i_mapping->a_ops = &ext4_dax_aops;
> >> + else if (ext4_inode_buffered_iomap(inode))
> >> + inode->i_mapping->a_ops = &ext4_iomap_aops;
> >> else if (test_opt(inode->i_sb, DELALLOC))
> >> inode->i_mapping->a_ops = &ext4_da_aops;
> >> else
> >> --
> >> 2.52.0
> >>
>