Re: [PATCH v2 1/1] ocfs2: split transactions in dio completion to avoid credit exhaustion

From: Heming Zhao

Date: Thu Mar 12 2026 - 23:17:52 EST

On Fri, Mar 13, 2026 at 10:04:11AM +0800, Joseph Qi wrote:
> Almost looks fine, minor updates below.
>
> On 3/13/26 12:27 AM, Heming Zhao wrote:
> > During ocfs2 dio operations, JBD2 may report warnings via following call trace:
> > ocfs2_dio_end_io_write
> > ocfs2_mark_extent_written
> > ocfs2_change_extent_flag
> > ocfs2_split_extent
> > ocfs2_try_to_merge_extent
> > ocfs2_extend_rotate_transaction
> > ocfs2_extend_trans
> > jbd2__journal_restart
> > start_this_handle
> > output: JBD2: kworker/6:2 wants too many credits credits:5450 rsv_credits:0 max:5449
> >
> > To prevent exceeding the credits limit, modify ocfs2_dio_end_io_write() to
> > handle each extent in a separate transaction.
> >
> > Additionally, relocate ocfs2_del_inode_from_orphan(). The orphan inode should
> > only be removed from the orphan list after the extent tree update is complete.
> > this ensures that if a crash occurs in the middle of extent tree updates, we
> > won't leave stale blocks beyond EOF.
> >
> > This patch also removes the only call to ocfs2_assure_trans_credits(), which
> > was introduced by commit be346c1a6eeb ("ocfs2: fix DIO failure due to
> > insufficient transaction credits").
> >
> > Finally, thanks to Jans for providing the bug fix prototype and suggestions.
> >
> > Suggested-by: Jan Kara <jack@xxxxxxx>
> > Signed-off-by: Heming Zhao <heming.zhao@xxxxxxxx>
> > ---
> > fs/ocfs2/aops.c | 58 ++++++++++++++++++++++++-------------------------
> > 1 file changed, 29 insertions(+), 29 deletions(-)
> >
> > diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
> > index 09146b43d1f0..91997b330d39 100644
> > --- a/fs/ocfs2/aops.c
> > +++ b/fs/ocfs2/aops.c
> > @@ -2294,18 +2294,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
> > goto out;
> > }
> >
> > - /* Delete orphan before acquire i_rwsem. */
> > - if (dwc->dw_orphaned) {
> > - BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
> > -
> > - end = end > i_size_read(inode) ? end : 0;
> > -
> > - ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh,
> > - !!end, end);
> > - if (ret < 0)
> > - mlog_errno(ret);
> > - }
> > -
> > down_write(&oi->ip_alloc_sem);
> > di = (struct ocfs2_dinode *)di_bh->b_data;
> >
> > @@ -2326,44 +2314,56 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
> >
> > credits = ocfs2_calc_extend_credits(inode->i_sb, &di->id2.i_list);
> >
> > - handle = ocfs2_start_trans(osb, credits);
> > - if (IS_ERR(handle)) {
> > - ret = PTR_ERR(handle);
> > - mlog_errno(ret);
> > - goto unlock;
> > - }
> > - ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
> > - OCFS2_JOURNAL_ACCESS_WRITE);
> > - if (ret) {
> > - mlog_errno(ret);
> > - goto commit;
> > - }
> > -
> > list_for_each_entry(ue, &dwc->dw_zero_list, ue_node) {
> > - ret = ocfs2_assure_trans_credits(handle, credits);
> > - if (ret < 0) {
> > + handle = ocfs2_start_trans(osb, credits);
> > + if (IS_ERR(handle)) {
> > + ret = PTR_ERR(handle);
> > mlog_errno(ret);
> > break;
>
> I'd rather goto unlock directly without update i_size in case error.

agree
>
> > }
> > + ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
> > + OCFS2_JOURNAL_ACCESS_WRITE);
> > + if (ret) {
> > + mlog_errno(ret);
> > + ocfs2_commit_trans(osb, handle);
> > + break;
>
> Ditto.

agree
>
> > + }
> > ret = ocfs2_mark_extent_written(inode, &et, handle,
> > ue->ue_cpos, 1,
> > ue->ue_phys,
> > meta_ac, &dealloc);
> > if (ret < 0) {
> > mlog_errno(ret);
> > + ocfs2_commit_trans(osb, handle);
> > break;
>
> Ditto.

The existing code still updates i_size even if ocfs2_mark_extent_written()
returns an error. I am not certain whether updating i_size in this case is
correct, but I prefer to maintain the original logic for now.
Does that seem reasonable to you?

> BTW, we can move ocfs2_commit_trans() rightly after ocfs2_mark_extent_written()
> to save the extra call in case error.

agree
>
> Thanks,
> Joseph
>

I ran a DIO test using 64MB chunk size (e.g.: fio --bs=64M), the dwc->dw_zero_count
is about 12107 ~ 12457.
Regarding the transaction splitting in the unwritten handling loop: this patch
introduces a minor performance regression. I would like to merge some of the
transaction calls. e.g.: by starting a new transaction every 100 or 200 iterations.
What do you think of this approach?

Heming
> > }
> > + ocfs2_commit_trans(osb, handle);
> > }
> >
> > if (end > i_size_read(inode)) {
> > + handle = ocfs2_start_trans(osb, credits);
> > + if (IS_ERR(handle)) {
> > + ret = PTR_ERR(handle);
> > + mlog_errno(ret);
> > + goto unlock;
> > + }
> > ret = ocfs2_set_inode_size(handle, inode, di_bh, end);
> > if (ret < 0)
> > mlog_errno(ret);
> > + ocfs2_commit_trans(osb, handle);
> > }
> > -commit:
> > - ocfs2_commit_trans(osb, handle);
> > +
> > unlock:
> > up_write(&oi->ip_alloc_sem);
> > +
> > + /* everything looks good, let's start the cleanup */
> > + if (dwc->dw_orphaned) {
> > + BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
> > +
> > + ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh, 0, 0);
> > + if (ret < 0)
> > + mlog_errno(ret);
> > + }
> > ocfs2_inode_unlock(inode, 1);
> > brelse(di_bh);
> > out:
>