Re: [PATCH v4 1/1] ocfs2: split transactions in dio completion to avoid credit exhaustion

From: Heming Zhao

Date: Thu Mar 26 2026 - 23:02:47 EST


On Fri, Mar 27, 2026 at 09:42:55AM +0800, Joseph Qi wrote:
> Hi,
>
> On 3/26/26 10:26 PM, Heming Zhao wrote:
> > During ocfs2 dio operations, JBD2 may report warnings via following call trace:
> > ocfs2_dio_end_io_write
> > ocfs2_mark_extent_written
> > ocfs2_change_extent_flag
> > ocfs2_split_extent
> > ocfs2_try_to_merge_extent
> > ocfs2_extend_rotate_transaction
> > ocfs2_extend_trans
> > jbd2__journal_restart
> > start_this_handle
> > output: JBD2: kworker/6:2 wants too many credits credits:5450 rsv_credits:0 max:5449
> >
> > To prevent exceeding the credits limit, modify ocfs2_dio_end_io_write()
> > to handle extent in a batch of transaction.
> >
> > Additionally, relocate ocfs2_del_inode_from_orphan(). The orphan inode should
> > only be removed from the orphan list after the extent tree update is complete.
> > this ensures that if a crash occurs in the middle of extent tree updates, we
> > won't leave stale blocks beyond EOF.
> >
> > Finally, thanks to Jans and Joseph for providing the bug fix prototype and
> > suggestions.
> >
> > Suggested-by: Jan Kara <jack@xxxxxxx>
> > Suggested-by: Joseph Qi <joseph.qi@xxxxxxxxxxxxxxxxx>
> > Reviewed-by: Jan Kara <jack@xxxxxxx>
> > Signed-off-by: Heming Zhao <heming.zhao@xxxxxxxx>
> > ---
> > fs/ocfs2/aops.c | 72 ++++++++++++++++++++++++++++++-------------------
> > 1 file changed, 44 insertions(+), 28 deletions(-)
> >
> > diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
> > index 09146b43d1f0..60f1b607022f 100644
> > --- a/fs/ocfs2/aops.c
> > +++ b/fs/ocfs2/aops.c
> > @@ -37,6 +37,8 @@
> > #include "namei.h"
> > ... ... <--- snip --->
> > + if (++batch == OCFS2_DIO_MARK_EXTENT_BATCH) {
> > + ocfs2_commit_trans(osb, handle);
> > + handle = NULL;
> > + batch = 0;
> > + }
> > }
> >
> > if (end > i_size_read(inode)) {
>
> I still don't think it is a good idea to update inode size in case error.
> The original logic behaves inconsistent, if ocfs2_start_trans() and
> ocfs2_journal_access_di() fails, it won't update inode size, but if
> ocfs2_assure_trans_credits() and ocfs2_mark_extent_written(), it will do.
> So let's make it behave consistent by both checking 'ret' here.
>
> Other looks fine.
>
> Joseph

After sending the v4 patch, I did some tests and identified the key of the
performance degradation in the dynamic credit adjustment patch.
I found that by using half of the jbd2_max value, the time consumption returned
to the same level as the "direct loop" style.

key code change:
(base on patch[1]: 0001-ocfs2-dynamic-extending-jbd2-credits-during-dio-end-.patch)
int jbd2_max = (journal->j_max_transaction_buffers -
journal->j_transaction_overhead_buffers) >> 2;

my test env:
- 10G ocfs2 volume
- DIO 2G file, write with 64MB block (--bs=64M).

observations:
- ocfs2_assure_trans_credits calls "commit & start trans" times on every 64MB block:
- using jbd2_max (5449): ~9 times.
- using half value (2724): ~20 times.
- v4 patch ("batch" style), call "commit & start trans" on every 64MB block: ~62 times

time consumption:

jbd2_max:5449 case
real 2m9.346s
user 0m0.277s
sys 0m22.748s

half jbd2_max:2724 case
real 1m49.400s
user 0m0.343s
sys 0m22.734s

v4 patch:
real 1m49.650s
user 0m0.337s
sys 0m22.780s


the reason (I guess):
The original patch only performed a "commit and start new trans" when jbd2
credits were nearly exhausted. I suspect contention between the commit flush
operation and the jbd2 operations in the dio end path, leading to high latency.
By using half the jbd2_max value, trigger commits more frequently, which appears
to reduce jbd2 contention and racing.

Finally, I suggest we use the dynamic credit extension patch for v5. This
version only adds logic to ocfs2_assure_trans_credits() and does not affect
i_size changes during error cases.

[1]:
searching "0001-ocfs2-dynamic-extending-jbd2-credits-during-dio-end-.patch"
in following url:
- https://lore.kernel.org/ocfs2-devel/75f89a17-213b-42a0-a30e-d52fb2d077a6@xxxxxxxxxxxxxxxxx/T/#mbe2b5f52ee249178e1ad4c76d964de2dc818eb32

-----
For easy of discussion, I past the full dynamic credits adjustment patch
(with half jbd2_max) below.

```
Subject: [PATCH] ocfs2: dynamic extending jbd2 credits during dio end path

---
fs/ocfs2/aops.c | 31 +++++++++++++++++--------------
fs/ocfs2/journal.c | 34 ++++++++++++++++++++++++++++++----
fs/ocfs2/journal.h | 5 +++--
3 files changed, 50 insertions(+), 20 deletions(-)

diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
index 7a65d5a36a3e..9cdecfa0ea0c 100644
--- a/fs/ocfs2/aops.c
+++ b/fs/ocfs2/aops.c
@@ -2279,6 +2279,10 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
handle_t *handle = NULL;
loff_t end = offset + bytes;
int ret = 0, credits = 0;
+ /* do the same job of jbd2_max_user_trans_buffers() */
+ journal_t *journal = osb->journal->j_journal;
+ int jbd2_max = (journal->j_max_transaction_buffers -
+ journal->j_transaction_overhead_buffers) >> 2;

ocfs2_init_dealloc_ctxt(&dealloc);

@@ -2295,18 +2299,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
goto out;
}

- /* Delete orphan before acquire i_rwsem. */
- if (dwc->dw_orphaned) {
- BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
-
- end = end > i_size_read(inode) ? end : 0;
-
- ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh,
- !!end, end);
- if (ret < 0)
- mlog_errno(ret);
- }
-
down_write(&oi->ip_alloc_sem);
di = (struct ocfs2_dinode *)di_bh->b_data;

@@ -2341,8 +2333,10 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
}

list_for_each_entry(ue, &dwc->dw_zero_list, ue_node) {
- ret = ocfs2_assure_trans_credits(handle, credits);
- if (ret < 0) {
+ ret = ocfs2_assure_trans_credits(inode, &handle, di_bh, credits, jbd2_max);
+ if (ret == 1) {
+ goto unlock;
+ } else if (ret < 0) {
mlog_errno(ret);
break;
}
@@ -2365,6 +2359,15 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
ocfs2_commit_trans(osb, handle);
unlock:
up_write(&oi->ip_alloc_sem);
+
+ if (dwc->dw_orphaned) {
+ BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
+
+ ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh, 0, 0);
+ if (ret < 0)
+ mlog_errno(ret);
+ }
+
ocfs2_inode_unlock(inode, 1);
brelse(di_bh);
out:
diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c
index e5f58ff2175f..57ad69d19494 100644
--- a/fs/ocfs2/journal.c
+++ b/fs/ocfs2/journal.c
@@ -474,14 +474,40 @@ int ocfs2_extend_trans(handle_t *handle, int nblocks)
* credits. Similar notes regarding data consistency and locking implications
* as for ocfs2_extend_trans() apply here.
*/
-int ocfs2_assure_trans_credits(handle_t *handle, int nblocks)
+int ocfs2_assure_trans_credits(struct inode *inode, handle_t **i_handle,
+ struct buffer_head *bh, int nblocks, int jbd2_max)
{
+ handle_t *handle = *i_handle;
int old_nblks = jbd2_handle_buffer_credits(handle);
+ struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
+ int ret, wanted;
+ int plan = nblocks << 1; /* Double the credits to prevent boundary issues */

trace_ocfs2_assure_trans_credits(old_nblks);
- if (old_nblks >= nblocks)
- return 0;
- return ocfs2_extend_trans(handle, nblocks - old_nblks);
+
+ wanted = old_nblks + plan;
+
+ if (wanted < jbd2_max) {
+ if (old_nblks < nblocks)
+ return ocfs2_extend_trans(handle, nblocks - old_nblks);
+ else
+ return 0;
+ }
+
+ ocfs2_commit_trans(osb, handle);
+ handle = ocfs2_start_trans(osb, plan);
+ if (IS_ERR(handle)) {
+ mlog_errno(PTR_ERR(handle));
+ return 1; /* caller must not commit trans for this error */
+ }
+
+ ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), bh,
+ OCFS2_JOURNAL_ACCESS_WRITE);
+ if (ret)
+ mlog_errno(ret);
+
+ *i_handle = handle;
+ return ret;
}

/*
diff --git a/fs/ocfs2/journal.h b/fs/ocfs2/journal.h
index 6397170f302f..08a76ffb870a 100644
--- a/fs/ocfs2/journal.h
+++ b/fs/ocfs2/journal.h
@@ -244,8 +244,9 @@ handle_t *ocfs2_start_trans(struct ocfs2_super *osb,
int ocfs2_commit_trans(struct ocfs2_super *osb,
handle_t *handle);
int ocfs2_extend_trans(handle_t *handle, int nblocks);
-int ocfs2_assure_trans_credits(handle_t *handle,
- int nblocks);
+int ocfs2_assure_trans_credits(struct inode *inode,
+ handle_t **i_handle, struct buffer_head *bh,
+ int nblocks, int jbd2_max);
int ocfs2_allocate_extend_trans(handle_t *handle,
int thresh);

--
2.43.0

```

- Heming