Re: [PATCH v2 1/1] ocfs2: split transactions in dio completion to avoid credit exhaustion
From: Joseph Qi
Date: Fri Mar 13 2026 - 00:53:55 EST
On 3/13/26 11:17 AM, Heming Zhao wrote:
> On Fri, Mar 13, 2026 at 10:04:11AM +0800, Joseph Qi wrote:
>> Almost looks fine, minor updates below.
>>
>> On 3/13/26 12:27 AM, Heming Zhao wrote:
>>> During ocfs2 dio operations, JBD2 may report warnings via following call trace:
>>> ocfs2_dio_end_io_write
>>> ocfs2_mark_extent_written
>>> ocfs2_change_extent_flag
>>> ocfs2_split_extent
>>> ocfs2_try_to_merge_extent
>>> ocfs2_extend_rotate_transaction
>>> ocfs2_extend_trans
>>> jbd2__journal_restart
>>> start_this_handle
>>> output: JBD2: kworker/6:2 wants too many credits credits:5450 rsv_credits:0 max:5449
>>>
>>> To prevent exceeding the credits limit, modify ocfs2_dio_end_io_write() to
>>> handle each extent in a separate transaction.
>>>
>>> Additionally, relocate ocfs2_del_inode_from_orphan(). The orphan inode should
>>> only be removed from the orphan list after the extent tree update is complete.
>>> this ensures that if a crash occurs in the middle of extent tree updates, we
>>> won't leave stale blocks beyond EOF.
>>>
>>> This patch also removes the only call to ocfs2_assure_trans_credits(), which
>>> was introduced by commit be346c1a6eeb ("ocfs2: fix DIO failure due to
>>> insufficient transaction credits").
>>>
>>> Finally, thanks to Jans for providing the bug fix prototype and suggestions.
>>>
>>> Suggested-by: Jan Kara <jack@xxxxxxx>
>>> Signed-off-by: Heming Zhao <heming.zhao@xxxxxxxx>
>>> ---
>>> fs/ocfs2/aops.c | 58 ++++++++++++++++++++++++-------------------------
>>> 1 file changed, 29 insertions(+), 29 deletions(-)
>>>
>>> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
>>> index 09146b43d1f0..91997b330d39 100644
>>> --- a/fs/ocfs2/aops.c
>>> +++ b/fs/ocfs2/aops.c
>>> @@ -2294,18 +2294,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
>>> goto out;
>>> }
>>>
>>> - /* Delete orphan before acquire i_rwsem. */
>>> - if (dwc->dw_orphaned) {
>>> - BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
>>> -
>>> - end = end > i_size_read(inode) ? end : 0;
>>> -
>>> - ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh,
>>> - !!end, end);
>>> - if (ret < 0)
>>> - mlog_errno(ret);
>>> - }
>>> -
>>> down_write(&oi->ip_alloc_sem);
>>> di = (struct ocfs2_dinode *)di_bh->b_data;
>>>
>>> @@ -2326,44 +2314,56 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
>>>
>>> credits = ocfs2_calc_extend_credits(inode->i_sb, &di->id2.i_list);
>>>
>>> - handle = ocfs2_start_trans(osb, credits);
>>> - if (IS_ERR(handle)) {
>>> - ret = PTR_ERR(handle);
>>> - mlog_errno(ret);
>>> - goto unlock;
>>> - }
>>> - ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
>>> - OCFS2_JOURNAL_ACCESS_WRITE);
>>> - if (ret) {
>>> - mlog_errno(ret);
>>> - goto commit;
>>> - }
>>> -
>>> list_for_each_entry(ue, &dwc->dw_zero_list, ue_node) {
>>> - ret = ocfs2_assure_trans_credits(handle, credits);
>>> - if (ret < 0) {
>>> + handle = ocfs2_start_trans(osb, credits);
>>> + if (IS_ERR(handle)) {
>>> + ret = PTR_ERR(handle);
>>> mlog_errno(ret);
>>> break;
>>
>> I'd rather goto unlock directly without update i_size in case error.
>
> agree
>>
>>> }
>>> + ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
>>> + OCFS2_JOURNAL_ACCESS_WRITE);
>>> + if (ret) {
>>> + mlog_errno(ret);
>>> + ocfs2_commit_trans(osb, handle);
>>> + break;
>>
>> Ditto.
>
> agree
>>
>>> + }
>>> ret = ocfs2_mark_extent_written(inode, &et, handle,
>>> ue->ue_cpos, 1,
>>> ue->ue_phys,
>>> meta_ac, &dealloc);
>>> if (ret < 0) {
>>> mlog_errno(ret);
>>> + ocfs2_commit_trans(osb, handle);
>>> break;
>>
>> Ditto.
>
> The existing code still updates i_size even if ocfs2_mark_extent_written()
> returns an error. I am not certain whether updating i_size in this case is
> correct, but I prefer to maintain the original logic for now.
> Does that seem reasonable to you?
>
Since it returns 0 for unwritten extents, it looks fine.
>> BTW, we can move ocfs2_commit_trans() rightly after ocfs2_mark_extent_written()
>> to save the extra call in case error.
>
> agree
>
> I ran a DIO test using 64MB chunk size (e.g.: fio --bs=64M), the dwc->dw_zero_count
> is about 12107 ~ 12457.
> Regarding the transaction splitting in the unwritten handling loop: this patch
> introduces a minor performance regression. I would like to merge some of the
> transaction calls. e.g.: by starting a new transaction every 100 or 200 iterations.
> What do you think of this approach?
>
Seems we can limit max credit for a single transaction here?
If exceeds, restart a new one.
Joseph,
Thanks