Re: [PATCH v2 1/1] ocfs2: split transactions in dio completion to avoid credit exhaustion

From: Joseph Qi

Date: Fri Mar 13 2026 - 01:35:33 EST




On 3/13/26 12:53 PM, Joseph Qi wrote:
>
>
> On 3/13/26 11:17 AM, Heming Zhao wrote:
>> On Fri, Mar 13, 2026 at 10:04:11AM +0800, Joseph Qi wrote:
>>> Almost looks fine, minor updates below.
>>>
>>> On 3/13/26 12:27 AM, Heming Zhao wrote:
>>>> During ocfs2 dio operations, JBD2 may report a warning with the following call trace:
>>>> ocfs2_dio_end_io_write
>>>> ocfs2_mark_extent_written
>>>> ocfs2_change_extent_flag
>>>> ocfs2_split_extent
>>>> ocfs2_try_to_merge_extent
>>>> ocfs2_extend_rotate_transaction
>>>> ocfs2_extend_trans
>>>> jbd2__journal_restart
>>>> start_this_handle
>>>> output: JBD2: kworker/6:2 wants too many credits credits:5450 rsv_credits:0 max:5449
>>>>
>>>> To prevent exceeding the credits limit, modify ocfs2_dio_end_io_write() to
>>>> handle each extent in a separate transaction.
>>>>
>>>> Additionally, relocate ocfs2_del_inode_from_orphan(). The orphan inode should
>>>> only be removed from the orphan list after the extent tree update is complete.
>>>> This ensures that if a crash occurs in the middle of extent tree updates, we
>>>> won't leave stale blocks beyond EOF.
>>>>
>>>> This patch also removes the only call to ocfs2_assure_trans_credits(), which
>>>> was introduced by commit be346c1a6eeb ("ocfs2: fix DIO failure due to
>>>> insufficient transaction credits").
>>>>
>>>> Finally, thanks to Jan for providing the bug fix prototype and suggestions.
>>>>
>>>> Suggested-by: Jan Kara <jack@xxxxxxx>
>>>> Signed-off-by: Heming Zhao <heming.zhao@xxxxxxxx>
>>>> ---
>>>> fs/ocfs2/aops.c | 58 ++++++++++++++++++++++++-------------------------
>>>> 1 file changed, 29 insertions(+), 29 deletions(-)
>>>>
>>>> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
>>>> index 09146b43d1f0..91997b330d39 100644
>>>> --- a/fs/ocfs2/aops.c
>>>> +++ b/fs/ocfs2/aops.c
>>>> @@ -2294,18 +2294,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
>>>> goto out;
>>>> }
>>>>
>>>> - /* Delete orphan before acquire i_rwsem. */
>>>> - if (dwc->dw_orphaned) {
>>>> - BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
>>>> -
>>>> - end = end > i_size_read(inode) ? end : 0;
>>>> -
>>>> - ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh,
>>>> - !!end, end);
>>>> - if (ret < 0)
>>>> - mlog_errno(ret);
>>>> - }
>>>> -
>>>> down_write(&oi->ip_alloc_sem);
>>>> di = (struct ocfs2_dinode *)di_bh->b_data;
>>>>
>>>> @@ -2326,44 +2314,56 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
>>>>
>>>> credits = ocfs2_calc_extend_credits(inode->i_sb, &di->id2.i_list);
>>>>
>>>> - handle = ocfs2_start_trans(osb, credits);
>>>> - if (IS_ERR(handle)) {
>>>> - ret = PTR_ERR(handle);
>>>> - mlog_errno(ret);
>>>> - goto unlock;
>>>> - }
>>>> - ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
>>>> - OCFS2_JOURNAL_ACCESS_WRITE);
>>>> - if (ret) {
>>>> - mlog_errno(ret);
>>>> - goto commit;
>>>> - }
>>>> -
>>>> list_for_each_entry(ue, &dwc->dw_zero_list, ue_node) {
>>>> - ret = ocfs2_assure_trans_credits(handle, credits);
>>>> - if (ret < 0) {
>>>> + handle = ocfs2_start_trans(osb, credits);
>>>> + if (IS_ERR(handle)) {
>>>> + ret = PTR_ERR(handle);
>>>> mlog_errno(ret);
>>>> break;
>>>
>>> I'd rather goto unlock directly without updating i_size in case of error.
>>
>> agree
>>>
>>>> }
>>>> + ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
>>>> + OCFS2_JOURNAL_ACCESS_WRITE);
>>>> + if (ret) {
>>>> + mlog_errno(ret);
>>>> + ocfs2_commit_trans(osb, handle);
>>>> + break;
>>>
>>> Ditto.
>>
>> agree
>>>
>>>> + }
>>>> ret = ocfs2_mark_extent_written(inode, &et, handle,
>>>> ue->ue_cpos, 1,
>>>> ue->ue_phys,
>>>> meta_ac, &dealloc);
>>>> if (ret < 0) {
>>>> mlog_errno(ret);
>>>> + ocfs2_commit_trans(osb, handle);
>>>> break;
>>>
>>> Ditto.
>>
>> The existing code still updates i_size even if ocfs2_mark_extent_written()
>> returns an error. I am not certain whether updating i_size in this case is
>> correct, but I prefer to maintain the original logic for now.
>> Does that seem reasonable to you?
>>
>
> Since it returns 0 for unwritten extents, it looks fine.
>
Thinking about it a bit more, I think it would be better to behave
consistently here.
E.g., if the first round succeeds and the second round fails to start a
transaction, i_size won't be updated now.
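
To illustrate the control flow under discussion, here is a minimal userspace
sketch, not the kernel code. It assumes one reading of "consistent": every
error path inside the per-extent loop jumps to unlock and skips the i_size
update. All names (sim_result, sim_dio_end_io_write, the SIM_* error values)
are hypothetical stand-ins, not ocfs2 or JBD2 symbols.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical error values standing in for negative errno codes. */
#define SIM_ERR_START (-12)
#define SIM_ERR_MARK  (-5)

/* Outcome of one simulated dio completion. */
struct sim_result {
	int ret;               /* 0 on success, negative on failure */
	size_t txns_started;   /* transactions successfully started */
	size_t txns_committed; /* transactions committed */
	bool i_size_updated;   /* whether the i_size update was reached */
};

/*
 * Simulate the loop: one transaction per extent.
 * fail_start_at: index of the extent whose transaction start fails, or -1.
 * fail_mark_at:  index of the extent whose "mark written" step fails, or -1.
 */
static struct sim_result sim_dio_end_io_write(size_t n_extents,
					      long fail_start_at,
					      long fail_mark_at)
{
	struct sim_result r = { 0, 0, 0, false };
	size_t i;

	for (i = 0; i < n_extents; i++) {
		/* Start a fresh transaction for this extent. */
		if ((long)i == fail_start_at) {
			r.ret = SIM_ERR_START;
			goto unlock; /* nothing to commit, skip i_size */
		}
		r.txns_started++;

		/* Mark the extent written inside this transaction. */
		if ((long)i == fail_mark_at)
			r.ret = SIM_ERR_MARK;

		/* A started transaction is always committed. */
		r.txns_committed++;
		assert(r.txns_committed == r.txns_started);

		if (r.ret < 0)
			goto unlock; /* same path as a failed start */
	}

	/* Reached only when every extent was processed successfully. */
	r.i_size_updated = true;
unlock:
	return r;
}
```

With this shape, a failure to start the second transaction and a failure
while marking an extent written end the same way: earlier transactions stay
committed, and i_size is left untouched.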

Joseph