Re: [PATCH v6 1/1] ocfs2: split transactions in dio completion to avoid credit exhaustion
From: Heming Zhao
Date: Thu Apr 02 2026 - 03:56:52 EST
On Thu, Apr 02, 2026 at 03:48:36PM +0800, Joseph Qi wrote:
>
>
> On 4/2/26 3:43 PM, Heming Zhao wrote:
> > On Thu, Apr 02, 2026 at 03:07:10PM +0800, Joseph Qi wrote:
> >>
> >>
> >> On 4/2/26 10:26 AM, Heming Zhao wrote:
> >>> During ocfs2 dio operations, JBD2 may report warnings via following
> >>> call trace:
> >>> ocfs2_dio_end_io_write
> >>> ocfs2_mark_extent_written
> >>> ocfs2_change_extent_flag
> >>> ocfs2_split_extent
> >>> ocfs2_try_to_merge_extent
> >>> ocfs2_extend_rotate_transaction
> >>> ocfs2_extend_trans
> >>> jbd2__journal_restart
> >>> start_this_handle
> >>> output: JBD2: kworker/6:2 wants too many credits credits:5450 rsv_credits:0 max:5449
> >>>
> >>> To prevent exceeding the credits limit, modify ocfs2_dio_end_io_write() to
> >>> handle extents in a batch of transaction.
> >>>
> >>> Additionally, relocate ocfs2_del_inode_from_orphan(). The orphan inode should
> >>> only be removed from the orphan list after the extent tree update is complete.
> >>> This ensures that if a crash occurs in the middle of extent tree updates, we
> >>> won't leave stale blocks beyond EOF.
> >>>
> >>> This patch also changes the logic for updating the inode size and removing
> >>> orphan, making it similar to ext4_dio_write_end_io(). Both operations are
> >>> performed only when everything looks good.
> >>>
> >>> Finally, thanks to Jans and Joseph for providing the bug fix prototype and
> >>> suggestions.
> >>>
> >>> Suggested-by: Jan Kara <jack@xxxxxxx>
> >>> Suggested-by: Joseph Qi <joseph.qi@xxxxxxxxxxxxxxxxx>
> >>> Reviewed-by: Jan Kara <jack@xxxxxxx>
> >>> Signed-off-by: Heming Zhao <heming.zhao@xxxxxxxx>
> >>> ---
> >>> fs/ocfs2/aops.c | 71 ++++++++++++++++++++++++++++++-------------------
> >>> 1 file changed, 43 insertions(+), 28 deletions(-)
> >>>
> >>> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
> >>> index 09146b43d1f0..0c88f751ffd5 100644
> >>> --- a/fs/ocfs2/aops.c
> >>> +++ b/fs/ocfs2/aops.c
> >>> @@ -37,6 +37,8 @@
> >>> #include "namei.h"
> >>> #include "sysfile.h"
> >>>
> >>> +#define OCFS2_DIO_MARK_EXTENT_BATCH 200
> >>> +
> >>> static int ocfs2_symlink_get_block(struct inode *inode, sector_t iblock,
> >>> struct buffer_head *bh_result, int create)
> >>> {
> >>> @@ -2277,7 +2279,7 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
> >>> struct ocfs2_alloc_context *meta_ac = NULL;
> >>> handle_t *handle = NULL;
> >>> loff_t end = offset + bytes;
> >>> - int ret = 0, credits = 0;
> >>> + int ret = 0, credits = 0, batch = 0;
> >>>
> >>> ocfs2_init_dealloc_ctxt(&dealloc);
> >>>
> >>> @@ -2294,18 +2296,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
> >>> goto out;
> >>> }
> >>>
> >>> - /* Delete orphan before acquire i_rwsem. */
> >>> - if (dwc->dw_orphaned) {
> >>> - BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
> >>> -
> >>> - end = end > i_size_read(inode) ? end : 0;
> >>> -
> >>> - ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh,
> >>> - !!end, end);
> >>> - if (ret < 0)
> >>> - mlog_errno(ret);
> >>> - }
> >>> -
> >>> down_write(&oi->ip_alloc_sem);
> >>> di = (struct ocfs2_dinode *)di_bh->b_data;
> >>>
> >>> @@ -2326,24 +2316,25 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
> >>>
> >>> credits = ocfs2_calc_extend_credits(inode->i_sb, &di->id2.i_list);
> >>>
> >>> - handle = ocfs2_start_trans(osb, credits);
> >>> - if (IS_ERR(handle)) {
> >>> - ret = PTR_ERR(handle);
> >>> - mlog_errno(ret);
> >>> - goto unlock;
> >>> - }
> >>> - ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
> >>> - OCFS2_JOURNAL_ACCESS_WRITE);
> >>> - if (ret) {
> >>> - mlog_errno(ret);
> >>> - goto commit;
> >>> - }
> >>> -
> >>> list_for_each_entry(ue, &dwc->dw_zero_list, ue_node) {
> >>> + if (!handle) {
> >>> + handle = ocfs2_start_trans(osb, credits);
> >>> + if (IS_ERR(handle)) {
> >>> + ret = PTR_ERR(handle);
> >>> + mlog_errno(ret);
> >>> + goto unlock;
> >>> + }
> >>> + ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
> >>> + OCFS2_JOURNAL_ACCESS_WRITE);
> >>> + if (ret) {
> >>> + mlog_errno(ret);
> >>> + goto commit;
> >>> + }
> >>> + }
> >>> ret = ocfs2_assure_trans_credits(handle, credits);
> >>> if (ret < 0) {
> >>> mlog_errno(ret);
> >>> - break;
> >>> + goto commit;
> >>> }
> >>> ret = ocfs2_mark_extent_written(inode, &et, handle,
> >>> ue->ue_cpos, 1,
> >>> @@ -2351,19 +2342,43 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
> >>> meta_ac, &dealloc);
> >>> if (ret < 0) {
> >>> mlog_errno(ret);
> >>> - break;
> >>> + goto commit;
> >>> + }
> >>> +
> >>> + if (++batch == OCFS2_DIO_MARK_EXTENT_BATCH) {
> >>> + ocfs2_commit_trans(osb, handle);
> >>> + handle = NULL;
> >>> + batch = 0;
> >>> }
> >>> }
> >>>
> >>> if (end > i_size_read(inode)) {
> >>> + if (!handle) {
> >>> + handle = ocfs2_start_trans(osb, credits);
> >>> + if (IS_ERR(handle)) {
> >>> + ret = PTR_ERR(handle);
> >>> + mlog_errno(ret);
> >>> + goto unlock;
> >>> + }
> >>> + }
> >>> ret = ocfs2_set_inode_size(handle, inode, di_bh, end);
> >>> if (ret < 0)
> >>> mlog_errno(ret);
> >>> }
> >>> +
> >>> commit:
> >>> ocfs2_commit_trans(osb, handle);
> >>
> >> Seems checking 'handle' is still missing here.
> >>
> >> Joseph
> >
> > I have a different view.
> > In the list_for_each_entry(), if it finishes because the condition
> > (++batch == OCFS2_DIO_MARK_EXTENT_BATCH) is met, the handle will be null.
> > However, it will be initialized later within the i_size update block.
> > In my view, the handle must be non-null before entering ocfs2_commit_trans().
> >
>
> Ummm... inode size update is conditional, right?
>
> Thanks,
> Joseph
>
How stupid of me! understand your point now.
- Heming