Re: [PATCH v4 1/1] ocfs2: split transactions in dio completion to avoid credit exhaustion

From: Joseph Qi

Date: Thu Mar 26 2026 - 21:48:41 EST


Hi,

On 3/26/26 10:26 PM, Heming Zhao wrote:
> During ocfs2 dio operations, JBD2 may report warnings via following call trace:
> ocfs2_dio_end_io_write
> ocfs2_mark_extent_written
> ocfs2_change_extent_flag
> ocfs2_split_extent
> ocfs2_try_to_merge_extent
> ocfs2_extend_rotate_transaction
> ocfs2_extend_trans
> jbd2__journal_restart
> start_this_handle
> output: JBD2: kworker/6:2 wants too many credits credits:5450 rsv_credits:0 max:5449
>
> To prevent exceeding the credits limit, modify ocfs2_dio_end_io_write()
> to handle extent in a batch of transaction.
>
> Additionally, relocate ocfs2_del_inode_from_orphan(). The orphan inode should
> only be removed from the orphan list after the extent tree update is complete.
> this ensures that if a crash occurs in the middle of extent tree updates, we
> won't leave stale blocks beyond EOF.
>
> Finally, thanks to Jans and Joseph for providing the bug fix prototype and
> suggestions.
>
> Suggested-by: Jan Kara <jack@xxxxxxx>
> Suggested-by: Joseph Qi <joseph.qi@xxxxxxxxxxxxxxxxx>
> Reviewed-by: Jan Kara <jack@xxxxxxx>
> Signed-off-by: Heming Zhao <heming.zhao@xxxxxxxx>
> ---
> fs/ocfs2/aops.c | 72 ++++++++++++++++++++++++++++++-------------------
> 1 file changed, 44 insertions(+), 28 deletions(-)
>
> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
> index 09146b43d1f0..60f1b607022f 100644
> --- a/fs/ocfs2/aops.c
> +++ b/fs/ocfs2/aops.c
> @@ -37,6 +37,8 @@
> #include "namei.h"
> #include "sysfile.h"
>
> +#define OCFS2_DIO_MARK_EXTENT_BATCH 200
> +
> static int ocfs2_symlink_get_block(struct inode *inode, sector_t iblock,
> struct buffer_head *bh_result, int create)
> {
> @@ -2277,7 +2279,7 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
> struct ocfs2_alloc_context *meta_ac = NULL;
> handle_t *handle = NULL;
> loff_t end = offset + bytes;
> - int ret = 0, credits = 0;
> + int ret = 0, credits = 0, batch = 0;
>
> ocfs2_init_dealloc_ctxt(&dealloc);
>
> @@ -2294,18 +2296,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
> goto out;
> }
>
> - /* Delete orphan before acquire i_rwsem. */
> - if (dwc->dw_orphaned) {
> - BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
> -
> - end = end > i_size_read(inode) ? end : 0;
> -
> - ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh,
> - !!end, end);
> - if (ret < 0)
> - mlog_errno(ret);
> - }
> -
> down_write(&oi->ip_alloc_sem);
> di = (struct ocfs2_dinode *)di_bh->b_data;
>
> @@ -2326,20 +2316,22 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
>
> credits = ocfs2_calc_extend_credits(inode->i_sb, &di->id2.i_list);
>
> - handle = ocfs2_start_trans(osb, credits);
> - if (IS_ERR(handle)) {
> - ret = PTR_ERR(handle);
> - mlog_errno(ret);
> - goto unlock;
> - }
> - ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
> - OCFS2_JOURNAL_ACCESS_WRITE);
> - if (ret) {
> - mlog_errno(ret);
> - goto commit;
> - }
> -
> list_for_each_entry(ue, &dwc->dw_zero_list, ue_node) {
> + if (!handle) {
> + handle = ocfs2_start_trans(osb, credits);
> + if (IS_ERR(handle)) {
> + ret = PTR_ERR(handle);
> + mlog_errno(ret);
> + handle = NULL;
> + break;
> + }
> + ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
> + OCFS2_JOURNAL_ACCESS_WRITE);
> + if (ret) {
> + mlog_errno(ret);
> + break;
> + }
> + }
> ret = ocfs2_assure_trans_credits(handle, credits);
> if (ret < 0) {
> mlog_errno(ret);
> @@ -2353,17 +2345,41 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
> mlog_errno(ret);
> break;
> }
> +
> + if (++batch == OCFS2_DIO_MARK_EXTENT_BATCH) {
> + ocfs2_commit_trans(osb, handle);
> + handle = NULL;
> + batch = 0;
> + }
> }
>
> if (end > i_size_read(inode)) {

I still don't think it is a good idea to update inode size in case error.
The original logic behaves inconsistent, if ocfs2_start_trans() and
ocfs2_journal_access_di() fails, it won't update inode size, but if
ocfs2_assure_trans_credits() and ocfs2_mark_extent_written(), it will do.
So let's make it behave consistent by both checking 'ret' here.

Other looks fine.

Joseph


> + if (!handle) {
> + handle = ocfs2_start_trans(osb, credits);
> + if (IS_ERR(handle)) {
> + ret = PTR_ERR(handle);
> + mlog_errno(ret);
> + goto unlock;
> + }
> + }
> ret = ocfs2_set_inode_size(handle, inode, di_bh, end);
> if (ret < 0)
> mlog_errno(ret);
> }
> -commit:> - ocfs2_commit_trans(osb, handle);
> + if (handle)
> + ocfs2_commit_trans(osb, handle);
> +
> unlock:
> up_write(&oi->ip_alloc_sem);
> +
> + /* everything looks good, let's start the cleanup */
> + if (dwc->dw_orphaned) {
> + BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
> +
> + ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh, 0, 0);
> + if (ret < 0)
> + mlog_errno(ret);
> + }
> ocfs2_inode_unlock(inode, 1);
> brelse(di_bh);
> out: