Re: [PATCH 4/4] btrfs: Fix transaction abort during failure in del_balance_item()
From: Filipe Manana
Date: Wed Apr 09 2025 - 05:51:00 EST
On Wed, Apr 9, 2025 at 12:04 AM David Sterba <dsterba@xxxxxxx> wrote:
>
> On Tue, Apr 08, 2025 at 03:53:04PM +0100, Filipe Manana wrote:
> > On Tue, Apr 8, 2025 at 1:19 PM Yangtao Li <frank.li@xxxxxxxx> wrote:
> > >
> > > Handle errors by adding explicit btrfs_abort_transaction
> > > and btrfs_end_transaction calls.
> >
> > Again, like in the previous patch, why?
> > This provides no reason at all why we should abort.
> > And the same comment below.
> >
> > >
> > > Signed-off-by: Yangtao Li <frank.li@xxxxxxxx>
> > > ---
> > > fs/btrfs/volumes.c | 11 +++++++----
> > > 1 file changed, 7 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> > > index 347c475028e0..23739d18d833 100644
> > > --- a/fs/btrfs/volumes.c
> > > +++ b/fs/btrfs/volumes.c
> > > @@ -3777,7 +3777,7 @@ static int del_balance_item(struct btrfs_fs_info *fs_info)
> > > struct btrfs_trans_handle *trans;
> > > BTRFS_PATH_AUTO_FREE(path);
> > > struct btrfs_key key;
> > > - int ret, err;
> > > + int ret;
> > >
> > > path = btrfs_alloc_path();
> > > if (!path)
> > > @@ -3800,10 +3800,13 @@ static int del_balance_item(struct btrfs_fs_info *fs_info)
> > > }
> > >
> > > ret = btrfs_del_item(trans, root, path);
> > > + if (ret)
> > > + goto out;
> > > +
> > > + return btrfs_commit_transaction(trans);
> > > out:
> > > - err = btrfs_commit_transaction(trans);
> > > - if (err && !ret)
> > > - ret = err;
> > > + btrfs_abort_transaction(trans, ret);
> > > + btrfs_end_transaction(trans);
> >
> > A transaction abort will turn the fs into RO mode, and it's meant to
> > be used when we can't proceed with changes to the fs after we did
> > partial changes, to avoid leaving things in an inconsistent state.
> > Here we don't abort because we haven't done any changes before using
> > the transaction handle, so an abort is pointless and will turn the fs
> > into RO mode unnecessarily.
>
> The del_balance_item() case seems to be unique, there's only one caller
> reset_balance_state() that calls btrfs_handle_fs_error() in case of an
> error. This is almost the same as a transaction abort, but
> del_balance_item() may be called from various contexts.
>
> The error handling may be improved here, e.g. some callers may care
> about the actual error of del_balance_item/reset_balance_state, but
> rather a hard transaction abort it could be better to handle it more
> gracefully for operations that are restartable, like return an EAGAIN.
That's just not possible in 2 cases out of 3 (btrfs_cancel_balance()
being the exception where it's possible), since we already have an
error return value to return to user space.
The btrfs_handle_fs_error() call is exaggerated and doesn't accomplish
anything as the balance item was already persisted in a transaction
committed previously.