Re: [PATCH] btrfs: fix balance_ctl not free properly in btrfs_balance

From: xiaoshoukui
Date: Tue Jul 25 2023 - 23:09:46 EST


> This is a similar patch to what Josef sent but not exactly the same,
> https://lore.kernel.org/linux-btrfs/9cdf58c2f045863e98a52d7f9d5102ba12b87f07.1687496547.git.josef@xxxxxxxxxxxxxx/
> Both remove balance_need_close but your version does not track the
> paused state. I haven't analyzed it closer, but it looks like you're
> missing some case. Josef's fix simplifies the error handling so we don't
> have to enumerate the errors.

Yes, I think the fix logic is similar.

> As you have a reproducer, can you please try it with this patch instead?
> It's possible that there are still some unhandled states so it would be
> good to check. Thanks.

With Josef's patch, my fuzz reproducer ran for three hours without tripping a
panic. However, the test results show that the fix is not complete.

Josef's patch only fixes balance_ctl not being freed properly. It does not fix
the pause ioctl returning an incorrect value of 0 to the user.

If a pause or cancel ioctl is issued after __btrfs_balance has already checked
for pending pause or cancel requests, __btrfs_balance runs to completion and
returns 0, and the ioctl returns 0 as well. This misleads the user into
thinking the pause or cancel succeeded, when in fact the balance was never
paused or canceled.

> while (1) {
> if ((!counting && atomic_read(&fs_info->balance_pause_req)) ||
> atomic_read(&fs_info->balance_cancel_req)) {
> ret = -ECANCELED;
> goto error;
> }
> --------------------
> ....... issue a pause or cancel req in another thread
> --------------------
> return ret; --//return ret with 0

> [ 60.753212][ T4484] BTRFS info (device loop0): balance: start -f
> [ 60.754589][ T4484] BTRFS info (device loop0): balance: ended with status: 0
> /dev/vda balance successfully
> /dev/vda pause balance successfully --//should fail with invalid.

The pause ioctl should fail here and report an invalid request (-EINVAL).
With my new patch, the test results show that both problems are fixed.
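
To make the ordering issue easier to follow, here is a minimal user-space
sketch of the pause case. It is only a model, not kernel code: struct
balance_model and simulate_pause_ioctl() are hypothetical names that do not
exist in btrfs, plain ints and bools stand in for the atomics and fs_info
flags, and the two threads are replayed sequentially in a fixed order. The
cancel case would follow the same pattern.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the state involved; not a kernel structure. */
struct balance_model {
	int pause_req;	/* stands in for fs_info->balance_pause_req */
	bool paused;	/* stands in for the proposed BTRFS_FS_BALANCE_PAUSED */
};

/*
 * Replay one balance run together with one pause ioctl and return what
 * the pause ioctl would report to the user.
 *
 * request_before_check: the pause request is visible when the relocation
 *                       loop checks for it (true) or arrives later (false).
 * patched:              derive the result from the paused flag (true) or
 *                       return 0 unconditionally (false).
 */
int simulate_pause_ioctl(bool request_before_check, bool patched)
{
	struct balance_model m = { 0 };
	int balance_ret = 0;

	if (request_before_check)
		m.pause_req++;

	/* The pause check inside the relocation loop. */
	if (m.pause_req > 0)
		balance_ret = -ECANCELED;

	/* A late request lands after the check and is never seen by the loop. */
	if (!request_before_check)
		m.pause_req++;

	/* End of the balance: record whether it actually paused. */
	if (balance_ret == -ECANCELED && m.pause_req > 0)
		m.paused = true;

	/* End of the pause ioctl, once the balance is no longer running. */
	m.pause_req--;
	assert(m.pause_req == 0);

	if (!patched)
		return 0;
	return m.paused ? 0 : -EINVAL;
}
```

In this model, simulate_pause_ioctl(false, false) returns 0 even though the
balance never paused, while simulate_pause_ioctl(false, true) returns -EINVAL.
A request that arrives before the check returns 0 in both variants.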

The log of my test:
> [ 109.371116][ T4449] BTRFS info (device loop0): balance: start -f
> [ 109.382745][ T4449] BTRFS info (device loop0): balance: ended with status: 0
> /dev/vda balance successfully
> Failed to pause balance /dev/vda, errno 22 --//fail with invalid.
> Failed to resume balance /dev/vda, errno 107 --//didn't trip assert panic
> close btrfs

Signed-off-by: xiaoshoukui <xiaoshoukui@xxxxxxxxxxxxx>
---
fs/btrfs/fs.h | 6 ++++++
fs/btrfs/volumes.c | 14 +++++++++-----
2 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h
index 203d2a267828..6c85279d0e76 100644
--- a/fs/btrfs/fs.h
+++ b/fs/btrfs/fs.h
@@ -92,6 +92,12 @@ enum {
* main phase. The fs_info::balance_ctl is initialized.
*/
BTRFS_FS_BALANCE_RUNNING,
+
+ /* Indicate that balance has been paused. */
+ BTRFS_FS_BALANCE_PAUSED,
+
+ /* Indicate that balance has been canceled. */
+ BTRFS_FS_BALANCE_CANCELED,

/*
* Indicate that relocation of a chunk has started, it's set per chunk
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 70d69d4b44d2..8e759e7ebdd6 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -4267,7 +4267,6 @@ int btrfs_balance(struct btrfs_fs_info *fs_info,
u64 num_devices;
unsigned seq;
bool reducing_redundancy;
- bool paused = false;
int i;

if (btrfs_fs_closing(fs_info) ||
@@ -4390,6 +4389,8 @@ int btrfs_balance(struct btrfs_fs_info *fs_info,
ASSERT(!test_bit(BTRFS_FS_BALANCE_RUNNING, &fs_info->flags));
set_bit(BTRFS_FS_BALANCE_RUNNING, &fs_info->flags);
describe_balance_start_or_resume(fs_info);
+ clear_bit(BTRFS_FS_BALANCE_PAUSED, &fs_info->flags);
+ clear_bit(BTRFS_FS_BALANCE_CANCELED, &fs_info->flags);
mutex_unlock(&fs_info->balance_mutex);

ret = __btrfs_balance(fs_info);
@@ -4398,7 +4399,7 @@ int btrfs_balance(struct btrfs_fs_info *fs_info,
if (ret == -ECANCELED && atomic_read(&fs_info->balance_pause_req)) {
btrfs_info(fs_info, "balance: paused");
btrfs_exclop_balance(fs_info, BTRFS_EXCLOP_BALANCE_PAUSED);
- paused = true;
+ set_bit(BTRFS_FS_BALANCE_PAUSED, &fs_info->flags);
}
/*
* Balance can be canceled by:
@@ -4415,8 +4416,10 @@ int btrfs_balance(struct btrfs_fs_info *fs_info,
*
* So here we only check the return value to catch canceled balance.
*/
- else if (ret == -ECANCELED || ret == -EINTR)
+ else if (ret == -ECANCELED || ret == -EINTR) {
btrfs_info(fs_info, "balance: canceled");
+ set_bit(BTRFS_FS_BALANCE_CANCELED, &fs_info->flags);
+ }
else
btrfs_info(fs_info, "balance: ended with status: %d", ret);

@@ -4428,7 +4431,7 @@ int btrfs_balance(struct btrfs_fs_info *fs_info,
}

/* We didn't pause, we can clean everything up. */
- if (!paused) {
+ if (!test_bit(BTRFS_FS_BALANCE_PAUSED, &fs_info->flags)) {
reset_balance_state(fs_info);
btrfs_exclop_finish(fs_info);
}
@@ -4587,6 +4590,7 @@ int btrfs_pause_balance(struct btrfs_fs_info *fs_info)
/* we are good with balance_ctl ripped off from under us */
BUG_ON(test_bit(BTRFS_FS_BALANCE_RUNNING, &fs_info->flags));
atomic_dec(&fs_info->balance_pause_req);
+ ret = test_bit(BTRFS_FS_BALANCE_PAUSED, &fs_info->flags) ? 0 : -EINVAL;
} else {
ret = -ENOTCONN;
}
@@ -4642,7 +4646,7 @@ int btrfs_cancel_balance(struct btrfs_fs_info *fs_info)
test_bit(BTRFS_FS_BALANCE_RUNNING, &fs_info->flags));
atomic_dec(&fs_info->balance_cancel_req);
mutex_unlock(&fs_info->balance_mutex);
- return 0;
+ return test_bit(BTRFS_FS_BALANCE_CANCELED, &fs_info->flags) ? 0 : -EINVAL;
}

int btrfs_uuid_scan_kthread(void *data)
--
2.34.1