Re: [PATCH] bcachefs: WARNING in bch2_trans_srcu_unlock

From: Camila Alvarez Inostroza
Date: Tue Aug 06 2024 - 23:04:54 EST




On Fri, 2 Aug 2024, Kent Overstreet wrote:

On Tue, Jul 23, 2024 at 09:08:00PM GMT, Camila Alvarez wrote:
bch2_extent_fallocate should not return -BCH_ERR_transaction_restart_nested when there are no buckets available.
Returning -BCH_ERR_transaction_restart_nested causes the sector allocation to retry even if there's no space available.

It seems like you're saying this fixes a livelock, but I don't see that:
if we return operation_blocked, the closure_sync we do before we return
will still wait.

More explanation?

Yes, sorry for the delay. The closure_sync will still wait, but even after waiting there's no space left.
I think the issue is that we're trying to allocate a bucket when none are available:
closure_wait(&c->freelist_wait, cl);

journal_write_done is called at some point:
bch2_do_discard(c)
closure_wake_up(&c->freelist_wait)
...

That apparently doesn't free any buckets (should we add some sort of assertion to notify us when this happens?), so there is still no space left, so bch2_bucket_alloc_trans keeps returning -BCH_ERR_freelist_empty.
With this bch2_alloc_sectors_start_trans return -BCH_err_bucket_alloc_blocked, which bch2_extent_fallocate turns into -BCH_ERR_transaction_restart_nested.

Finally __bchfs_fallocate sees the error as BCH_ERR_transaction_restart, so it tries again.
The issue is that the journal never discards any buckets so it just keeps retrying indefinitely.

I've been thinking, and maybe we don't want to just keep __bchfs_fallocate from retrying, but would prefer to control how many times it does.

Does all of this make any sense to you?



Reported-by: syzbot+1e515cab343dbe5aa38a@xxxxxxxxxxxxxxxxxxxxxxxxx
Signed-off-by: Camila Alvarez <cam.alvarez.i@xxxxxxxxx>
---
fs/bcachefs/io_misc.c | 2 --
1 file changed, 2 deletions(-)

diff --git a/fs/bcachefs/io_misc.c b/fs/bcachefs/io_misc.c
index 82f9170dab3f..b5b92ef030df 100644
--- a/fs/bcachefs/io_misc.c
+++ b/fs/bcachefs/io_misc.c
@@ -90,8 +90,6 @@ int bch2_extent_fallocate(struct btree_trans *trans,
opts.data_replicas,
opts.data_replicas,
BCH_WATERMARK_normal, 0, &cl, &wp);
- if (bch2_err_matches(ret, BCH_ERR_operation_blocked))
- ret = -BCH_ERR_transaction_restart_nested;
if (ret)
goto err;

--
2.34.1