Re: [syzbot] [btrfs?] kernel BUG in prepare_to_merge
From: Qu Wenruo
Date: Wed Aug 02 2023 - 02:54:29 EST
On 2023/8/1 23:26, Christoph Hellwig wrote:
In the meantime I've also reproduced it with just
"btrfs: fix the btrfs_get_global_root return value", but it took
a rather long time.
After wading through the code my suspicion is that before this fix
the ERR_PTR return made that for those cases btrfs_get_root_ref and
btrfs_get_fs_root_commit_root don't actually do the
btrfs_lookup_fs_root. Although that seemed unintentional as far
as I can tell it might have prevented some additional problems
with whatever syzcaller is fuzzing here. Not sure if anyone who
knows this code has any good idea where to start looking?
I'm also looking into the case, the weird part seems to be we're getting
some race between qgroup tree creation and relocation.
More rounds of syzbot testing shows it's not on-disk data corruption,
but runtime corruption lead to the invalid reloc tree key.
Normally if we're relocating tree 8 (quota tree), we should get
fs_info->quota_root, and it should not has ROOT_SHAREABLE flag, thus we
just go COW the involved quota tree block.
But by somehow, if the quota tree is created by btrfs_init_fs_root() it
would has the ROOT_SHAREABLE flag and leads to the incorrect reloc tree
creation.
My current guess is, some race like this:
Thread A | Thread B
---------------------------------+------------------------------
btrfs_quota_enable() |
| | btrfs_get_root_ref()
| | |- btrfs_get_global_root()
| | | Returned NULL
| | |- btrfs_lookup_fs_root()
| | | Returned NULL
|- btrfs_create_tree() | |
| Now quota root item is | |
| inserted | |- btrfs_read_tree_root()
| | | Got the newly inserted quota root
| | |- btrfs_init_fs_root()
| | | Set ROOT_SHAREABLE flag
By this, with a relocation and quota enabling, we create a race that we
can get a quota root with ROOT_SHAREABLE set, and lead to the problem.
Personally speaking, I don't have a particularly good idea on how to fix it.
We may skip any non-subvolume related trees in btrfs_init_fs_root(), but
that doesn't seem correct to me.
Any good ideas on this?
Thanks,
Qu