Re: [PATCH] btrfs: Added null check to extent_root variable

From: Qu Wenruo
Date: Wed Sep 04 2024 - 17:25:58 EST




在 2024/9/5 01:20, Ghanshyam Agrawal 写道:
On Wed, Sep 4, 2024 at 11:21 AM Qu Wenruo <quwenruo.btrfs@xxxxxxx> wrote:



在 2024/9/4 12:07, Ghanshyam Agrawal 写道:
Reported-by: syzbot+9c3e0cdfbfe351b0bc0e@xxxxxxxxxxxxxxxxxxxxxxxxx
Closes:https://syzkaller.appspot.com/bug?extid=9c3e0cdfbfe351b0bc0e
Signed-off-by: Ghanshyam Agrawal <ghanshyam1898@xxxxxxxxx>
---
fs/btrfs/ref-verify.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/fs/btrfs/ref-verify.c b/fs/btrfs/ref-verify.c
index 9522a8b79d22..4e98ddf5e8df 100644
--- a/fs/btrfs/ref-verify.c
+++ b/fs/btrfs/ref-verify.c
@@ -1002,6 +1002,9 @@ int btrfs_build_ref_tree(struct btrfs_fs_info *fs_info)
return -ENOMEM;

extent_root = btrfs_extent_root(fs_info, 0);
+ if (!extent_root)
+ return -EIO;
+

Can you reproduce the original bug and are sure it's an NULL extent tree
causing the problem?

The original bug can be reproduced using the c program provided by syzkaller
https://syzkaller.appspot.com/bug?extid=9c3e0cdfbfe351b0bc0e


At least a quick glance into the console output shows there is no
special handling like rescue=ibadroots to ignore extent root, nor any
obvious corruption in the extent tree.

I don't have a deep knowledge of filesystems and am unable to
understand this. If you believe I should try to understand this part
for working on this particular bug, please let me know. I will try to
understand this.

If extent root is really empty, we should error out way earlier.

Upon studying the function call sequence, I found out that the
variable extent_root is read for the first time at btrfs_root_node
in fs/btrfs/ctree.c, "eb = rcu_dereference(root->node);" where "root"
is the extent_root. This is where we get a null pointer error.

The problem is, extent_root is very commonly utilized.

For example, inside btrfs_read_block_groups() we call
btrfs_block_group_root(), which calls btrfs_extent_root(fs_info, 0) for
btrfs without block group tree case.

Surprisingly, this time the C reproducer works for me (meanwhile it
never worked before).


In fact, I added extra debugging when mounting the fs, showing that the
loopback device does not have the new block group tree feature, thus
it's still a mystery why we didn't get any earlier crash at
btrfs_read_block_groups().

So I'll keep digging and give you a more comprehensive reason on why
this is triggered (without crashing at an earlier location).

Thanks,
Qu


Mind to explain the crash with more details?
I have answered your questions individually. If any other details are
needed, please let me know.

Thanks,
Qu

eb = btrfs_read_lock_root_node(extent_root);
level = btrfs_header_level(eb);
path->nodes[level] = eb;

Thank you very much for reviewing my patch. I look forward to hearing
back from you.

Thanks & Regards,
Ghanshyam Agrawal