Re: [PATCH] bcachefs: zero-init move_bucket struct in bch2_copygc_get_buckets()

From: Gianfranco Trad
Date: Sat Dec 14 2024 - 20:58:47 EST


On 12/11/24 16:08, Gianfranco Trad wrote:
On 11/11/24 21:09, Kent Overstreet wrote:
On Mon, Nov 11, 2024 at 03:42:44PM +0100, Gianfranco Trad wrote:
zero-init move_bucket struct b fields in bch2_copygc_get_buckets()
to mitigate later uninit-value-use KMSAN reported bug.

Reported-by: syzbot+8689d10f1894eedf774d@xxxxxxxxxxxxxxxxxxxxxxxxx
Closes: https://syzkaller.appspot.com/bug?extid=8689d10f1894eedf774d
Tested-by: syzbot+8689d10f1894eedf774d@xxxxxxxxxxxxxxxxxxxxxxxxx
Signed-off-by: Gianfranco Trad <gianf.trad@xxxxxxxxx>
---
  fs/bcachefs/movinggc.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/bcachefs/movinggc.c b/fs/bcachefs/movinggc.c
index d658be90f737..cdc456b03bec 100644
--- a/fs/bcachefs/movinggc.c
+++ b/fs/bcachefs/movinggc.c
@@ -171,7 +171,8 @@ static int bch2_copygc_get_buckets(struct moving_context *ctxt,
                    lru_pos(BCH_LRU_FRAGMENTATION_START, 0, 0),
                    lru_pos(BCH_LRU_FRAGMENTATION_START, U64_MAX, LRU_TIME_MAX),
                    0, k, ({
-        struct move_bucket b = { .k.bucket = u64_to_bucket(k.k- >p.offset) };
+        struct move_bucket b = { 0 };
+        b.k.bucket = u64_to_bucket(k.k->p.offset);
          int ret2 = 0;

Providing any sort of initializer should cause the whole struct to be
initialized, are you and syzbot sure this is the right fix?
You are right, there's no need to initialize the whole struct.
I'm still in the process of fully understanding what reproducer is trying to do.
So far with the additional findings, b.k seems not to be initialized prior usage in repro case, therefore memset to 0 only the b.k field seems enough:

diff --git a/fs/bcachefs/movinggc.c b/fs/bcachefs/movinggc.c
index d658be90f737..515b05d26d11 100644
--- a/fs/bcachefs/movinggc.c
+++ b/fs/bcachefs/movinggc.c
@@ -171,7 +171,9 @@ static int bch2_copygc_get_buckets(struct moving_context *ctxt,
                   lru_pos(BCH_LRU_FRAGMENTATION_START, 0, 0),
                   lru_pos(BCH_LRU_FRAGMENTATION_START, U64_MAX, LRU_TIME_MAX),
                   0, k, ({
-        struct move_bucket b = { .k.bucket = u64_to_bucket(k.k- >p.offset) };
+        struct move_bucket b;
+        memset(&b.k, 0, sizeof(b.k));
+        b.k.bucket = u64_to_bucket(k.k->p.offset);
         int ret2 = 0;

         saw++;

The above patch was already tested-by syzbot[1].

Let me know if the patch looks good enough or if I should work on it more.

Thanks for your time,

[1] https://syzkaller.appspot.com/x/log.txt?x=1733b8c0580000

--Gian

Hi Kent,

I wanted to follow up on this patch. Over the last few days, I've investigated it further and observed the following that might be of help:

1- zero-initing whole b struct (as the first patch version) leads to a clean log [1].
While if trying to memset to 0 only b.k field log reports [2]:

bcachefs (da441363-bb6a-4ab9-999b-c1f40db4fee2): filesystem UUID already open
bcachefs (da441363-bb6a-4ab9-999b-c1f40db4fee2): shutdown complete
bcachefs: bch2_fs_get_tree() error: EINVAL


2- Given both versions of the patch didn't trigger the uninit issue anymore I checked whether inner fields of b.k.bucket are correctly inited, just before the bug triggers.
b.k.bucket fields seeming to look correctly inited: snapshot = 0, offset = 34, inode = 0, gen = 0.

3- The first of the 2 reproducers causes a segfault:

==9335== Invalid[ 264.802101][ T9346] read of size 1
==9335== at 0x483BC39: strnlen (vg_replace_strmem.c:426)
==9335== by 0x1098F0: netlink_query_family_id (repro.c:176)
==9335== by 0x109A51: syz_genetlink_get_family_id (repro.c:211)
==9335== by 0x10B476: execute_one (repro.c:2071)
==9335== by 0x10B1A5: loop (repro.c:745)
==9335== by 0x10B52F: main (repro.c:2088)
==9335== Address 0x0 is not stack'd, malloc'd or (recently) free'd

As of now, it seems unrelated to the root cause of the reported syzbot bug.


At this point, zero-initializing the entire struct seems to work reliably, thought I'm still trying to get the full picture on this bug.

I know you are busy, but I'd highly appreciate your thoughts on this.
I hope it might help.

[1] https://syzkaller.appspot.com/x/log.txt?x=1724b4e8580000
[2] https://syzkaller.appspot.com/x/log.txt?x=1733b8c0580000

Thanks for your time,
--Gian