Re: KMSAN: uninit-value in memcmp (2)
From: Vladis Dronov
Date: Thu Sep 27 2018 - 07:17:34 EST
Hello, Dmirty,
Thank you for the explanation of how syzkaller/syzbot works in this and
other emails. I understand that is it a complicated task to determine
and categorize bugs based on just crash dump and messages, and syzkaller
does a great job of doing so.
> Re __hw_addr_add_ex bug, as Alex noted the crash was detected _after_
> the fixing commit went in. So it's something new and different and
> can't be fixed by the older commit.
Indeed, you're right, there is another issue with tun/tap devices which
leads to this bug. I've posted a patch (https://lkml.org/lkml/2018/9/26/416)
to fix it.
I hope I did not do much damage, reporting previous fix as a fix for this bug,
as syzkaller will probably create another "KMSAN: uninit-value in <...>"
report.
Best regards,
Vladis Dronov | Red Hat, Inc. | Product Security Engineer
----- Original Message -----
> From: "Dmitry Vyukov" <dvyukov@xxxxxxxxxx>
> To: "Alexander Potapenko" <glider@xxxxxxxxxx>
> Cc: "Vladis Dronov" <vdronov@xxxxxxxxxx>, "syzbot" <syzbot+d3402c47f680ff24b29c@xxxxxxxxxxxxxxxxxxxxxxxxx>,
> "syzkaller-bugs" <syzkaller-bugs@xxxxxxxxxxxxxxxx>, "David Miller" <davem@xxxxxxxxxxxxx>, "Eric Dumazet"
> <edumazet@xxxxxxxxxx>, "LKML" <linux-kernel@xxxxxxxxxxxxxxx>, "Networking" <netdev@xxxxxxxxxxxxxxx>, "sunlianwen"
> <sunlw.fnst@xxxxxxxxxxxxxx>
> Sent: Monday, September 24, 2018 11:39:08 AM
> Subject: Re: KMSAN: uninit-value in memcmp (2)
>
> On Mon, Sep 24, 2018 at 8:53 AM, Alexander Potapenko <glider@xxxxxxxxxx>
> wrote:
> > On Mon, Sep 24, 2018 at 12:09 AM Vladis Dronov <vdronov@xxxxxxxxxx> wrote:
> >>
> >> Hello, Dmirty,
> >>
> >> Thank you for the reply. Can we please, discuss this further?
> > Hi Vladis,
> >> > You can see on dashboard that the last crash
> >> > for the second version (2) happened just few days ago. So this is a
> >> > different bug.
> > FWIW I've just double-checked that the reproducer provided by
> > syzkaller in the original message still triggers the report from the
> > original message in the latest KMSAN tree (which already contains the
> > __hw_addr_add_ex() fix from April).
> >> Well... yes and no. When I was looking at this bug (bug?id=088efeac32fd) I
> >> was looking
> >> at the report at "2018/05/09 18:55"
> >> (https://syzkaller.appspot.com/text?tag=CrashReport&x=141b707b800000),
> >> since it was the only report with a reproducer. This was my error.
> >>
> >> The error and the call trace in this report are:
> >>
> >> >>>
> >> BUG: KMSAN: uninit-value in memcmp+0x119/0x180 lib/string.c:861
> >> CPU: 0 PID: 38 Comm: kworker/0:1 Not tainted 4.17.0-rc3+ #88
> >> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> >> Google 01/01/2011
> >> Workqueue: ipv6_addrconf addrconf_dad_work
> >> Call Trace:
> >> __dump_stack lib/dump_stack.c:77 [inline]
> >> dump_stack+0x185/0x1d0 lib/dump_stack.c:113
> >> kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067
> >> __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:683
> >> memcmp+0x119/0x180 lib/string.c:861
> >> __hw_addr_add_ex net/core/dev_addr_lists.c:61 [inline]
> >> __dev_mc_add+0x1fc/0x900 net/core/dev_addr_lists.c:670
> >> dev_mc_add+0x6d/0x80 net/core/dev_addr_lists.c:687
> >> igmp6_group_added+0x2db/0xa00 net/ipv6/mcast.c:662
> >> ipv6_dev_mc_inc+0xe9e/0x1130 net/ipv6/mcast.c:914
> >> addrconf_join_solict net/ipv6/addrconf.c:2103 [inline]
> >> addrconf_dad_begin net/ipv6/addrconf.c:3853 [inline]
> >> addrconf_dad_work+0x462/0x2a20 net/ipv6/addrconf.c:3979
> >> process_one_work+0x12c6/0x1f60 kernel/workqueue.c:2145
> >> worker_thread+0x113c/0x24f0 kernel/workqueue.c:2279
> >> kthread+0x539/0x720 kernel/kthread.c:239
> >> ret_from_fork+0x35/0x40 arch/x86/entry/entry_64.S:412
> >>
> >> Local variable description: ----buf@igmp6_group_added
> >> Variable was created at:
> >> igmp6_group_added+0x4a/0xa00 net/ipv6/mcast.c:650
> >> ipv6_dev_mc_inc+0xe9e/0x1130 net/ipv6/mcast.c:914
> >> <<<
> >>
> >> It is the same like in bug?id=3887c0d99aecb27d085180c5222d245d08a30806
> >> which, after some more test, made me believe these bugs are duplicate
> >> and are fixed by the same commit.
> >>
> >> But let's look at another report at "2018/09/12 21:00"
> >> (https://syzkaller.appspot.com/text?tag=CrashReport&x=14f99b71400000)
> >> at the bug (bug?id=088efeac32fd), the one you've mentioned as
> >> "the last crash for the second version (2) happened just few days ago".
> >>
> >> Its error and the call trace are completely different:
> >>
> >> >>>
> >> BUG: KMSAN: uninit-value in memcmp+0x11d/0x180 lib/string.c:863
> >> CPU: 0 PID: 6107 Comm: syz-executor4 Not tainted 4.19.0-rc3+ #45
> >> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> >> Google 01/01/2011
> >> Call Trace:
> >> __dump_stack lib/dump_stack.c:77 [inline]
> >> dump_stack+0x14b/0x190 lib/dump_stack.c:113
> >> kmsan_report+0x183/0x2b0 mm/kmsan/kmsan.c:956
> >> __msan_warning+0x70/0xc0 mm/kmsan/kmsan_instr.c:645
> >> memcmp+0x11d/0x180 lib/string.c:863
> >> dev_uc_add_excl+0x165/0x7b0 net/core/dev_addr_lists.c:464
> >> ndo_dflt_fdb_add net/core/rtnetlink.c:3463 [inline]
> >> rtnl_fdb_add+0x1081/0x1270 net/core/rtnetlink.c:3558
> >> rtnetlink_rcv_msg+0xa0b/0x1530 net/core/rtnetlink.c:4715
> >> netlink_rcv_skb+0x36e/0x5f0 net/netlink/af_netlink.c:2454
> >> rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:4733
> >> netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
> >> netlink_unicast+0x1638/0x1720 net/netlink/af_netlink.c:1343
> >> netlink_sendmsg+0x1205/0x1290 net/netlink/af_netlink.c:1908
> >> sock_sendmsg_nosec net/socket.c:621 [inline]
> >> sock_sendmsg net/socket.c:631 [inline]
> >> ...
> >> Uninit was created at:
> >> ...
> >> slab_post_alloc_hook mm/slab.h:446 [inline]
> >> slab_alloc_node mm/slub.c:2718 [inline]
> >> __kmalloc_node_track_caller+0x9e7/0x1160 mm/slub.c:4351
> >> __kmalloc_reserve net/core/skbuff.c:138 [inline]
> >> __alloc_skb+0x2f5/0x9e0 net/core/skbuff.c:206
> >> alloc_skb include/linux/skbuff.h:996 [inline]
> >> netlink_alloc_large_skb net/netlink/af_netlink.c:1189 [inline]
> >> netlink_sendmsg+0xb49/0x1290 net/netlink/af_netlink.c:1883
> >> sock_sendmsg_nosec net/socket.c:621 [inline]
> >> sock_sendmsg net/socket.c:631 [inline]
> >> ___sys_sendmsg+0xe70/0x1290 net/socket.c:2114
> >> <<<
> >>
> >> This is a different bug. How come these 2 different reports for 2
> >> different
> >> bugs have ended in the same syzkaller report (bug?id=088efeac32fd) ?
> >
> > I suspect this is because syzbot used the top stack frame as the
> > report signature.
> > There's a mechanism to ignore frames like memcmp() in the reports, not
> > sure why didn't it work in this case (maybe it just wasn't in place at
> > the time the report happened).
> >> One bug is fixed by the "net: fix uninit-value in __hw_addr_add_ex()"
> >> commit,
> >> the second one is not, but they are still in the same syzkaller report.
> >>
> >> This was the reason of my confusion. I'm not sure how to fix this. If it
> >> is possible,
> >> probably we need to cancel/revoke "#syz fix: net: fix uninit-value in
> >> __hw_addr_add_ex()"
> >> for this syzkaller report (bug?id=088efeac32fd). And then "split" it into
> >> 2 or
> >> more different reports, but I'm not sure if this is possible.
> >>
> >> Probably, syzkaller needs to look deeper into the KMSAN reports to
> >> differentiate
> >> KMSAN errors happening because of different reasons.
> >>
> >> > On Sun, Sep 23, 2018 at 6:02 PM, Vladis Dronov <vdronov@xxxxxxxxxx>
> >> > wrote:
> >> > > #syz fix: net: fix uninit-value in __hw_addr_add_ex()
> >> >
> >> > Hi Vladis,
> >> >
> >> > This can be fixed with "net: fix uninit-value in __hw_addr_add_ex()".
> >> > That commit landed in April, syzbot waited till the commit reached all
> >> > tested trees, and then closed the bug.
> >> > But the similar bug continued to happen, so syzbot created second
> >> > version of this bug (2). You can see on dashboard that the last crash
> >> > for the second version (2) happened just few days ago. So this is a
> >> > different bug.
>
>
> Precisely discriminating bugs (root causes) bases on crash text is
> generally undecidable problem, even for humans. We even can have
> literally equal crash texts, which are still different bugs. And we
> can have significantly differently looking crash texts, which are
> actually caused by the same root cause. syzbot extracts some
> "identity" string for each crash and than uses that string to
> discriminate crashes and sort them into bins. This identity string is
> what you see in email subject and bug title on dashboard. This method
> can have both false positives and false negatives, but works
> reasonably well in most cases and looks like the best practical
> option.
>
> For this exact instance (memcmp) we actually improved the analysis
> logic recently:
> https://github.com/google/syzkaller/commit/0e29942f77715486995d996f80f82742812d75a2#diff-abe1515f011fad2659ff218f9eea9ae1
> But this crash was analyzed and reported before the change. So if this
> crash happens again it should be reported as "in __hw_addr_add_ex"
> now.
>
> Re __hw_addr_add_ex bug, as Alex noted the crash was detected _after_
> the fixing commit went in. So it's something new and different and
> can't be fixed by the older commit.
>
> There are no general, single guideline as to what to do when several
> different bugs glued together into a single bug. Fixing at least one
> of them (any) in the context of the bug is good, fixing both is good
> too. When/if a bug is closed, new occurrences of similar crashes (the
> same identity string) will lead to creation of a new bug. So if we fix
> only one and close the bug, eventually the second one will lead to a
> new bug (won't be lost), now dedicated to this second crash.
>
> Now syzbot thinks that this bug is fixed/closed:
> https://syzkaller.appspot.com/bug?extid=d3402c47f680ff24b29c
> There is specifically no "undo" functionality, because it's inherently
> racy with creation of a new version of this bugs by new crashes. So if
> of these crashes will happen again, syzbot will open new bugs (now
> with better discriminated titles). We can wait for that. Or we can
> submit new fixes without waiting for new syzbot bugs (adding
> Reported-by to new commits referencing this bug should not do any
> harm).
>
> Hope this clarifies things a bit.
>
> Thanks