Re: [syzbot] [wireguard?] KASAN: slab-use-after-free Write in enqueue_timer

From: Dmitry Vyukov
Date: Wed May 24 2023 - 04:24:58 EST


On Tue, 23 May 2023 at 19:07, 'Eric Dumazet' via syzkaller-bugs
<syzkaller-bugs@xxxxxxxxxxxxxxxx> wrote:
>
> On Tue, May 23, 2023 at 7:05 PM Eric Dumazet <edumazet@xxxxxxxxxx> wrote:
> >
> > On Tue, May 23, 2023 at 7:01 PM Jason A. Donenfeld <Jason@xxxxxxxxx> wrote:
> > >
> > > On Tue, May 23, 2023 at 09:47:36AM -0700, Jakub Kicinski wrote:
> > > > On Tue, 23 May 2023 18:42:53 +0200 Jason A. Donenfeld wrote:
> > > > > > It should, no idea why it isn't. Looking thru the code now I don't see
> > > > > > any obvious gaps where timer object is on a list but not active :S
> > > > > > There's no way to get a vmcore from syzbot, right? :)
> > > > > >
> > > > > > Also I thought the shutdown leads to a warning when someone tries to
> > > > > > schedule the dead timer but in fact add_timer() just exits cleanly.
> > > > > > So the shutdown won't help us find the culprit :(
> > > > >
> > > > > Worth noting that it could also be caused by adding to a dead timer
> > > > > anywhere in priv_data of another netdev, not just the sole timer_list
> > > > > in net_device.
> > > >
> > > > Oh, I thought you zero'ed in on the watchdog based on offsets.
> > > > Still, object debug should track all timers in the slab and complain
> > > > on the free path.
> > >
> > > No, I mentioned watchdog because it's the only timer_list in struct
> > > net_device.
> > >
> > > Offset analysis is an interesting idea though. Look at this:
> > >
> > > > The buggy address belongs to the object at ffff88801ecc0000
> > > > which belongs to the cache kmalloc-cg-8k of size 8192
> > > > The buggy address is located 5376 bytes inside of
> > > > freed 8192-byte region [ffff88801ecc0000, ffff88801ecc2000)
> > >
> > > IDA says that for syzkaller's vmlinux, net_device has a size of 0xc80
> > > and wg_device has a size of 0x880. 0xc80+0x880=5376. Coincidence that
> > > the address offset is just after what wg uses?
> >
> >
> > Note that the syzkaller report mentioned:
> >
> > alloc_netdev_mqs+0x89/0xf30 net/core/dev.c:10626
> > usbnet_probe+0x196/0x2770 drivers/net/usb/usbnet.c:1698
> > usb_probe_interface+0x5c4/0xb00 drivers/usb/core/driver.c:396
> > really_probe+0x294/0xc30 drivers/base/dd.c:658
> > __driver_probe_device+0x1a2/0x3d0 drivers/base/dd.c:800
> > driver_probe_device+0x50/0x420 drivers/base/dd.c:830
> > __device_attach_driver+0x2d3/0x520 drivers/base/dd.c:958
> >
> > So maybe an usbnet driver has a timer_list in its priv_data.
>
> struct usbnet {
> ...
> struct timer_list delay;

FWIW, there are more report examples on the dashboard.
Some of them mention neither wireguard nor usbnet, e.g.:
https://syzkaller.appspot.com/text?tag=CrashReport&x=17dd2446280000
So that's probably a red herring. But they all seem to mention alloc_netdev_mqs.
For now, let's do:
#syz set subsystems: net
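
For anyone who wants to repeat the offset analysis quoted above on other
reports from the dashboard, here is a minimal sketch of the arithmetic. The
sizes are the ones quoted from IDA for syzkaller's vmlinux, and the address
and offset come from the KASAN report. The helper names are hypothetical,
and the sketch assumes the private area starts directly after struct
net_device, ignoring any alignment padding the allocator may add.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sizes quoted in the thread for syzkaller's vmlinux (from IDA). */
#define NET_DEVICE_SIZE 0xc80u  /* struct net_device */
#define WG_DEVICE_SIZE  0x880u  /* struct wg_device  */

/* Values taken from the KASAN report. */
#define OBJ_START    UINT64_C(0xffff88801ecc0000)
#define OBJ_SIZE     8192u
#define BUGGY_OFFSET 5376u

/* The sum from the thread, checked at compile time. */
_Static_assert(NET_DEVICE_SIZE + WG_DEVICE_SIZE == BUGGY_OFFSET,
               "net_device + wg_device should equal the reported offset");

/*
 * Hypothetical helper: translate an offset inside the slab object into an
 * offset inside the driver's private area. Returns -1 if the offset falls
 * inside struct net_device itself or outside the object.
 * Assumption: priv data begins at netdev_size with no extra padding.
 */
static long priv_offset(size_t obj_offset, size_t netdev_size, size_t obj_size)
{
    if (obj_offset < netdev_size || obj_offset >= obj_size)
        return -1;
    return (long)(obj_offset - netdev_size);
}

/* Hypothetical helper: absolute address of the buggy write. */
static uint64_t buggy_address(void)
{
    return OBJ_START + BUGGY_OFFSET;
}
```

Calling priv_offset(BUGGY_OFFSET, NET_DEVICE_SIZE, OBJ_SIZE) gives the
offset into whatever priv_data the freed netdev carried, which can then be
compared against the timer_list members of the candidate driver structs.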