Re: [syzbot] [wireguard?] KASAN: slab-use-after-free Write in enqueue_timer

From: Jason A. Donenfeld
Date: Tue May 23 2023 - 11:46:38 EST


Hey Syzkaller & Netdev folks,

I've been looking at this a bit and am slightly puzzled. At first I saw
this:

> enqueue_timer+0xad/0x560 kernel/time/timer.c:605
> internal_add_timer kernel/time/timer.c:634 [inline]
> __mod_timer+0xa76/0xf40 kernel/time/timer.c:1131
> mod_peer_timer+0x158/0x220 drivers/net/wireguard/timers.c:37
> wg_packet_consume_data_done drivers/net/wireguard/receive.c:354 [inline]
> wg_packet_rx_poll+0xd9e/0x2250 drivers/net/wireguard/receive.c:474

And I thought - darn, it's a bug where a struct wg_peer's timer is
modified -- in this case, timer_persistent_keepalive by way of
wg_timers_any_authenticated_packet_traversal() -- after the peer object
has been freed. This most clearly fits the reported line,
receive.c:354, and the subsequent 8-byte write when enqueuing the timer.

So I traced through the peer shutdown code in peer.c -- the
peer_make_dead() + peer_remove_after_dead() combo -- and made sure the
peer->is_dead RCU logic was correct. And I couldn't find a bug.

But then I looked further down at the syzbot report:

> Allocated by task 16792:
> kvzalloc include/linux/slab.h:705 [inline]
> alloc_netdev_mqs+0x89/0xf30 net/core/dev.c:10626
> rtnl_create_link+0x2f7/0xc00 net/core/rtnetlink.c:3315

and

> Freed by task 41:
> __kmem_cache_free+0x264/0x3c0 mm/slub.c:3799
> device_release+0x95/0x1c0
> kobject_cleanup lib/kobject.c:683 [inline]
> kobject_release lib/kobject.c:714 [inline]
> kref_put include/linux/kref.h:65 [inline]
> kobject_put+0x228/0x470 lib/kobject.c:731
> netdev_run_todo+0xe5a/0xf50 net/core/dev.c:10400

So that means the memory in question is actually memory that's
allocated and freed by the networking stack. Specifically, dev.c:10626
is allocating a struct net_device with a trailing struct wg_device (its
priv_data). However, wg_device does not have any struct timer_list
members in it, and I don't see how net_device's watchdog_timer would be
related to the stack trace, which is clearly operating on a wg_peer
timer.
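
To make the layout argument concrete, here is a minimal userspace sketch
of a device struct with a trailing private area, plus a check for whether
an address falls inside that single allocation. Everything prefixed with
fake_ or FAKE_ is a hypothetical stand-in, not kernel code; the 32-byte
alignment and the separately allocated peer are assumptions made for
illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins; the real kernel structs are far larger. */
struct fake_net_device {
	char name[16];
	unsigned long watchdog_expires; /* stands in for watchdog_timer */
};

struct fake_wg_device {
	int placeholder; /* deliberately contains no timer */
};

struct fake_wg_peer {
	unsigned long keepalive_expires; /* stands in for a peer timer */
};

/* Assumed alignment of the private area (illustrative value). */
#define FAKE_ALIGN ((size_t)32)
#define FAKE_ALIGN_UP(x) (((x) + FAKE_ALIGN - 1) & ~(FAKE_ALIGN - 1))

/* Total size of one allocation: aligned device struct + private area. */
static size_t fake_alloc_size(size_t sizeof_priv)
{
	return FAKE_ALIGN_UP(sizeof(struct fake_net_device)) + sizeof_priv;
}

/* One zeroed allocation holds both the device and its private data. */
static struct fake_net_device *fake_alloc_netdev(size_t sizeof_priv)
{
	return calloc(1, fake_alloc_size(sizeof_priv));
}

/* The private area starts right after the aligned device struct. */
static void *fake_netdev_priv(struct fake_net_device *dev)
{
	return (char *)dev + FAKE_ALIGN_UP(sizeof(struct fake_net_device));
}

/* Returns 1 if ptr lies inside the device allocation, 0 otherwise. */
static int fake_in_netdev_object(const struct fake_net_device *dev,
				 size_t sizeof_priv, const void *ptr)
{
	uintptr_t start = (uintptr_t)dev;
	uintptr_t p = (uintptr_t)ptr;

	assert(dev != NULL);
	return p >= start && p < start + fake_alloc_size(sizeof_priv);
}
```

In this model, the private area is part of the device allocation, while
a field of a separately allocated peer is not. That is why a report
attributing a peer-timer write to the device allocation looks odd.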

So what on earth is going on here?

Jason

PS - Jakub, I have some WG fixes queued up for you, but I wanted to have
some resolution with this first before sending a tranche.