Concurrent slab-use-after-free in netdev_next_lower_dev

From: Dylan Wolff
Date: Mon Mar 31 2025 - 09:02:34 EST


Hello!

Firstly, I am still relatively new to kernel development, so apologies in advance if my assessment of this issue is incorrect.

I have a Syzkaller crash report for what looks like a use-after-free concurrency bug involving a net_device. I am still working on a consistent, minimal reproducer; for now the bug is quite difficult to trigger in practice with the attached Syzkaller program.

From the report, it looks like the net_device is freed at the end of an rtnl critical section in netdev_run_todo. At the time of the crash, the *use* thread has acquired rtnl_lock() in smc_vlan_by_tcpsk. The crash occurred in v6.13-rc4, at the line marked with `>>>` below, while iterating over devices with netdev_walk_all_lower_dev:

```
static struct net_device *netdev_next_lower_dev(struct net_device *dev,
						struct list_head **iter)
{
	struct netdev_adjacent *lower;

>>>	lower = list_entry((*iter)->next, struct netdev_adjacent, list);

	if (&lower->list == &dev->adj_list.lower)
		return NULL;

	*iter = &lower->list;

	return lower->dev;
}
```

This looks to me like a reference-counting issue. I see that netdev_refcnt_read is checked in netdev_run_todo before the device is freed, but I don't see anything in netdev_walk_all_lower_dev or netdev_next_lower_dev that takes a reference on the devices while iterating over them. I'm guessing the fix is either to add reference counting to netdev_walk_all_lower_dev or to use a different, concurrency-safe iterator over the devices in the caller (smc_vlan_by_tcpsk).
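To make the idea concrete, here is a rough single-threaded userspace model of what I have in mind. It is only a sketch, not kernel code: model_dev, model_adjacent, model_hold, model_put, model_try_free and model_freed_mid_walk are names I made up, the list helpers are simplified stand-ins, and the "concurrent" teardown is simulated by a call in the middle of the walk. The device is never actually freed; a flag is set instead so the outcome can be observed safely.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for illustration only; not the kernel definitions. */
struct list_head {
	struct list_head *next, *prev;
};

#define list_entry(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void list_init(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* Hypothetical model of a device: a reference count plus a "freed" flag. */
struct model_dev {
	int refcnt;
	int freed;              /* set instead of calling free() */
	struct list_head lower; /* models dev->adj_list.lower */
};

struct model_adjacent {
	struct model_dev *dev;
	struct list_head list;
};

static void model_hold(struct model_dev *d)
{
	d->refcnt++;
}

static void model_put(struct model_dev *d)
{
	assert(d->refcnt > 0);
	d->refcnt--;
}

/* Models the teardown side: refuse to free while references remain. */
static int model_try_free(struct model_dev *d)
{
	if (d->refcnt > 0)
		return 0;
	d->freed = 1;
	return 1;
}

/* Same shape as the iterator quoted above. */
static struct model_dev *model_next_lower(struct model_dev *dev,
					  struct list_head **iter)
{
	struct model_adjacent *lower;

	lower = list_entry((*iter)->next, struct model_adjacent, list);
	if (&lower->list == &dev->lower)
		return NULL;
	*iter = &lower->list;
	return lower->dev;
}

/*
 * Walk a two-entry lower list and attempt a teardown of the upper device
 * after the first step. Returns 1 if the upper device was marked freed
 * while the walk was still running, 0 otherwise.
 */
static int model_freed_mid_walk(int take_ref)
{
	struct model_dev upper = {0}, a = {0}, b = {0};
	struct model_adjacent adj_a = { .dev = &a }, adj_b = { .dev = &b };
	struct list_head *iter;
	int visited = 0, freed_mid_walk = 0;

	list_init(&upper.lower);
	list_add_tail(&adj_a.list, &upper.lower);
	list_add_tail(&adj_b.list, &upper.lower);

	if (take_ref)
		model_hold(&upper);

	iter = &upper.lower;
	while (model_next_lower(&upper, &iter)) {
		if (++visited == 1)
			model_try_free(&upper); /* simulated concurrent teardown */
		freed_mid_walk |= upper.freed;
	}

	if (take_ref)
		model_put(&upper);

	assert(visited == 2);
	return freed_mid_walk;
}
```

Calling model_freed_mid_walk(0) models the walk as I understand it today (no reference held, so the teardown goes through mid-walk), while model_freed_mid_walk(1) models the walk with a reference held for its duration. Whether this matches what actually happens in the kernel is exactly what I am hoping someone can confirm.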

Could someone confirm if I am on the right track here? If so I am happy to try to come up with the patch.

Environment:
     Qemu (invocation attached) running a Syzkaller image on an Ubuntu 22.04.4 LTS host
Kernel:
     tag: v6.13-rc4
     compiler toolchain: clang-17

Thanks!
Dylan

sudo qemu-system-x86_64 \
-m 8G \
-smp 6 \
-kernel $LINUX_DIR/arch/x86/boot/bzImage \
-append "console=ttyS0 root=/dev/sda earlyprintk=serial net.ifnames=0" \
-drive file=$IMAGE_DIR/bookworm.img,format=raw \
-net user,host=10.0.2.10,hostfwd=tcp:127.0.0.1:$((10021+$i))-:22 \
-net nic,model=e1000 \
-snapshot \
-enable-kvm \
-nographic \
-pidfile vm$i.pid \
2>&1 | tee vm$i.log

Attachment: 2e50cc6b5eed2cdd8e652711c64739a9a120a405.zip
Description: Zip archive