Hello!
Firstly, I am still relatively new to kernel development, so apologies in advance if my assessment of this issue is incorrect.
I have a Syzkaller crash report for what looks like a use-after-free concurrency bug involving a net_device. I am working on getting a consistent/minimal reproducer, but for now this bug seems to be quite difficult to trigger in practice using the attached Syzkaller program.
From the report, it looks like the net_device is freed at the end of an rtnl critical section in netdev_run_todo. At the time of the crash, the *use* thread has acquired rtnl_lock() in smc_vlan_by_tcpsk. The crash occurred at the line preceded by `>>>` below in 6.13-rc4 while iterating over devices with netdev_walk_all_lower_dev:
```
static struct net_device *netdev_next_lower_dev(struct net_device *dev,
						struct list_head **iter)
{
	struct netdev_adjacent *lower;

>>>	lower = list_entry((*iter)->next, struct netdev_adjacent, list);

	if (&lower->list == &dev->adj_list.lower)
		return NULL;

	*iter = &lower->list;

	return lower->dev;
}
```
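To make my reading of that function concrete, here is a small userspace sketch of the same iteration pattern. It is not kernel code: every toy_* name is hypothetical, and it checks for the list head before converting the pointer rather than after, which is a deliberate simplification for userspace. It only shows that the marked line has to load (*iter)->next from whatever object *iter points into, so if that object has already been freed, the load touches freed memory.

```c
/* Userspace sketch, NOT kernel code: all toy_* names are hypothetical. */
#include <stddef.h>

struct toy_list {
	struct toy_list *next;
};

/* Stands in for struct netdev_adjacent: a payload with an embedded list node. */
struct toy_adjacent {
	int dev_id;
	struct toy_list list;
};

/* Recover the containing structure from a pointer to its embedded member. */
#define toy_list_entry(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Advance *iter by one node; return NULL once we are back at the list head. */
static struct toy_adjacent *toy_next_lower(struct toy_list *head,
					   struct toy_list **iter)
{
	/* This load reads memory owned by whatever *iter points into. */
	struct toy_list *next = (*iter)->next;

	if (next == head)
		return NULL;

	*iter = next;
	return toy_list_entry(next, struct toy_adjacent, list);
}

/* Build a circular list with two entries and count them by walking it. */
static int toy_count_lower(void)
{
	struct toy_list head;
	struct toy_adjacent a = { .dev_id = 1 };
	struct toy_adjacent b = { .dev_id = 2 };
	struct toy_list *iter = &head;
	int n = 0;

	head.next = &a.list;
	a.list.next = &b.list;
	b.list.next = &head;

	while (toy_next_lower(&head, &iter))
		n++;

	return n;
}
```

In this toy version nothing is ever freed, so the walk is safe. The point is only to show which pointer is dereferenced on each step.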
This looks to me like a reference counting issue. I see that netdev_refcnt_read is checked in netdev_run_todo before the device is freed, but I don't see anything in netdev_walk_all_lower_dev / netdev_next_lower_dev that takes a reference on the devices while iterating over them. I'm guessing the fix is either to add reference counting to netdev_walk_all_lower_dev or to use a different, concurrency-safe iterator over the devices in the caller (smc_vlan_by_tcpsk).
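To illustrate the reference counting idea, here is a single-threaded userspace model of "pin the device while inspecting it". It is only a sketch under my own assumptions, not a proposed patch: toy_dev, toy_hold, toy_put and toy_try_free are hypothetical names, and toy_try_free merely models my understanding that the free path waits for the reference count to drop to zero. It also does not address how the walker would safely obtain the pointer in the first place, which may well be the hard part.

```c
/* Userspace toy model, NOT kernel code: all toy_* names are hypothetical. */
#include <assert.h>

struct toy_dev {
	int refcnt;	/* stands in for the device reference count */
	int freed;	/* set instead of calling free(), so it can be inspected */
};

/* Take a reference so the device cannot be freed underneath us. */
static void toy_hold(struct toy_dev *d)
{
	d->refcnt++;
}

/* Drop a reference taken with toy_hold(). */
static void toy_put(struct toy_dev *d)
{
	assert(d->refcnt > 0);
	d->refcnt--;
}

/* Models the free path: only free once no references remain. */
static int toy_try_free(struct toy_dev *d)
{
	if (d->refcnt > 0)
		return 0;
	d->freed = 1;
	return 1;
}

/* Returns 1 if the free was deferred until the walker dropped its reference. */
static int toy_demo(void)
{
	struct toy_dev dev = { .refcnt = 0, .freed = 0 };
	int deferred;

	toy_hold(&dev);				/* walker pins the device */
	deferred = !toy_try_free(&dev);		/* unregister path has to wait */
	assert(!dev.freed);			/* still safe to dereference here */
	toy_put(&dev);				/* walker is done with the device */

	return deferred && toy_try_free(&dev) && dev.freed;
}
```

With the reference held, the first free attempt is refused and only the second one, after the put, succeeds. Without the hold, the first attempt would free the device while the walker is still using it, which is the situation I think the report shows.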
Could someone confirm whether I am on the right track here? If so, I am happy to try to come up with a patch.
Environment:
QEMU (invocation attached) running a Syzkaller image on an Ubuntu 22.04.4 LTS host
Kernel:
tag: v6.13-rc4
compiler toolchain: clang-17
Thanks!
Dylan