Possible hungtask issue will be introduced with device_lock() in uevent_show()
From: Zhang Zekun
Date: Tue Dec 31 2024 - 03:03:29 EST
Hi, Dan, Greg,
We have found a potential hung task issue introduced by commit 9a71892cbcdb ("Revert "driver core: Fix uevent_show() vs driver detach race""), which reverts the RCU usage in device_uevent but reintroduces device_lock() in uevent_show(). The reproduction procedure is quite simple:
$ lspci -n -s 0000:00:03.0
00:03.0 0108: 8086:5845 (rev 02)
$ echo 8086 5845 > /sys/bus/pci/drivers/vfio-pci/new_id
$ ls /dev/vfio
3 vfio
$ ./a.out & (binary compiled from the vfio demo program at the end of this mail)
$ echo 1 > /sys/devices/pci0000:00/0000:00:03.0/remove
The root cause of the hung task is that device_lock() in uevent_show() is blocked by device_release_driver() in the PCI remove routine, which holds the device lock while it waits for vfio_device->refcount == 0. That can never happen, because the remaining reference was taken by the same user process that is now blocked in uevent_show().
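To make the dependency concrete, here is a minimal userspace sketch of it. This is a pthread model, not kernel code: dev_lock stands in for device_lock(), refcount for vfio_device->refcount, and the names remover(), reader_try_uevent() and run_model() are made up for illustration. The reader uses pthread_mutex_timedlock() so the model reports the blocked lock instead of hanging, and run_model() drops the last reference afterwards only so the program can exit; in the real case that reference is never dropped because read() never returns.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

/* Userspace stand-ins (assumptions for illustration, not kernel objects). */
static pthread_mutex_t dev_lock = PTHREAD_MUTEX_INITIALIZER; /* device_lock() */
static atomic_int refcount;            /* models vfio_device->refcount */
static atomic_int remover_holds_lock;  /* set once the remove path owns the lock */

static void sleep_ms(long ms)
{
	struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };

	nanosleep(&ts, NULL);
}

/* Models the remove path: take the lock, drop one reference, then wait
 * for the count to reach zero while still holding the lock. */
static void *remover(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&dev_lock);
	atomic_store(&remover_holds_lock, 1);
	atomic_fetch_sub(&refcount, 1);		/* 2 -> 1 */
	while (atomic_load(&refcount) != 0)
		sleep_ms(1);
	pthread_mutex_unlock(&dev_lock);
	return NULL;
}

/* Models uevent_show() in the process that still holds a reference.
 * Returns 0 if the lock was acquired, ETIMEDOUT if it stayed blocked. */
int reader_try_uevent(int timeout_ms)
{
	struct timespec ts;
	int err;

	clock_gettime(CLOCK_REALTIME, &ts);
	ts.tv_sec += timeout_ms / 1000;
	ts.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (ts.tv_nsec >= 1000000000L) {
		ts.tv_sec++;
		ts.tv_nsec -= 1000000000L;
	}
	err = pthread_mutex_timedlock(&dev_lock, &ts);
	if (err == 0)
		pthread_mutex_unlock(&dev_lock);
	return err;
}

/* Runs the scenario once and returns what the reader observed. */
int run_model(void)
{
	pthread_t t;
	int rc, err;

	atomic_store(&refcount, 2);	/* one reference is the open device fd */
	atomic_store(&remover_holds_lock, 0);
	rc = pthread_create(&t, NULL, remover, NULL);
	assert(rc == 0);
	while (!atomic_load(&remover_holds_lock))
		sleep_ms(1);

	err = reader_try_uevent(200);	/* remover still holds the lock */

	/* Only the model can do this; the real reader is stuck in read(). */
	atomic_fetch_sub(&refcount, 1);	/* 1 -> 0, lets remover finish */
	pthread_join(t, NULL);
	return err;
}
```

Calling run_model() from a small main() and checking for ETIMEDOUT shows the reader cannot take the lock as long as its own reference is outstanding.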
Besides, the patch proposed by Dan [1] cannot prevent the hung task from happening. On 6.13-rc5, consider the following case:
ioctl(..,VFIO_GROUP_GET_DEVICE_FD,..)
...
vfio_device_get_from_name()
vfio_device->refcount -> 2
pci_stop_and_remove_bus_device()
pci_remove_bus()
device_unregister()
...
device_release_driver()
device_lock()
device_remove()
vfio_unregister_group_dev()
vfio_device_put()
vfio_device->refcount -> 1
wait_for(vfio_device->refcount == 0)
uevent_show()
device_lock()
[1] https://lore.kernel.org/all/172790598832.1168608.4519484276671503678.stgit@xxxxxxxxxxxxxxxxxxxxxxxxx/#R
-------------------->8---------------------------------
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

int main(int argc, char *argv[])
{
	char buf[128];
	int container, group, device, uevent, ret;
	struct vfio_group_status group_status =
		{ .argsz = sizeof(group_status) };
	struct vfio_iommu_type1_info iommu_info = { .argsz = sizeof(iommu_info) };

	/* Set up the container and group, then take a device fd (holds a reference) */
	container = open("/dev/vfio/vfio", O_RDWR);
	group = open("/dev/vfio/3", O_RDWR);
	ioctl(group, VFIO_GROUP_GET_STATUS, &group_status);
	ioctl(group, VFIO_GROUP_SET_CONTAINER, &container);
	ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU);
	ioctl(container, VFIO_IOMMU_GET_INFO, &iommu_info);
	device = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "0000:00:03.0");

	uevent = open("/sys/devices/pci0000:00/0000:00:03.0/uevent", O_RDWR);
	sleep(10); /* Remove the pci device here */
	ret = read(uevent, buf, sizeof(buf)); /* We will get hung task here */
	printf("ret %d\n", ret);
	return 0;
}
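If you would rather detect the hang than wedge the test process, one option is to do the read in a child and poll for its exit. The following is only a sketch: read_finishes_within() is a hypothetical helper, not part of the reproducer above, and it relies on the fact that device_lock() is a plain mutex_lock(), so a task blocked there sleeps uninterruptibly and the SIGKILL will not take effect until the lock is released.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: read from fd in a child process and wait up to
 * timeout_ms for it to finish.
 * Returns 1 if the read completed in time, 0 if the child was still
 * blocked when the timeout expired, -1 if fork() failed.
 */
int read_finishes_within(int fd, int timeout_ms)
{
	struct timespec step = { 0, 10 * 1000000L };	/* poll every 10 ms */
	pid_t pid;

	assert(timeout_ms >= 0);
	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0) {
		char buf[128];
		ssize_t n = read(fd, buf, sizeof(buf));

		_exit(n >= 0 ? 0 : 1);
	}

	for (int waited = 0; waited < timeout_ms; waited += 10) {
		if (waitpid(pid, NULL, WNOHANG) == pid)
			return 1;	/* child exited, so read() returned */
		nanosleep(&step, NULL);
	}

	/* Best effort only: this does not reap the child, and a child in
	 * uninterruptible sleep will not die until it is woken up. */
	kill(pid, SIGKILL);
	return 0;
}
```

In the reproducer, calling read_finishes_within(uevent, 5000) in place of the direct read() should return 0 when the issue is hit, while the parent stays responsive.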
Thanks,
Zekun