Re: [PATCH v10 0/2] PCI/IOV: Fix SR-IOV locking races and AB-BA deadlock

From: Benjamin Block

Date: Tue Mar 24 2026 - 13:21:12 EST


On Thu, Mar 19, 2026 at 10:27:55PM +0200, Ionut Nechita (Wind River) wrote:
> On Thu, 19 Mar 2026 13:31:39 +0100, Niklas Schnelle wrote:
> > For your awareness, I saw that this series has some findings on
> > Google's new Sashiko AI reviewing tool[0]. At a quick glance the
> > findings seem like at least reasonable concerns to me. I'm still
> > looking at this independently also of course.
>
--8<--
> 3) TOCTOU Race Condition / Lock Window Vulnerability
> — a driver can rebind between device_release_driver() and
> pci_stop_and_remove_bus_device_locked()
>
> This is theoretically valid but practically impossible. The
> window is a few instructions wide. For this race to trigger:
>
> a) device_remove_file_self() has already removed the "remove"
> sysfs attribute, signaling the device is being torn down
> b) a bind_store or udev probe would need to fire in exactly
> that window
> c) the newly bound driver's probe() would need to call
> pci_enable_sriov() and block on pci_rescan_remove_lock
>
> This is the same pattern used elsewhere in the kernel (e.g.,
> the existing remove_store already had no synchronization between
> device_remove_file_self() and pci_stop_and_remove_bus_device_locked()
> — the patch just adds one more call in between).
>
> If this is a real concern, it would need to be addressed as a
> separate improvement, not as a blocker for this fix.

I haven't had time to fully review all this yet, but one quick comment: after
the initial idea of unbinding the device driver, I also realized we could have
a race here between the unbind and a possible subsequent re-bind. We could
probably prevent that by marking the device as dead:

+	if (val && device_remove_file_self(dev, attr)) {
+		device_lock(dev);
+		kill_device(dev);
+		device_unlock(dev);

This doesn't modify the reference count or anything; it only sets the private
`dead` member of the `struct device` to true. This can't be undone using the
device core's public API, and once set, the device cannot be bound to a new
device driver.

This should prevent any such race AFAICS. The unbind is protected by the
device mutex, so once the flag is set and the unbind is done, the device
will stay unbound.
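
To illustrate the argument, here is a rough userspace model of it. It is a
sketch only, not kernel code: the `model_*` names are made up, a pthread mutex
stands in for the device mutex, and it assumes that the bind path checks the
`dead` flag while holding that same lock.

```c
/* Userspace model only; NOT kernel code. All model_* names are hypothetical. */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct model_device {
	pthread_mutex_t lock;	/* stands in for the device mutex */
	bool dead;		/* stands in for the private 'dead' flag */
	const char *driver;	/* NULL while no driver is bound */
};

/* Models the idea of kill_device(): the caller must hold dev->lock. */
static bool model_kill_device(struct model_device *dev)
{
	if (dev->dead)
		return false;
	dev->dead = true;
	return true;
}

/* Models a bind attempt that checks 'dead' under the lock (assumption). */
static int model_bind(struct model_device *dev, const char *drv)
{
	int ret = -1;

	pthread_mutex_lock(&dev->lock);
	if (!dev->dead && !dev->driver) {
		dev->driver = drv;
		ret = 0;
	}
	pthread_mutex_unlock(&dev->lock);
	return ret;
}

/* Models the proposed remove path: mark dead first, then unbind. */
static void model_remove(struct model_device *dev)
{
	pthread_mutex_lock(&dev->lock);
	model_kill_device(dev);
	pthread_mutex_unlock(&dev->lock);

	/* A bind attempt racing in here already sees dead == true. */

	pthread_mutex_lock(&dev->lock);
	assert(dev->dead);	/* invariant: the flag is never cleared */
	dev->driver = NULL;	/* the unbind */
	pthread_mutex_unlock(&dev->lock);
}
```

In this model, any `model_bind()` that takes the lock after the flag is set
fails, regardless of whether it runs before or after the unbind. That is the
property the `kill_device()` call is meant to provide in the real code.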

It's not really "pretty" though.

--
Best Regards, Benjamin Block / Linux on IBM Z Kernel Development
IBM Deutschland Research & Development GmbH / https://www.ibm.com/privacy
Vors. Aufs.-R.: Wolfgang Wendt / Geschäftsführung: David Faller
Sitz der Ges.: Ehningen / Registergericht: AmtsG Stuttgart, HRB 243294