Re: [PATCH] driver core: Fix concurrency issue in match driver interface.

From: Qiu-ji Chen
Date: Tue Nov 12 2024 - 15:39:20 EST


Hi, Greg

The driver_override is updated by driver_set_override(), a function
provided by the driver core. In driver_set_override(), the
driver_override is updated while holding the device_lock. Therefore,
locking is required when calling match to prevent the driver_override
from being modified within driver_set_override().

It’s fine to add locks in these two functions. The second half of
__driver_attach already performs lock and unlock operations, and the
device_driver_attach function in bind_store also has lock and unlock
operations. So, when calling the bind_store function and
__driver_attach, these are environments without locks, where adding
locks is appropriate.

In the current core code, among the three calls to
driver_match_device, two are not locked and one is locked. Therefore,
adding locks in the lower-level driver would conflict with the already
locked path in the upper-level function, causing a potential deadlock.
Thus, locking cannot be added in the lower-level driver. In the call
chain from __device_attach to __device_attach_driver to
driver_match_device, the call to __device_attach_driver is already
protected by a lock, so the call to driver_match_device in this chain
is locked. If we add locks in the lower-level driver, it could lead to
a deadlock due to the upper-level lock.

Changing the string checking process to a driver core function doesn't
help, because this checking function needs to hold the device_lock to
prevent updates by the driver_set_override() function, and it would
still be called in the implementation of the match interface.
Essentially, it would still involve adding a lock at the lower level.
As mentioned earlier, there is already a locked path for the match
interface at the upper level, which could lead to a deadlock. From our
perspective, it seems impossible to modify this locked path at the
upper level to be lock-free, so we did not choose the solution of
adding a lock at the lower-level driver.

Therefore, we recommend adding locks to all three calls to
driver_match_device in the upper-level code to ensure consistency and
prevent modifications to the driver_override field during the
driver_set_override function call, fixing the data race issue. Based
on your feedback, we have updated the description in v2. Thank you for
your discussion.

Regards,
Qiu-ji Chen