Re: [PATCH] scsi: ufs: Fix a possible dead lock in clock scaling

From: Bart Van Assche
Date: Wed Sep 29 2021 - 14:15:25 EST


On 9/28/21 8:31 PM, Can Guo wrote:
On 2021-09-18 01:27, Bart Van Assche wrote:
On 9/16/21 6:51 PM, Can Guo wrote:
Assume a scenario where task A and B call ufshcd_devfreq_scale()
simultaneously. After task B calls downgrade_write() [1], but before it
calls down_read() [3], if task A calls down_write() [2], when task B calls
down_read() [3], it will lead to dead lock.

Something is wrong with the above description. The downgrade_write() call is
not followed by down_read() but by up_read(). Additionally, I don't see how
concurrent calls of ufshcd_devfreq_scale() could lead to a deadlock.

As mentioned in the commit msg, the down_read() [3] is from ufshcd_wb_ctrl().

Task A -
down_write [2]
ufshcd_clock_scaling_prepare
ufshcd_devfreq_scale
ufshcd_clkscale_enable_store

Task B -
down_read [3]
ufshcd_exec_dev_cmd
ufshcd_query_flag
ufshcd_wb_ctrl
downgrade_write [1]
ufshcd_devfreq_scale
ufshcd_devfreq_target
devfreq_set_target
update_devfreq
devfreq_performance_handler
governor_store


If one thread calls downgrade_write() and another thread calls down_write()
immediately, that down_write() call will block until the other thread has called up_read()
without triggering a deadlock.

Since the down_write() caller is blocked, the down_read() caller, which comes after
down_write(), is blocked too, no? downgrade_write() keeps lock owner as it is, but
it does not change the fact that readers and writers can be blocked by each other.

Please use the upstream function names when posting upstream patches. I think that
ufshcd_wb_ctrl() has been renamed into ufshcd_wb_toggle().

So the deadlock is caused by nested locking - one task holding a reader lock, another
task calling down_write() and next the first task grabbing the reader lock recursively?
I prefer one of the following two solutions above the patch that has been posted since
I expect that both alternatives will result in easier to maintain UFS code:
- Fix the down_read() implementation. Making down_read() wait in case of nested locking
seems wrong to me.
- Modify the UFS driver such that it does not lock hba->clk_scaling_lock recursively.

Thanks,

Bart.