Re: [BUG] bus: mhi: possible ABBA deadlock in mhi_pm_m0_transition() and mhi_send_cmd()

From: Jia-Ju Bai
Date: Sun Jul 18 2021 - 22:09:03 EST




On 2021/7/16 14:45, Manivannan Sadhasivam wrote:
On Thu, Jul 15, 2021 at 09:57:22AM -0700, Bhaumik Bhatt wrote:
On 2021-07-15 02:45 AM, Jia-Ju Bai wrote:
Hello,

I find there is a possible ABBA deadlock in the MHI driver in Linux
5.10:

In mhi_pm_m0_transition():
262:     read_lock_bh(&mhi_cntrl->pm_lock);
281:     spin_lock_irq(&mhi_cmd->lock);

In mhi_send_cmd():
1181:   spin_lock_bh(&mhi_cmd->lock);
1207:   read_lock_bh(&mhi_cntrl->pm_lock);

When mhi_pm_m0_transition() and mhi_send_cmd() are concurrently
executed, the deadlock can occur.

I check the code and find a possible case of such concurrent execution:

#CPU1:
mhi_poll (mhi_event->process_event(...))
  mhi_process_ctrl_ev_ring
    mhi_pm_m0_transition

#CPU2:
mhi_prepare_for_transfer
  mhi_prepare_channel
    mhi_send_cmd

Note that mhi_poll() and mhi_prepare_for_transfer() are both exported
by EXPORT_SYMBOL.
Thus, I guess these two functions could be concurrently called by a MHI
driver.

I am not quite sure whether this possible deadlock is real and how to
fix it if it is real.
Any feedback would be appreciated, thanks :)


Best wishes,
Jia-Ju Bai
Few pointers from your example:

1. mhi_poll() is currently not used by any client upstream yet.
Then this shouldn't be added in first place... :/

2. Polling is not to be used for single event ring (shared control + data)
cases
since it is meant to be for client drivers with dedicated data packets only.
3. mhi_send_cmd() will always be called after an mhi_pm_m0_transition() has
completed by design since we wait for the device to be held in M0 prior to
it.

But client can be unloaded during M0 event!

Anyway, I don't think the deadlock scenario is valid because of the usage
of "read_lock_bh()". So if "mhi_send_cmd()" has acquired
"spin_lock_bh(&mhi_cmd->lock)", it can always acquire
"read_lock_bh(&mhi_cntrl->pm_lock)" as multiple readers can acquire the
read lock.

Deadlock would only occur if one of the functions take write lock.

Thanks for auditing.

Thanks for the feedback and explanation :)


Best wishes,
Jia-Ju Bai