Re: [PATCH v4 2/4] bus: mhi: host: Drop chan lock before queuing buffers

From: Qiang Yu
Date: Thu Dec 07 2023 - 00:27:25 EST



On 12/6/2023 9:48 PM, Manivannan Sadhasivam wrote:
On Wed, Dec 06, 2023 at 10:25:12AM +0800, Qiang Yu wrote:
On 11/30/2023 1:31 PM, Manivannan Sadhasivam wrote:
On Wed, Nov 29, 2023 at 11:29:07AM +0800, Qiang Yu wrote:
On 11/28/2023 9:32 PM, Manivannan Sadhasivam wrote:
On Mon, Nov 27, 2023 at 03:13:55PM +0800, Qiang Yu wrote:
On 11/24/2023 6:04 PM, Manivannan Sadhasivam wrote:
On Tue, Nov 14, 2023 at 01:27:39PM +0800, Qiang Yu wrote:
Ensure the read and write locks for the channel are not taken in succession by
dropping the read lock in parse_xfer_event() so that a callback given to the
client can queue buffers and acquire the write lock in that process. Any
queueing of buffers should be done without the channel read lock held, as
nesting the two locks can result in recursive locking and a soft lockup.
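
To make the recursion concrete: with patch 1/4 applied, mhi_gen_tre() takes
the channel lock for writing, so a client that queues buffers from its xfer
callback takes the write lock while parse_xfer_event() still holds the read
lock. Below is a minimal, runnable userspace sketch of that pattern. It is
illustrative only: it uses pthreads rather than the kernel's rwlock_t, and
example_xfer_cb() is a hypothetical stand-in for the real
mhi_chan->xfer_cb() -> mhi_queue_buf() -> mhi_gen_tre() chain.

/* chan-lock-recursion.c: build with `gcc -pthread chan-lock-recursion.c` */
#include <pthread.h>
#include <stdio.h>

static pthread_rwlock_t chan_lock = PTHREAD_RWLOCK_INITIALIZER;

/* Stand-in for the client's xfer callback queueing a buffer, which ends
 * up taking the write lock on the same lock (as mhi_gen_tre() would). */
static void example_xfer_cb(void)
{
	printf("callback: queueing a buffer, taking the write lock...\n");
	pthread_rwlock_wrlock(&chan_lock);	/* blocks forever: the read
						 * lock is still held by us */
	pthread_rwlock_unlock(&chan_lock);
}

int main(void)
{
	/* Stand-in for parse_xfer_event() holding the read lock across
	 * the client callback. */
	pthread_rwlock_rdlock(&chan_lock);
	example_xfer_cb();	/* deadlocks here (or fails with EDEADLK,
				 * depending on the libc) */
	pthread_rwlock_unlock(&chan_lock);
	return 0;
}

The patch below sidesteps this by dropping the read lock around the callback,
exactly where a client may call back into the queueing path.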

Is this patch trying to fix an existing issue in client drivers or a potential
issue in future drivers?

Even if you take care of disabled channels, the "mhi_event->lock" acquired during
mhi_mark_stale_events() can cause a deadlock, since the event lock is already held
by mhi_ev_task().

I'd prefer not to open the window unless this patch is fixing a real issue.

- Mani
In [PATCH v4 1/4] bus: mhi: host: Add spinlock to protect WP access when
queueing TREs, we add
write_lock_bh(&mhi_chan->lock)/write_unlock_bh(&mhi_chan->lock)
in mhi_gen_tre(), which may be invoked as part of mhi_queue() in the client's
xfer callback, so we have to use read_unlock_bh(&mhi_chan->lock) here to avoid
acquiring mhi_chan->lock twice.

Sorry for the confusion. Do you think we need to squash these two patches into
one?

Well, if patch 1 is introducing a potential deadlock, then we should fix patch
1 itself and not introduce a follow-up patch.

But there is one more issue that I pointed out in my previous reply.

Sorry, I cannot understand why the "mhi_event->lock" acquired during
mhi_mark_stale_events() can cause a deadlock. mhi_ev_task() does not
invoke mhi_mark_stale_events(). Can you explain this in more detail?

Going by your theory: if a channel gets disabled while the event is being
processed, the process trying to disable the channel will try to acquire
"mhi_event->lock", which is already held by the process processing the event.

- Mani
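
In other words, the concern is an AB-BA ordering between the event lock and
the channel disable path. Below is a rough sketch of one possible interleaving;
this is my reading of the discussion, the lock call sites are paraphrased
rather than quoted from the code, and whether the disable path actually holds
the channel lock at that point is an assumption.

/*
 * tasklet: mhi_ev_task()              process: channel disable
 * ----------------------              ------------------------
 * takes mhi_event->lock
 *                                     takes mhi_chan->lock for writing
 *                                     (assumption: the disable path holds
 *                                     the channel lock here)
 * parse_xfer_event() re-takes
 * mhi_chan->lock for reading
 *   -> blocks behind the writer
 *                                     mhi_mark_stale_events() takes
 *                                     mhi_event->lock
 *                                       -> blocks behind the tasklet
 *
 * Neither side can proceed: a classic AB-BA deadlock.
 */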

OK, I get you. Thank you for the kind explanation. Hopefully I didn't take up
too much of your time.

Not at all. Btw, did you actually encounter any issue that this patch is trying
to fix, or is the fix based on code inspection alone?

- Mani

Yes, we actually hit the race issue in the downstream driver, but I cannot find
more details about it.

Also, I'm planning to clean up the locking mess within MHI in the coming days.
Perhaps we can revisit this series at that point. Will that be OK for
you?

Sure, that will be great.
- Mani

Signed-off-by: Qiang Yu <quic_qianyu@xxxxxxxxxxx>
---
drivers/bus/mhi/host/main.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/drivers/bus/mhi/host/main.c b/drivers/bus/mhi/host/main.c
index 6c6d253..c4215b0 100644
--- a/drivers/bus/mhi/host/main.c
+++ b/drivers/bus/mhi/host/main.c
@@ -642,6 +642,8 @@ static int parse_xfer_event(struct mhi_controller *mhi_cntrl,
 			mhi_del_ring_element(mhi_cntrl, tre_ring);
 			local_rp = tre_ring->rp;
 
+			read_unlock_bh(&mhi_chan->lock);
+
 			/* notify client */
 			mhi_chan->xfer_cb(mhi_chan->mhi_dev, &result);
 
@@ -667,6 +669,8 @@ static int parse_xfer_event(struct mhi_controller *mhi_cntrl,
 					kfree(buf_info->cb_buf);
 				}
 			}
+
+			read_lock_bh(&mhi_chan->lock);
 		}
 		break;
 	} /* CC_EOT */
--
2.7.4
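
For context, this is the buffer-queueing pattern the change enables from inside
a client callback. A hedged sketch follows: example_dl_xfer_cb() and the
PAGE_SIZE requeue are illustrative, not taken from an in-tree client, though
mhi_queue_buf(), struct mhi_result, and MHI_EOT are the real MHI client API.

#include <linux/device.h>
#include <linux/dma-direction.h>
#include <linux/mhi.h>

/* Illustrative downlink callback: consume the result, then requeue the
 * buffer. With the read lock dropped around xfer_cb(), the queueing path
 * below can safely take the channel write lock in mhi_gen_tre(). */
static void example_dl_xfer_cb(struct mhi_device *mhi_dev,
			       struct mhi_result *result)
{
	/* ... hand result->buf_addr / result->bytes_xferd to the upper layer ... */

	if (mhi_queue_buf(mhi_dev, DMA_FROM_DEVICE, result->buf_addr,
			  PAGE_SIZE, MHI_EOT))
		dev_err(&mhi_dev->dev, "failed to requeue RX buffer\n");
}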