Re: Query regarding "firmware: arm_scmi: Free mailbox channels if probe fails"

From: Shivnandan Kumar
Date: Tue Oct 11 2022 - 06:05:10 EST



Hi Cristian,

>>Ok, just out of curiosity, once done, can you point me at your downstream public sources so I can see the issue and the fix that you are applying to your trees ?

https://source.codeaurora.org/quic/la/kernel/msm-5.10/tree/drivers/soc/qcom/qcom_rimps.c?h=KERNEL.PLATFORM.1.0.r1-07800-kernel.0

I have added a lock around the accesses to con_priv inside the IRQ handler and the shutdown function.
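
A minimal userspace sketch of this kind of serialization, not the actual downstream change: struct chan_sketch and its helper functions are hypothetical, and only the con_priv name comes from the driver. A pthread mutex stands in for whatever lock the controller driver really uses; in kernel code shared with an IRQ handler that would more likely be a spinlock taken with interrupts disabled.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a mailbox channel; only con_priv is a real name. */
struct chan_sketch {
    pthread_mutex_t lock; /* serializes every access to con_priv */
    void *con_priv;       /* controller-private data, cleared on shutdown */
};

static void chan_init(struct chan_sketch *c, void *priv)
{
    pthread_mutex_init(&c->lock, NULL);
    c->con_priv = priv;
}

/*
 * Stand-in for the RX (IRQ handler) path.
 * Returns 1 if the message was delivered, 0 if the channel was already shut down.
 */
static int chan_rx(struct chan_sketch *c)
{
    int delivered = 0;

    pthread_mutex_lock(&c->lock);
    if (c->con_priv) {
        /* con_priv is only dereferenced while the lock is held */
        (*(int *)c->con_priv)++;
        delivered = 1;
    }
    pthread_mutex_unlock(&c->lock);

    return delivered;
}

/* Stand-in for the shutdown path: clears con_priv under the same lock. */
static void chan_shutdown(struct chan_sketch *c)
{
    pthread_mutex_lock(&c->lock);
    c->con_priv = NULL;
    pthread_mutex_unlock(&c->lock);
}
```

Because both paths take the same lock, the RX path either sees a valid con_priv for the whole time it uses it, or sees NULL and skips the delivery.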


I have one suggestion regarding firmware timeouts: can we trigger a BUG on a response timeout in do_xfer, gated by a debug config flag? This would help us debug firmware timeout issues faster.

We will only enable that config flag during internal testing.
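
A rough userspace sketch of the idea, not a patch against the SCMI core: SKETCH_BUG_ON_TIMEOUT, do_xfer_sketch() and wait_for_reply_sketch() are hypothetical names, and abort() stands in for BUG(). In the kernel the flag would presumably be a Kconfig option that is off by default.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical debug flag; build with -DSKETCH_BUG_ON_TIMEOUT=1 to enable. */
#ifndef SKETCH_BUG_ON_TIMEOUT
#define SKETCH_BUG_ON_TIMEOUT 0
#endif

/* Stand-in for waiting on the firmware reply: 0 on success, -ETIMEDOUT otherwise. */
static int wait_for_reply_sketch(bool firmware_replied)
{
    return firmware_replied ? 0 : -ETIMEDOUT;
}

/* Stand-in for the transfer path; the crash on timeout is opt-in only. */
static int do_xfer_sketch(bool firmware_replied)
{
    int ret = wait_for_reply_sketch(firmware_replied);

    if (ret == -ETIMEDOUT) {
        fprintf(stderr, "timed out waiting for firmware response\n");
        if (SKETCH_BUG_ON_TIMEOUT)
            abort(); /* userspace stand-in for BUG() */
    }

    return ret;
}
```

With the flag left at its default of 0, a timeout is still only reported as an error to the caller, so normal builds behave as before.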


Thanks,

Shivnandan

On 10/3/2022 6:52 PM, Cristian Marussi wrote:
> On Fri, Sep 30, 2022 at 06:29:02PM +0530, Shivnandan Kumar wrote:
>> hi Cristian,
>
> Hi Shivnandan,
>
>> Thanks for your support in providing the patch to try.
>>
>> I found one race condition in our downstream mbox controller driver while
>> accessing con_priv, when I serialized access to this, issue is not seen on 3
>> days of testing.
>
> Good to hear that you find the issue.
>
>> As you rightly mentioned that your provided patch will impact all the other
>> users.
>>
>> Also if we take your provided patch, same race still exists while accessing
>> con_priv in our downstream mbox controller so this issue will still be
>> there.
>
> Yes indeed, even though I think that race in the mailbox core between RX path
> and chan_free could still be theoretically possible it does not seem to me
> appropriate to try to fix it now that you cannot reproduce it anymore and
> no other mailbox user has ever raised this concern (even though, as said, the
> proper solution to that race wont probably be directly in the mailbox-core as
> in my experimental two liners..)
>
>> So, we are planning to merge the patch( serialized access to con_priv) in
>> our downstream mbox controller now.
>
> Ok, just out of curiosity, once done, can you point me at your downstream public
> sources so I can see the issue and the fix that you are applying to your trees ?
>
> Thanks,
> Cristian