Re: [PATCH v6 07/17] firmware: arm_scmi: Handle concurrent and out-of-order messages

From: Cristian Marussi
Date: Wed Jul 28 2021 - 04:31:33 EST


On Thu, Jul 22, 2021 at 10:32:58AM +0200, Peter Hilber wrote:
> On 19.07.21 11:14, Cristian Marussi wrote:
> > On Thu, Jul 15, 2021 at 06:36:03PM +0200, Peter Hilber wrote:
> > > On 12.07.21 16:18, Cristian Marussi wrote:
>
> [snip]
>
> > > > @@ -608,6 +755,7 @@ static int do_xfer(const struct scmi_protocol_handle *ph,
> > > > xfer->hdr.protocol_id, xfer->hdr.seq,
> > > > xfer->hdr.poll_completion);
> > > > + xfer->state = SCMI_XFER_SENT_OK;
> > >
> > > To be completely safe, this assignment could also be protected by the
> > > xfer->lock.
> > >
> >
> > In fact this would be true being xfer->lock meant to protect the state but it
> > seemed to me unnecessary here given that this is a brand new xfer with a
> > brand new (monotonic) seq number so that any possibly late-received msg will
> > carry an old stale seq number certainly different from this such that cannot be
> > possibly mapped to this same xfer. (but just discarded on xfer lookup in
> > xfer_command_acquire)
> >
> > The issue indeed could still exist only for do_xfer loops (as you pointed out
> > already early on) where the seq_num is used, but in that case on a timeout we
> > would have already bailed out of the loop and reported an error so any timed-out
> > late received response would have been anyway discarded; so at the end I thought
> > I could avoid spinlocking here.
> >
> > Thanks,
> > Cristian
> >

Hi Peter,

sorry for the late answer.

>
> I mostly meant to refer to the possibility of a very fast response not
> seeing this assignment, since the next line is
>
> > ret = info->desc->ops->send_message(cinfo, xfer);
>
> and during that a regular scmi_rx_callback(), reading xfer->state, can
> already arrive. But maybe this is too theoretical.
>

Right, that's a possibility indeed to account for even if remote: given
that, though, no race is possible here on state as said, I'd still avoid the
spinlock and related irq-off and opt instead for a barrier to avoid
re-ordering and to be sure that the scmi_rx_callback() on the RX processor
can see the latest value (a dmb(ish) + cache coherence magic should be enough)

Thanks,
Cristian