Re: [PATCH v1 0/7] MPFS system controller/mailbox fixes

From: Conor Dooley
Date: Thu Feb 16 2023 - 17:24:55 EST


Hey Jassi,

On Sat, Jan 21, 2023 at 07:12:52PM +0000, Conor Dooley wrote:
> On Sat, Jan 21, 2023 at 10:01:41AM -0600, Jassi Brar wrote:
> > On Wed, Jan 11, 2023 at 7:45 AM Conor Dooley <conor.dooley@xxxxxxxxxxxxx> wrote:
> > >
> > > In order to differentiate between the service succeeding & the system
> > > controller being inoperative or otherwise unable to function, I had to
> > > switch the controller to poll a busy bit in the system controller's
> > > registers to see if it has completed a service.
> > > This makes sense anyway, as the interrupt corresponds to "data ready"
> > > rather than "tx done", so I have changed the mailbox controller driver
> > > to do that & left the interrupt solely for signalling data ready.
> > > It just so happened that all of the services that I had worked with and
> > > tested up to this point were "infallible" & did not set a status, so the
> > > particular code paths were never tested.
> > >
> > > Jassi, the mailbox and soc patches depend on each other, as the change
> > > in what the interrupt is used for requires changing the client driver's
> > > behaviour too, as mbox_send_message() will now return when the system
> > > controller is no longer busy rather than when the data is ready.
> > > I'm happy to send the lot via the soc tree with your Ack and/or reivew,
> > > if that also works you?
> > >
> > Ok, let me review them and get back to you.
>
> FYI, I did sent a v2 on Friday:
> https://lore.kernel.org/all/20230120143734.3438755-1-conor.dooley@xxxxxxxxxxxxx/
>
> The change is just a timeout duration though.
>
> > > Secondly, I have a question about what to do if a service does fail, but
> > > not due to a timeout - eg the above example where the "new" image for
> > > the FPGA is actually older than the one that currently exists.
> > > Ideally, if a service fails due to something other than the transaction
> > > timing out, I would go and read the status registers to see what the
> > > cause of failure was.
> > > I could not find a function in the mailbox framework that allows the
> > > client to request that sort of information from the client. Trying to
> > > do something with the auxiliary bus, or exporting some function to a
> > > device specific header seemed like a circumvention of the mailbox
> > > framework.
> > > Do you think it would be a good idea to implement something like
> > > mbox_client_peek_status(struct mbox_chan *chan, void *data) to allow
> > > clients to request this type of information?
> > >
> > .last_tx_done() is supposed to make sure everything is ok.
>
> Hm, might've explained badly as I think you've misunderstood. Or (see
> below) I might have mistakenly thought that last_tx_done() was only meant
> to signify that tx was done.
>
> Anyways, I'll try to clarify.
> Some services don't set a status, but whether a status is, or isn't,
> set has nothing to do with whether the service has completed.
> One service that sets a status is "Authenticate Bitstream". This
> service sets a status of 0x0 if the bitstream in question is okay _and_
> something that the FPGA can be upgraded to. It returns a failure of 0x18
> if the bitstream is valid _but_ is the same as that currently programmed.
> (and of course a whole host of other possible errors in-between)
>
> These statuses, and whether they are a bad outcome or not, is dependant
> on the service and I don't think should be handled in the mailbox
> controller driver.
>
> > If the expected status bit is "sometimes not set", that means that bit
> > is not the complete status.
>
> If the "busy" bit goes low, then the transmission must be complete,
> there should be no need to check other bits for *completion*, but...
>
> > You have to check multiple registers to
> > detect if and what caused the failure.
>
> ...maybe I have just misunderstood the role of .last_tx_done(). The
> comment in mailbox-controller.h lead me to believe that it was used just
> to check if it had been completed.
>
> Am I allowed to use .last_tx_done() to pass information back to the
> mailbox client? If I could, that'd certainly be a nice way to get the
> information on whether the service failed etc.
>
> Hopefully that, plus when you have a chance to look at the code, will
> make what I am asking about a little clearer!

Just wondering if you've had a chance to look at this again! I know it's
missed the merge window this time around but I would like to get this
behaviour fixed as other work depends on it.

Thanks,
Conor.

Attachment: signature.asc
Description: PGP signature