Re: [PATCH] RFC: mailbox: Fix NULL message support in mbox_send_message()
From: Doug Anderson
Date: Tue Mar 10 2026 - 20:15:38 EST
Hi,
On Tue, Mar 10, 2026 at 4:59 PM Jassi Brar <jassisinghbrar@xxxxxxxxx> wrote:
>
> On Tue, Mar 10, 2026 at 6:52 PM Doug Anderson <dianders@xxxxxxxxxxxx> wrote:
> >
> > Hi,
> >
> > On Tue, Mar 10, 2026 at 4:46 PM <jassisinghbrar@xxxxxxxxx> wrote:
> > >
> > > From: Jassi Brar <jassisinghbrar@xxxxxxxxx>
> > >
> > > The active_req field serves double duty as both the "is a TX in
> > > flight" flag (NULL means idle) and the storage for the in-flight
> > > message pointer. When a client sends NULL via mbox_send_message(),
> > > active_req is set to NULL, which the framework misinterprets as
> > > "no active request." This breaks the TX state machine by:
> > >
> > > - tx_tick() short-circuits on (!mssg), skipping the tx_done
> > > callback and the tx_complete completion
> > > - txdone_hrtimer() skips the channel entirely since active_req
> > > is NULL, so poll-based TX-done detection never fires.
> > >
> > > Fix this by introducing a MBOX_NO_MSG sentinel value that means
> > > "no active request," freeing NULL to be valid message data. The
> > > sentinel is internal to the mailbox core and is never exposed to
> > > controller drivers or clients.
> > >
> > > The only tradeoff is that 'MBOX_NO_MSG' can not be used as a message
> > > by clients.
> > >
> > > Signed-off-by: Jassi Brar <jassisinghbrar@xxxxxxxxx>
> > > ---
> > > drivers/mailbox/mailbox.c | 15 +++++++++------
> > > 1 file changed, 9 insertions(+), 6 deletions(-)
> >
> > While I can certainly be corrected, I suspect this patch will break a
> > lot of users.
> >
> > My analysis of people that were passing NULL mbox messages is that
> > they _wanted_ the current behavior. They didn't want NULL messages to
> > be queued up as would happen with this patch. Instead, they just
> > wanted to immediately ring the doorbell again to signal an interrupt
> > to the other side.
> >
> They won't be queued up because they call mbox_client_txdone()
> immediately after mbox_send_message()
> From my analysis of the 14 clients, there was just one
> (irq-qcom-mpm.c) that was not doing it by oversight, and I submitted a
> patch.
>
> Can you please point me to the driver you are concerned about? Perhaps
> I am overlooking something.
Ah, interesting! I hadn't thought about it that way...
I think there are at least a few others that would need to change,
maybe? It looks like these ones:
drivers/soc/xilinx/zynqmp_power.c
drivers/remoteproc/xlnx_r5_remoteproc.c
drivers/firmware/imx/imx-dsp.c
In my case, I have a mailbox driver that's currently downstream
(though I hope to change that). My mailbox controller has an interrupt
for txdone, so mailbox clients _shouldn't_ call mbox_client_txdone().
Some clients of this mailbox client want the txdone interrupt, but
some clients of it just care about sending doorbells.
This driver would be able to send "txdone" for doorbells (or it could
have a "doorbell done"), but no clients that send doorbells actually
care about this signal. That being said, I believe clients do care
about doorbells not being queued.
-Doug