Re: [PATCH] RFC: mailbox: Fix NULL message support in mbox_send_message()

From: Jassi Brar

Date: Tue Mar 10 2026 - 20:46:42 EST


On Tue, Mar 10, 2026 at 7:15 PM Doug Anderson <dianders@xxxxxxxxxxxx> wrote:
>
> Hi,
>
> On Tue, Mar 10, 2026 at 4:59 PM Jassi Brar <jassisinghbrar@xxxxxxxxx> wrote:
> >
> > On Tue, Mar 10, 2026 at 6:52 PM Doug Anderson <dianders@xxxxxxxxxxxx> wrote:
> > >
> > > Hi,
> > >
> > > On Tue, Mar 10, 2026 at 4:46 PM <jassisinghbrar@xxxxxxxxx> wrote:
> > > >
> > > > From: Jassi Brar <jassisinghbrar@xxxxxxxxx>
> > > >
> > > > The active_req field serves double duty as both the "is a TX in
> > > > flight" flag (NULL means idle) and the storage for the in-flight
> > > > message pointer. When a client sends NULL via mbox_send_message(),
> > > > active_req is set to NULL, which the framework misinterprets as
> > > > "no active request." This breaks the TX state machine by:
> > > >
> > > > - tx_tick() short-circuits on (!mssg), skipping the tx_done
> > > > callback and the tx_complete completion
> > > > - txdone_hrtimer() skips the channel entirely since active_req
> > > > is NULL, so poll-based TX-done detection never fires.
> > > >
> > > > Fix this by introducing a MBOX_NO_MSG sentinel value that means
> > > > "no active request," freeing NULL to be valid message data. The
> > > > sentinel is internal to the mailbox core and is never exposed to
> > > > controller drivers or clients.
> > > >
> > > > The only tradeoff is that 'MBOX_NO_MSG' can not be used as a message
> > > > by clients.
> > > >
> > > > Signed-off-by: Jassi Brar <jassisinghbrar@xxxxxxxxx>
> > > > ---
> > > > drivers/mailbox/mailbox.c | 15 +++++++++------
> > > > 1 file changed, 9 insertions(+), 6 deletions(-)
> > >
> > > While I can certainly be corrected, I suspect this patch will break a
> > > lot of users.
> > >
> > > My analysis of people that were passing NULL mbox messages is that
> > > they _wanted_ the current behavior. They didn't want NULL messages to
> > > be queued up as would happen with this patch. Instead, they just
> > > wanted to immediately ring the doorbell again to signal an interrupt
> > > to the other side.
> > >
> > They won't be queued up because they call mbox_client_txdone()
> > immediately after mbox_send_message()
> > From my analysis of the 14 clients, there was just one
> > (irq-qcom-mpm.c) that was not doing it by oversight, and I submitted a
> > patch.
> >
> > Can you please point me to the driver you are concerned about? Perhaps
> > I am overlooking something.
>
> Ah, interesting! I hadn't thought about it that way...
>
> I think there are at least a few others that would need to change,
> maybe? It looks like these ones:
>
> drivers/soc/xilinx/zynqmp_power.c
> drivers/remoteproc/xlnx_r5_remoteproc.c
The underlying driver for both is drivers/mailbox/zynqmp-ipi-mailbox.c
which sets
txdone_poll = true and implements the .last_tx_done() callback.
The client (zynqmp_power.c) should not call mbox_client_txdone()

> drivers/firmware/imx/imx-dsp.c
The underlying driver is drivers/mailbox/imx-mailbox.c which sets
'txdone_irq = true'
and ticks the state machine with mbox_chan_txdone()
Again the client need not call mbox_client_txdone()

All three clients above are currently failed by the core which
misinterprets NULL messages.
This patch will fix such clients too, besides avoiding a new api.

>
> In my case, I have a mailbox driver that's currently downstream
> (though I hope to change that). My mailbox controller has an interrupt
> for txdone, so mailbox clients _shouldn't_ call mbox_client_txdone().
> Some clients of this mailbox client want the txdone interrupt, but
> some clients of it just care about sending doorbells.
>
So imx-dsp.c like? Please let me know what is lacking in the core to
fully support that. Happy to look into it.

Regards,
Jassi