Re: [PATCH] RFC: mailbox: Fix NULL message support in mbox_send_message()

From: Jassi Brar

Date: Tue Mar 10 2026 - 23:42:07 EST


On Tue, Mar 10, 2026 at 9:00 PM Doug Anderson <dianders@xxxxxxxxxxxx> wrote:
>
> > > Ah, interesting! I hadn't thought about it that way...
> > >
> > > I think there are at least a few others that would need to change,
> > > maybe? It looks like these ones:
> > >
> > > drivers/soc/xilinx/zynqmp_power.c
> > > drivers/remoteproc/xlnx_r5_remoteproc.c
> > The underlying driver for both is drivers/mailbox/zynqmp-ipi-mailbox.c
> > which sets
> > txdone_poll = true and implements the .last_tx_done() callback.
> > The client (zynqmp_power.c) should not call mbox_client_txdone()
> >
> > > drivers/firmware/imx/imx-dsp.c
> > The underlying driver is drivers/mailbox/imx-mailbox.c which sets
> > 'txdone_irq = true'
> > and ticks the state machine with mbox_chan_txdone()
> > Again the client need not call mbox_client_txdone()
> >
> > All three clients above are currently failed by the core which
> > misinterprets NULL messages.
> > This patch will fix such clients too, besides avoiding a new api.
>
> ...but doesn't that mean that the behavior of these clients will
> change? Previously all calls to mbox_send_message() immediately called
> through to the mailbox controller. Now, if the previous message hasn't
> finished, the "NULL" messages will queue up.
>
I don't see any problem. As we have checked, all clients call
mbox_client_txdone() after each mbox_send_message() because the
underlying controller just sets the bit and the doorbell is rung. The
clients correctly drive the state-machine. So they are already doing
the right thing and should be fine.

> For instance, let's say that our client calls mbox_send_message() 2
> times in quick succession (10 us apart). Let's say that the remote
> processor takes 1 ms to react.
>
> Previously, the 2 calls to mbox_send_message() would probably be
> coalesced. Each would call through to the mailbox controller, which
> would make sure the doorbell was asserted. After 1 ms, the remote
> processor would confirm the single doorbell it saw.
>
> Now, the first call will ring the doorbell. The second one will queue
> up. After 1 ms, the remote processor will confirm the doorbell, which
> will cause the second doorbell to be sent.
>
Am I overlooking some code? I found no existing client relying on that
bug, luckily. Even if some did, it would be better to fix that client
instead of moving the subsystem around for it.

>
> Looking at the 3 drivers in question, I _guess_ maybe that situation
> never comes up for them? So maybe they're all fine? I don't have tons
> of confidence that I understand enough about these clients to say for
> sure that this doesn't happen...
>
Those 3 clients don't run the state machine (tick), the underlying
controllers do.


>
> > > In my case, I have a mailbox driver that's currently downstream
> > > (though I hope to change that). My mailbox controller has an interrupt
> > > for txdone, so mailbox clients _shouldn't_ call mbox_client_txdone().
> > > Some clients of this mailbox client want the txdone interrupt, but
> > > some clients of it just care about sending doorbells.
> > >
> > So imx-dsp.c like? Please let me know what is lacking in the core to
> > fully support that. Happy to look into it.
>
> I know that with my downstream mailbox client, if I let NULL messages
> queue up I end up with a queue of a dozen or so NULL messages. The
> downstream client is really taking advantage (AKA abusing) the core's
> current NULL behavior. It truly does want the "mbox_ring_doorbell"
> concept of just making sure the doorbell is asserted and returning
> immediately. If the core changes to start queuing NULL messages, we'll
> have to figure out some sort of workaround...
>
You said your controller has an irq raised for the doorbell sent (?).
If so, your clients should want to wait for that confirmation (the
controller driver would tick via mbox_chan_txdone).
If there is no irq involved and your controller can just set some bit
and the doorbell is guaranteed, then you just need to tie
mbox_send_message() & mbox_client_txdone() , which is the right way
and how other such clients work.
In any case, I will be happy to look at your draft code and make sure
your usecase is supported.

Regards,
Jassi