Re: [PATCH] RFC: mailbox: Fix NULL message support in mbox_send_message()

From: Joonwon Kang

Date: Thu Mar 26 2026 - 03:37:23 EST


> On Fri, Mar 20, 2026 at 4:03 PM Doug Anderson <dianders@xxxxxxxxxxxx> wrote:
> >
> > Hi,
> >
> > On Mon, Mar 16, 2026 at 10:03 PM Joonwon Kang <joonwonkang@xxxxxxxxxx> wrote:
> > >
> > > > On Fri, Mar 13, 2026 at 5:12 AM Joonwon Kang <joonwonkang@xxxxxxxxxx> wrote:
> > > > >
> > > > > > The active_req field serves double duty as both the "is a TX in
> > > > > > flight" flag (NULL means idle) and the storage for the in-flight
> > > > > > message pointer. When a client sends NULL via mbox_send_message(),
> > > > > > active_req is set to NULL, which the framework misinterprets as
> > > > > > "no active request." This breaks the TX state machine by:
> > > > > >
> > > > > > - tx_tick() short-circuits on (!mssg), skipping the tx_done
> > > > > > callback and the tx_complete completion
> > > > > > - txdone_hrtimer() skips the channel entirely since active_req
> > > > > > is NULL, so poll-based TX-done detection never fires.
> > > > > >
> > > > > > Fix this by introducing a MBOX_NO_MSG sentinel value that means
> > > > > > "no active request," freeing NULL to be valid message data. The
> > > > > > sentinel is internal to the mailbox core and is never exposed to
> > > > > > controller drivers or clients.
> > > > >
> > > > > The following drivers are currently using ->active_req which now could be
> > > > > assigned MBOX_NO_MSG.
> > > > > - drivers/mailbox/tegra-hsp.c
> > > > > - drivers/mailbox/mtk-vcp-mailbox.c
> > > > >
> > > > Good catch, Thanks.
> > > > mtk-vcp-mailbox.c is fine as is - it assumes only valid requests. This
> > > > patch will not introduce any change for it.
> > > > Yes, tegra-hsp.c should now track MBOX_NO_MSG instead of NULL. Single
> > > > line change.
> > > >
> > > A more important aspect than single line change would be that you are
> > > creating a new API contract that the controller drivers who are to use
> > > ->active_req should now have new knowledge that the pointer value could
> > > be unconventional value such as -1. Without this additional knowledge,
> > > they may easily fail checking if the channel is empty.
> > >
> > > > > One of them is using ->active_req to wait until the channel is empty. In
> > > > > this case, strictly speaking, that controller driver should be aware of
> > > > > the sentinel value MBOX_NO_MSG, which means the sentinel value should be
> > > > > exposed to the controller. Or, if a future controller driver to come is to
> > > > > use ->active_req for the same purpose for doorbell or non-doorbell, it
> > > > > should also be aware of the sentinel value anyway.
> > > > >
> > > > > However, I believe that it is not intuitive to the controller developers
> > > > > that a pointer value could be other value than a real memory address,
> > > > > NULL or error encoded value, which is -1(== MBOX_NO_MSG). For this reason,
> > > > > I think it will be better to change the type of ->active_req to give a
> > > > > better indication to the controller developers, e.g. to integer as in the
> > > > > original patch
> > > > > https://lore.kernel.org/all/20251126045926.2413532-1-joonwonkang@xxxxxxxxxx/.
> > > > >
> > > > > Or, we could change those drivers not to use ->active_req, hide
> > > > > ->active_req entirely in the mailbox core and keep this patch.
> > > > >
> > > > Yes, ideally controller drivers should not track active_req. But this
> > > > patch will cause least churn it seems.
> > >
> > > If we will take those two drivers as an exceptional and not recommended
> > > case that uses ->active_req directly, it will be fine not to change them
> > > except for tegra-hsp.c to check MBOX_NO_MSG.
> > >
> > > If that is the case and you are to keep this patch, please keep in mind to
> > > leave the original patch link in the commit message for better trackability
> > > of the discussion and solutions and for credit for the contribution of
> > > finding and analyzing this issue and proposing the first solution, which
> > > I believe is also important for future contributions to come.
> >
> > So what's the next steps here, then? I don't think I adequately
> > explained exactly the needs of my mbox client, but I also don't think
> > it's a big deal. I'm convinced it should be fine even if NULL messages
> > get queued.
> >
> I think it will be simpler than you imagine, but yes I too don't think
> it's a big deal. We can iron out details later.
>
> > Jassi: are you going to send a new version of your patch? ...or are
> > you expecting Joonwon to post a new version? ...or do you want me to
> > post some variant of these patches?
> >
> Honestly I only care about minimal churn to code and API. I think
> simply using a different sentinel value is simpler than tracking the
> number of submissions... which is neither strictly required nor
> reduces the changes we have to make.
> So I plan to submit my RFC as a patch, the trivial change to
> tegra-hsp.c and resubmit the qcom fix with your ack.
>
> Regards,
> Jassi

I have reviewed the new patch.

Regarding the issue I brought up on how this alternative patch had been
uploaded without letting the original patch author know and without any tag for
better tracking or credit, I hope next time you could use "Reported-by:" or
"Link:" tag at least to encourage contributors to keep spending effort finding
and analyzing issues in the mailbox framework. I believe it could be better for
community and quality of code as also guided in the submitting-patches doc.

Sincerely,
Joonwon Kang