Re: [PATCH] RFC: mailbox: Fix NULL message support in mbox_send_message()
From: Jassi Brar
Date: Sat Mar 21 2026 - 12:12:06 EST
On Fri, Mar 20, 2026 at 4:03 PM Doug Anderson <dianders@xxxxxxxxxxxx> wrote:
>
> Hi,
>
> On Mon, Mar 16, 2026 at 10:03 PM Joonwon Kang <joonwonkang@xxxxxxxxxx> wrote:
> >
> > > On Fri, Mar 13, 2026 at 5:12 AM Joonwon Kang <joonwonkang@xxxxxxxxxx> wrote:
> > > >
> > > > > The active_req field serves double duty as both the "is a TX in
> > > > > flight" flag (NULL means idle) and the storage for the in-flight
> > > > > message pointer. When a client sends NULL via mbox_send_message(),
> > > > > active_req is set to NULL, which the framework misinterprets as
> > > > > "no active request." This breaks the TX state machine by:
> > > > >
> > > > > - tx_tick() short-circuits on (!mssg), skipping the tx_done
> > > > > callback and the tx_complete completion
> > > > > - txdone_hrtimer() skips the channel entirely since active_req
> > > > > is NULL, so poll-based TX-done detection never fires.
> > > > >
> > > > > Fix this by introducing a MBOX_NO_MSG sentinel value that means
> > > > > "no active request," freeing NULL to be valid message data. The
> > > > > sentinel is internal to the mailbox core and is never exposed to
> > > > > controller drivers or clients.
> > > >
> > > > The following drivers are currently using ->active_req which now could be
> > > > assigned MBOX_NO_MSG.
> > > > - drivers/mailbox/tegra-hsp.c
> > > > - drivers/mailbox/mtk-vcp-mailbox.c
> > > >
> > > Good catch, Thanks.
> > > mtk-vcp-mailbox.c is fine as is - it assumes only valid requests. This
> > > patch will not introduce any change for it.
> > > Yes, tegra-hsp.c should now track MBOX_NO_MSG instead of NULL. Single
> > > line change.
> > >
> > A more important aspect than single line change would be that you are
> > creating a new API contract that the controller drivers who are to use
> > ->active_req should now have new knowledge that the pointer value could
> > be unconventional value such as -1. Without this additional knowledge,
> > they may easily fail checking if the channel is empty.
> >
> > > > One of them is using ->active_req to wait until the channel is empty. In
> > > > this case, strictly speaking, that controller driver should be aware of
> > > > the sentinel value MBOX_NO_MSG, which means the sentinel value should be
> > > > exposed to the controller. Or, if a future controller driver to come is to
> > > > use ->active_req for the same purpose for doorbell or non-doorbell, it
> > > > should also be aware of the sentinel value anyway.
> > > >
> > > > However, I believe that it is not intuitive to the controller developers
> > > > that a pointer value could be other value than a real memory address,
> > > > NULL or error encoded value, which is -1(== MBOX_NO_MSG). For this reason,
> > > > I think it will be better to change the type of ->active_req to give a
> > > > better indication to the controller developers, e.g. to integer as in the
> > > > original patch
> > > > https://lore.kernel.org/all/20251126045926.2413532-1-joonwonkang@xxxxxxxxxx/.
> > > >
> > > > Or, we could change those drivers not to use ->active_req, hide
> > > > ->active_req entirely in the mailbox core and keep this patch.
> > > >
> > > Yes, ideally controller drivers should not track active_req. But this
> > > patch will cause least churn it seems.
> >
> > If we will take those two drivers as an exceptional and not recommended
> > case that uses ->active_req directly, it will be fine not to change them
> > except for tegra-hsp.c to check MBOX_NO_MSG.
> >
> > If that is the case and you are to keep this patch, please keep in mind to
> > leave the original patch link in the commit message for better trackability
> > of the discussion and solutions and for credit for the contribution of
> > finding and analyzing this issue and proposing the first solution, which
> > I believe is also important for future contributions to come.
>
> So what's the next steps here, then? I don't think I adequately
> explained exactly the needs of my mbox client, but I also don't think
> it's a big deal. I'm convinced it should be fine even if NULL messages
> get queued.
>
I think it will be simpler than you imagine, but yes I too don't think
it's a big deal. We can iron out details later.
> Jassi: are you going to send a new version of your patch? ...or are
> you expecting Joonwon to post a new version? ...or do you want me to
> post some variant of these patches?
>
Honestly I only care about minimal churn to code and API. I think
simply using a different sentinel value is simpler than tracking the
number of submissions... which is neither strictly required nor
reduces the changes we have to make.
So I plan to submit my RFC as a patch, the trivial change to
tegra-hsp.c and resubmit the qcom fix with your ack.
Regards,
Jassi