[PATCH v3 0/2] mailbox: Fix wrong completion order and improper send result in the blocking mode send API

From: Joonwon Kang

Date: Thu Apr 02 2026 - 13:15:30 EST


Hi team,

This patch series fixes two major issues in the blocking mode of the mailbox send API.

1) Wrong completion order in the send API as described in [1]:

Thread#1(T1)                    Thread#2(T2)
mbox_send_message               mbox_send_message
|                               |
V                               |
add_to_rbuf(M1)                 V
|                               add_to_rbuf(M2)
|                               |
|                               V
V                               msg_submit(picks M1)
msg_submit                      |
|                               V
V                               wait_for_completion(on M2)
wait_for_completion(on M1)      |  (1st in waitQ)
|  (2nd in waitQ)               V
V                               wake_up(on completion of M1)<--incorrect
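To make the race concrete, here is a small user-space C model. It is an illustration only, not kernel code, and it assumes the pre-patch design as drawn above: a single completion per channel whose waiters are woken in FIFO order, as complete() does. The names `shared_completion`, `wait_on()`, `complete_one()` and `replay_diagram()` are hypothetical.

```c
#include <assert.h>

/*
 * Hypothetical user-space model of the pre-patch behaviour: one
 * completion per channel, with waiters woken in FIFO order no matter
 * which message actually finished.
 */
#define MAX_WAITERS 8

struct shared_completion {
	int waiters[MAX_WAITERS]; /* thread ids, oldest first */
	int head;                 /* index of the next waiter to wake */
	int tail;                 /* index where the next waiter is queued */
};

/* A sender thread blocks on the channel's only completion. */
static void wait_on(struct shared_completion *c, int tid)
{
	assert(c->tail < MAX_WAITERS);
	c->waiters[c->tail++] = tid;
}

/* Like complete(): wake the oldest waiter; return its id, or -1 if none. */
static int complete_one(struct shared_completion *c)
{
	if (c->head == c->tail)
		return -1;
	return c->waiters[c->head++];
}

/*
 * Replay the diagram: T2 starts waiting (for M2) before T1 (for M1),
 * then the controller finishes M1. Returns the id of the woken thread.
 */
static int replay_diagram(void)
{
	struct shared_completion c = { .head = 0, .tail = 0 };

	wait_on(&c, 2);          /* T2 waits for M2: 1st in waitQ */
	wait_on(&c, 1);          /* T1 waits for M1: 2nd in waitQ */
	return complete_one(&c); /* completion of M1 wakes T2, not T1 */
}
```

In this model, replay_diagram() returns 2: completing M1 wakes T2, which then returns as though M2 had been sent, while T1 keeps waiting until some later completion, such as that of M2, wakes it.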

2) The blocking mode send API does not return the actual send result.

This patch series contains two patches, one for each issue:
0001-mailbox-Use-per-thread-completion-to-fix-wrong-co.patch
0002-mailbox-Make-mbox_send_message-return-error-code-.patch

The first issue concerns multi-thread support. As the discussion in [1]
with the mailbox framework maintainer shows, the mailbox framework has
long been understood to support multiple threads, although the
completion order issue was missed when the framework was introduced.
The first patch of this series fixes it.

Alternatively, we could declare that the mailbox API does not support
multiple threads [2]. However, that would be a sudden, significant
change for mailbox users after the long-standing implication that
multiple threads are supported. It would also create a disparity with
the non-blocking mode, which already supports multiple threads, and
could confuse users: "non-blocking mode supports multiple threads,
whereas blocking mode doesn't". For this reason, the first patch does
not take this alternative.

This patch series leaves out of scope the case where tx_tick() is
called more than once for a sent message on the same channel. In
theory, this can happen when a timeout occurs: for example, the mailbox
core calls tx_tick() because of the timeout, and the mailbox controller
or client calls it again, by accident or for some other reason. If that
happens, the internal mailbox state can become inconsistent even with a
single thread. This issue should therefore be handled separately, in a
later, orthogonal effort.

Because of the second issue, users have to register a tx_done callback
to get the actual send result even though they are using the blocking
mode send API. This differs from typical blocking send APIs, which
simply return the actual send result, and is therefore confusing.
Users unaware of this additional requirement can easily miss the send
result check entirely. The second patch fixes this by making the
blocking mode send API return the actual send result.
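A minimal sketch of the difference, assuming the result passed to tx_tick() is recorded next to the blocked sender so that the blocking send can return it. Everything below is hypothetical and single-threaded: `sketch_req`, `sketch_tx_tick()`, `sketch_send_old()` and `sketch_send_new()` are illustrative stand-ins, not the code in mailbox.c, and the completion is modelled as a plain flag with a controller that finishes synchronously.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical controller hook: performs the transfer, returns 0 or -errno. */
typedef int (*sketch_xfer_fn)(void);

/* Hypothetical per-request state owned by the blocked sender. */
struct sketch_req {
	bool done;  /* stands in for the completion being signalled */
	int result; /* value reported through tx_tick() */
};

/* Plays the role of tx_tick(chan, r): record r and wake the sender. */
static void sketch_tx_tick(struct sketch_req *req, int r)
{
	req->result = r;
	req->done = true;
}

/* Simplified pre-patch behaviour: returns the queue slot, not r. */
static int sketch_send_old(sketch_xfer_fn xfer, int slot)
{
	struct sketch_req req = { .done = false, .result = 0 };

	sketch_tx_tick(&req, xfer()); /* controller finishes synchronously */
	assert(req.done);             /* models wait_for_completion() */
	return slot;                  /* a failure is only seen by tx_done */
}

/* Patched behaviour: the blocking send returns the actual tx result. */
static int sketch_send_new(sketch_xfer_fn xfer)
{
	struct sketch_req req = { .done = false, .result = 0 };

	sketch_tx_tick(&req, xfer());
	assert(req.done);
	return req.result;
}

/* Hypothetical controllers used to exercise both paths. */
static int failing_xfer(void) { return -EIO; }
static int ok_xfer(void) { return 0; }
```

With a failing controller, sketch_send_old() still returns the non-negative slot index, so a caller that only checks for a negative return value misses the error; sketch_send_new() returns -EIO directly.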

Change log of the first patch:
- v3: Rebase on the latest for-next branch.
- v2: Consider the case where a timeout occurs, so that tx_tick() is
called for a channel by one thread while another thread has an active
request on the same channel. In that case, we mark the inactive request
as canceled and do not send it to the controller.
- v1: The v0 solution used per-message completions,
`tx_cmpl[MBOX_TX_QUEUE_LEN]`, where each completion belongs to one slot
of the message queue, `msg_data[i]`. Those completions take up memory
even when they are unused. Instead, this patch uses per-"thread"
completions: each completion belongs to a sender thread, and each slot
of the message queue holds a pointer to that completion. That is,
`struct mbox_message` has the pointer field
`struct completion *tx_complete`, which points to a completion created
on the sender's stack, instead of owning the completion as
`struct completion tx_complete`. This avoids additional memory use,
since a completion is allocated only when needed. More importantly, it
closes the window in which the same completion is reused by different
sender threads, which the previous solution still has.
- v0: The first attempt used per-message completions [1].
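The per-"thread" completion described for v1 can be pictured with the following single-threaded C model. It is a hypothetical sketch, not the patch itself: `sketch_completion` stands in for `struct completion`, `sketch_message` loosely mirrors `struct mbox_message` with its `tx_complete` pointer, the queue is a plain array rather than the kernel's ring buffer, and the v2 cancellation handling is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_QUEUE_LEN 4

/* Hypothetical stand-in for struct completion. */
struct sketch_completion {
	bool done;
};

/*
 * Each queue slot points at the completion owned by its sender,
 * rather than owning a completion itself.
 */
struct sketch_message {
	void *data;
	struct sketch_completion *tx_complete; /* NULL if nobody waits */
};

struct sketch_chan {
	struct sketch_message msgs[SKETCH_QUEUE_LEN];
	int count;  /* messages queued so far */
	int active; /* index of the message being transmitted */
};

/* Queue a message together with its sender's completion; returns the slot. */
static int sketch_add_to_rbuf(struct sketch_chan *chan, void *data,
			      struct sketch_completion *cmpl)
{
	if (chan->count == SKETCH_QUEUE_LEN)
		return -1;
	chan->msgs[chan->count].data = data;
	chan->msgs[chan->count].tx_complete = cmpl;
	return chan->count++;
}

/* Signal only the completion of the message that actually finished. */
static void sketch_tx_tick(struct sketch_chan *chan)
{
	struct sketch_completion *cmpl;

	assert(chan->active < chan->count);
	cmpl = chan->msgs[chan->active++].tx_complete;
	if (cmpl)
		cmpl->done = true;
}

/*
 * Replay the earlier diagram. The order in which T1 and T2 start waiting
 * no longer matters, because each waits on its own completion.
 */
static bool replay_with_per_thread_completion(void)
{
	struct sketch_chan chan = { .count = 0, .active = 0 };
	struct sketch_completion t1_done = { false }; /* "on T1's stack" */
	struct sketch_completion t2_done = { false }; /* "on T2's stack" */
	int m1 = 1, m2 = 2;

	sketch_add_to_rbuf(&chan, &m1, &t1_done); /* T1 queues M1 */
	sketch_add_to_rbuf(&chan, &m2, &t2_done); /* T2 queues M2 */
	sketch_tx_tick(&chan);                    /* M1 finishes */
	return t1_done.done && !t2_done.done;     /* only T1 is woken */
}
```

Because a slot only borrows a pointer, a non-blocking sender can pass NULL and no completion is allocated for it, which matches the memory argument above.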

Change log of the second patch:
- No major change from v1.

References:
- [1]: https://lore.kernel.org/all/1490809381-28869-1-git-send-email-jaswinder.singh@xxxxxxxxxx
- [2]: https://lore.kernel.org/all/CABb+yY39rhTZbtA21MecYk-R9fh7VQQr5kZUgCw4z92mWhZ1Rg@xxxxxxxxxxxxxx/


Joonwon Kang (2):
  mailbox: Use per-thread completion to fix wrong completion order
  mailbox: Make mbox_send_message() return error code when tx fails

 drivers/mailbox/mailbox.c          | 98 ++++++++++++++++++++----------
 drivers/mailbox/mtk-vcp-mailbox.c  |  2 +-
 drivers/mailbox/tegra-hsp.c        |  2 +-
 include/linux/mailbox_controller.h | 22 +++++--
 4 files changed, 85 insertions(+), 39 deletions(-)


Thanks,
Joonwon Kang
--
2.53.0.1185.g05d4b7b318-goog