Re: [PATCH v3 2/2] mailbox: Make mbox_send_message() return error code when tx fails

From: Joonwon Kang

Date: Sat Apr 04 2026 - 07:48:07 EST

> On Fri, Apr 3, 2026 at 10:19 AM Joonwon Kang <joonwonkang@xxxxxxxxxx> wrote:
> >
> > > On Thu, Apr 2, 2026 at 12:07 PM Joonwon Kang <joonwonkang@xxxxxxxxxx> wrote:
> > > >
> > > > When the mailbox controller failed transmitting message, the error code
> > > > was only passed to the client's tx done handler and not to
> > > > mbox_send_message(). For this reason, the function could return a false
> > > > success. This commit resolves the issue by introducing the tx status and
> > > > checking it before mbox_send_message() returns.
> > > >
> > > Can you please share the scenario when this becomes necessary? This
> > > can potentially change the ground underneath some clients, so we have
> > > to be sure this is really useful.
> >
> > I would say the problem here is generic enough to apply to all the cases where
> > the send result needs to be checked. Since the return value of the send API is
> > not the real send result, any users who believe that this blocking send API
> > will return the real send result could fall for that. For example, users may
> > think the send was successful even though it was not actually. I believe it is
> > uncommon that users have to register a callback solely to get the send result
> > even though they are using the blocking send API already. Also, I guess there
> > is no special reason why only the mailbox send API should work this way among
> > other typical blocking send APIs. For these reasons, this patch makes the send
> > API return the real send result. This way, users will not need to register the
> > redundant callback and I think the return value will align with their common
> > expectation.
> >
> Clients submit a message into the Mailbox subsystem to be sent out to
> the remote side which can happen immediately or later.
> If submission fails, clients get immediately notified. If transmission
> fails (which is now internal to the subsystem) it is reported to the
> client by a callback.
> If the API was called mbox_submit_message (which it actually is)
> instead of mbox_send_message, there would be no confusion.
> We can argue how good/bad the current implementation is, but the fact
> is that it is here. And I am reluctant to cause churn without good
> reason.
> Again, as I said, any, _legal_, setup scenario will help me come over
> my reluctance.

mbox_send_message() in blocking mode is not only for submitting the message in
the first place if we read through the API docs.

>From the API doc for `mbox_send_message()`:
```
* If the client had set 'tx_block', the call will return
* either when the remote receives the data or when 'tx_tout' millisecs
* run out.
```

>From the API doc for `struct mbox_client`:
```
* @tx_block: If the mbox_send_message should block until data is
* transmitted.
```

With the docs, I think it is apparent that the API covers "transmission" of the
message, not only submission of it. If sumbitting is the sole purpose of the
API, what does the API block for in the first place? We would not need the
blocking mode at all then.

The current return value of the API in failure cases is as follows:

- When submission fails, returns failure.
- When submission succeeds but timeout occurs during transmission, return
failure, i.e. -ETIME.
- When submission succeeds but transmission fails without timeout, return
success.

The third case looks problematic. This patch is focusing on this. There is also
disparity to handle the failure between timeout(the second case) and
non-timeout(the third case). Why does it not return failure when non-timeout
error occurs during transmission whereas it does when timeout occurs during
transmission? If the API is solely for submission, why does it return failure
instead of success in the second case?

In the third case, the controller(or the client) will inform the mailbox core
of the transmission failure. Then, why not return that failure as a return
value of the API despite having this information in the core?

An alternative to fixing this issue would be adding the API doc by saying like:

- In blocking mode, the send API will return -ETIME when timeout occurs during
transmission, but it will not return failure but success(since submission
itself was successful before transmission) when other errors occur during
transmission. Users have to register a callback to collect the error code
when non-timeout error occurs.

But, I think we can go away with this unnecessary confusion by fixing the API
just to return the error code when error occurs regardless of whether it is
timeout or not. Then, we could simply say:

- In blocking mode, the send API will return failure when error occurs.

Since this patch is pointing out this anomaly of the send API's behavior, I
am not sure what scenario we would need more. In other words, the current way
of working would be more surprising to the users than the fixed version of it
as it was when I found out this issue for the first time.

Do you think that this change will cause any other problem on the client side
than fixing the existing issue? If not, I am wondering why not go fix the
issue with this patch.

Thanks,
Joonwon Kang