Re: [PATCH v2] ASoC: SOF: Replace IPC TX busy deferral with bounded retry
From: Péter Ujfalusi
Date: Thu Feb 19 2026 - 02:11:14 EST
On 17/02/2026 23:49, Cole Leavitt wrote:
> The SOF IPC4 platform send_msg functions (hda_dsp_ipc4_send_msg,
> mtl_ipc_send_msg, cnl_ipc4_send_msg) previously stored the message in
> delayed_ipc_tx_msg and returned 0 when the TX register was busy. The
> deferred message was supposed to be dispatched from the IRQ handler
> when the DSP acknowledged the previous message.
>
> This mechanism silently drops messages during D0i3 power transitions
> because the IRQ handler never fires while the DSP is in a low-power
> state. The caller then hangs in wait_event_timeout() for up to 500ms
> per IPC chunk, causing multi-second audio stalls under CPU load.
I think the agent gets this a bit wrong and there is a cause/effect mix-up.
> Fix this by making the platform send_msg functions return -EBUSY
> immediately when the TX register is busy (safe since they execute
> under spin_lock_irq in sof_ipc_send_msg), and adding a bounded retry
> loop with usleep_range() in ipc4_tx_msg_unlocked() which only holds
> the tx_mutex (a sleepable context). The retry loop attempts up to 50
> iterations with 100-200us delays, bounding the maximum busy-wait to
> approximately 10ms instead of the previous 500ms timeout.
>
> Also remove the now-dead delayed_ipc_tx_msg field from
> sof_intel_hda_dev, the dispatch code, and the ack_received tracking
> variable from all three IRQ thread handlers (hda_dsp_ipc4_irq_thread,
> mtl_ipc_irq_thread, cnl_ipc4_irq_thread).
No messages were dropped, but if the firmware locks up during suspend
then we might enter low power while an IPC is delayed and send_msg is
waiting for a reply (or timeout).
Yes, the IRQ will not come, but it would not come even if the system
were not on its way to suspend.
The delayed handling as it is now is OK, it never loses messages,
everything is linear, it just takes a long time to go through several
messages when each of them times out because the fw is locked up.
In essence this patch reduces the 500ms default IPC timeout to 5-10ms
after an IPC timeout, leaving the FW less time to recover and not waiting
for a reply.
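For reference, a minimal sketch of where the 5-10ms figure comes from. The
retry count and delays are the values from the patch and the 500ms is the
default IPC timeout mentioned above; the macro and function names here are
local to this example, not driver code. The result is only the nominal
window, since usleep_range() can sleep longer under load.

```c
/* Values taken from the patch; names are local to this sketch. */
#define TX_BUSY_RETRIES		50
#define TX_BUSY_DELAY_MIN_US	100
#define TX_BUSY_DELAY_MAX_US	200
#define IPC_TIMEOUT_DEFAULT_MS	500

/* Shortest nominal time the retry loop waits before giving up. */
static unsigned int retry_window_min_us(void)
{
	return TX_BUSY_RETRIES * TX_BUSY_DELAY_MIN_US;
}

/* Longest nominal time the retry loop waits before giving up. */
static unsigned int retry_window_max_us(void)
{
	return TX_BUSY_RETRIES * TX_BUSY_DELAY_MAX_US;
}

/* How many times shorter the worst-case retry window is than the timeout. */
static unsigned int timeout_reduction_factor(void)
{
	return (IPC_TIMEOUT_DEFAULT_MS * 1000) / retry_window_max_us();
}
```

So the FW gets at best 1/50 of the time it had before to clear BUSY.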
It can also introduce a new race: if the FW clears the BUSY first and
then sends the reply and we were 'spinning' to send the next message we
might do so before receiving the reply to the previous message.
Which is fair, I think, but the commit message should be clear on this.
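To illustrate the ordering I mean, here is a toy model. It is a hypothetical
sketch and not SOF driver code: the struct, its fields and the helpers are
made up for this example and only track the BUSY bit and whether the
previous reply has arrived.

```c
#include <stdbool.h>

/* Hypothetical model of the host's view of the IPC channel. */
struct ipc_model {
	bool tx_busy;		/* BUSY bit as seen by the host */
	bool reply_pending;	/* previous message has no reply yet */
};

/* Host sends a message; returns -1 (stand-in for -EBUSY) if BUSY is set. */
static int model_send(struct ipc_model *m)
{
	if (m->tx_busy)
		return -1;
	m->tx_busy = true;
	m->reply_pending = true;
	return 0;
}

/* FW acknowledges the message by clearing BUSY. */
static void model_fw_clear_busy(struct ipc_model *m)
{
	m->tx_busy = false;
}

/* FW delivers the reply to the previous message. */
static void model_fw_send_reply(struct ipc_model *m)
{
	m->reply_pending = false;
}

/* True if a send issued right now would overtake the previous reply. */
static bool model_send_would_race(const struct ipc_model *m)
{
	return !m->tx_busy && m->reply_pending;
}
```

If the FW calls model_fw_clear_busy() before model_fw_send_reply(), a sender
polling only the BUSY bit gets through in the window between the two calls.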
Can you please file an issue for sof/linux with more information, as I
asked? We had similar issues 2-3 years ago, but they were root
caused and fixed.
I'll need to think about this a bit more...
One comment for ipc4.c:
>
> Signed-off-by: Cole Leavitt <cole@xxxxxxxxx>
> ---
> Changes in v2:
> - Removed __func__ from debug prints (dyndbg adds it automatically)
> - Added dev_dbg() when message sending is delayed due to EBUSY
> - Dropped patch 2/2 (dai_link_hw_ready) per Pierre's feedback
>
> diff --git a/sound/soc/sof/ipc4.c b/sound/soc/sof/ipc4.c
> index a4a090e6724a..ad99e2e07b66 100644
> --- a/sound/soc/sof/ipc4.c
> +++ b/sound/soc/sof/ipc4.c
> @@ -365,20 +365,36 @@ static int ipc4_wait_tx_done(struct snd_sof_ipc *ipc, void *reply_data)
> return ret;
> }
>
> +#define SOF_IPC4_TX_BUSY_RETRIES 50
> +#define SOF_IPC4_TX_BUSY_DELAY_US 100
> +#define SOF_IPC4_TX_BUSY_DELAY_MAX_US 200
> +
> static int ipc4_tx_msg_unlocked(struct snd_sof_ipc *ipc,
> void *msg_data, size_t msg_bytes,
> void *reply_data, size_t reply_bytes)
> {
> struct sof_ipc4_msg *ipc4_msg = msg_data;
> struct snd_sof_dev *sdev = ipc->sdev;
> - int ret;
> + int ret, i;
>
> if (msg_bytes > ipc->max_payload_size || reply_bytes > ipc->max_payload_size)
> return -EINVAL;
>
> sof_ipc4_log_header(sdev->dev, "ipc tx ", msg_data, true);
>
> - ret = sof_ipc_send_msg(sdev, msg_data, msg_bytes, reply_bytes);
> + for (i = 0; i < SOF_IPC4_TX_BUSY_RETRIES; i++) {
> + ret = sof_ipc_send_msg(sdev, msg_data, msg_bytes, reply_bytes);
> + if (ret != -EBUSY)
> + break;
> + usleep_range(SOF_IPC4_TX_BUSY_DELAY_US,
> + SOF_IPC4_TX_BUSY_DELAY_MAX_US);
> + }
> + if (i == SOF_IPC4_TX_BUSY_RETRIES) {
> + dev_dbg(sdev->dev, "ipc tx failed: TX busy after %d retries\n", i);
This needs special treatment with a unique error that can be used for
debugging purposes, something like:
dev_err(sdev->dev, "IPC busy, msg %#x|%#x cannot be sent\n",
ipc4_msg->primary, ipc4_msg->extension);
snd_sof_handle_fw_exception(ipc->sdev, "IPC busy");
return ret;
> + } else if (i > 0) {
> + dev_dbg(sdev->dev, "ipc tx delayed by %d loops for %#x|%#x\n",
> + i, ipc4_msg->primary, ipc4_msg->extension);
> + }
> if (ret) {
> dev_err_ratelimited(sdev->dev,
> "%s: ipc message send for %#x|%#x failed: %d\n",
--
Péter