Re: [PATCH 1/2] ASoC: SOF: Replace IPC TX busy deferral with bounded retry
From: Péter Ujfalusi
Date: Mon Feb 16 2026 - 07:40:37 EST
On 14/02/2026 08:40, Cole Leavitt wrote:
> The SOF IPC4 platform send_msg functions (hda_dsp_ipc4_send_msg,
> mtl_ipc_send_msg, cnl_ipc4_send_msg) previously stored the message in
> delayed_ipc_tx_msg and returned 0 when the TX register was busy. The
> deferred message was supposed to be dispatched from the IRQ handler
> when the DSP acknowledged the previous message.
>
> This mechanism silently drops messages during D0i3 power transitions
> because the IRQ handler never fires while the DSP is in a low-power
> state. The caller then hangs in wait_event_timeout() for up to 500ms
> per IPC chunk, causing multi-second audio stalls under CPU load.
I do wonder how this can happen as we only send IPC messages when the fw
has booted up and thus the fw should be replying to messages.
The delayed message handling meant to handle the case when the firmware
sent the reply already, but the TX doorbell is not cleared, FW is not
yet ready to receive a new meesage (or if we send it might be lost) or
we will never send such message (and send it after the next firmware boot?).
see:
47772f905cd8 ("ASoC: SOF: Intel: ipc4: Wait for channel to be free
before sending a message")
I agree that rapid IPC sending attempt would drop messages and will only
send the last one. This would be visible with:
https://github.com/thesofproject/linux/pull/5521
> Fix this by making the platform send_msg functions return -EBUSY
> immediately when the TX register is busy (safe since they execute
> under spin_lock_irq in sof_ipc_send_msg), and adding a bounded retry
> loop with usleep_range() in ipc4_tx_msg_unlocked() which only holds
> the tx_mutex (a sleepable context). The retry loop attempts up to 50
> iterations with 100-200us delays, bounding the maximum busy-wait to
> approximately 10ms instead of the previous 500ms timeout.
>
> Also remove the now-dead delayed_ipc_tx_msg field from
> sof_intel_hda_dev, the dispatch code, and the ack_received tracking
> variable from all three IRQ thread handlers (hda_dsp_ipc4_irq_thread,
> mtl_ipc_irq_thread, cnl_ipc4_irq_thread).
>
> Signed-off-by: Cole Leavitt <cole@xxxxxxxxx>
> ---
...
diff --git a/sound/soc/sof/intel/mtl.c b/sound/soc/sof/intel/mtl.c
> index 095dcf1a18e4..24dec128f589 100644
> --- a/sound/soc/sof/intel/mtl.c
> +++ b/sound/soc/sof/intel/mtl.c
> @@ -101,12 +101,8 @@ static int mtl_ipc_send_msg(struct snd_sof_dev *sdev, struct snd_sof_ipc_msg *ms
> struct sof_intel_hda_dev *hdev = sdev->pdata->hw_pdata;
> struct sof_ipc4_msg *msg_data = msg->msg_data;
>
> - if (hda_ipc4_tx_is_busy(sdev)) {
> - hdev->delayed_ipc_tx_msg = msg;
> - return 0;
> - }
> -
> - hdev->delayed_ipc_tx_msg = NULL;
> + if (hda_ipc4_tx_is_busy(sdev))
> + return -EBUSY;
>
> /* send the message via mailbox */
> if (msg_data->data_size)
> @@ -559,7 +555,6 @@ static irqreturn_t mtl_ipc_irq_thread(int irq, void *context)
> {
> struct sof_ipc4_msg notification_data = {{ 0 }};
> struct snd_sof_dev *sdev = context;
> - bool ack_received = false;
> bool ipc_irq = false;
> u32 hipcida;
> u32 hipctdr;
> @@ -576,7 +571,6 @@ static irqreturn_t mtl_ipc_irq_thread(int irq, void *context)
> mtl_ipc_dsp_done(sdev);
>
> ipc_irq = true;
> - ack_received = true;
> }
>
> if (hipctdr & MTL_DSP_REG_HFIPCXTDR_BUSY) {
> @@ -628,13 +622,6 @@ static irqreturn_t mtl_ipc_irq_thread(int irq, void *context)
> dev_dbg_ratelimited(sdev->dev, "nothing to do in IPC IRQ thread\n");
> }
>
> - if (ack_received) {
> - struct sof_intel_hda_dev *hdev = sdev->pdata->hw_pdata;
> -
> - if (hdev->delayed_ipc_tx_msg)
> - mtl_ipc_send_msg(sdev, hdev->delayed_ipc_tx_msg);
> - }
> -
> return IRQ_HANDLED;
> }
>
> diff --git a/sound/soc/sof/ipc4.c b/sound/soc/sof/ipc4.c
> index a4a090e6724a..2e24308ef9cc 100644
> --- a/sound/soc/sof/ipc4.c
> +++ b/sound/soc/sof/ipc4.c
> @@ -365,20 +365,33 @@ static int ipc4_wait_tx_done(struct snd_sof_ipc *ipc, void *reply_data)
> return ret;
> }
>
> +#define SOF_IPC4_TX_BUSY_RETRIES 50
> +#define SOF_IPC4_TX_BUSY_DELAY_US 100
> +#define SOF_IPC4_TX_BUSY_DELAY_MAX_US 200
> +
> static int ipc4_tx_msg_unlocked(struct snd_sof_ipc *ipc,
> void *msg_data, size_t msg_bytes,
> void *reply_data, size_t reply_bytes)
> {
> struct sof_ipc4_msg *ipc4_msg = msg_data;
> struct snd_sof_dev *sdev = ipc->sdev;
> - int ret;
> + int ret, i;
>
> if (msg_bytes > ipc->max_payload_size || reply_bytes > ipc->max_payload_size)
> return -EINVAL;
>
> sof_ipc4_log_header(sdev->dev, "ipc tx ", msg_data, true);
>
> - ret = sof_ipc_send_msg(sdev, msg_data, msg_bytes, reply_bytes);
> + for (i = 0; i < SOF_IPC4_TX_BUSY_RETRIES; i++) {
> + ret = sof_ipc_send_msg(sdev, msg_data, msg_bytes, reply_bytes);
> + if (ret != -EBUSY)
> + break;
> + usleep_range(SOF_IPC4_TX_BUSY_DELAY_US,
> + SOF_IPC4_TX_BUSY_DELAY_MAX_US);
> + }
The reason why I ended up with the dead simple delay msg handling which
sends the message when the ack is received rigth away is to avoid delays
without a need of a busy loop.
Before that I had similar approach, but dropped it due to the delay it
introduced on the message sending.
The delayed message combined with IPC timeouts are an interesting
problem to tackle, the PR I mentioned:
https://github.com/thesofproject/linux/pull/5521
plus this patch
https://github.com/ujfalusi/sof-linux/commit/e6c6ca613e60c477dcc025207f9732c8ae4a1b33
worked locally for most of the time, but I give that I have not faced
with the issue that I have lost IPCs.
I'm not sure if the two approces can be somehow combined.
Receving the reply to the message does not mean that the FW can receive
a new message, but from the kernel pow the IPC sequence was done (when
we receive out of sync ACK, the reply data might not be valid anymore).
We cannot make the IPC completion based on the ACK as I have seen in
debug logs all permutations: ACK then REPLY, ACK and REPLAY at the same
time, REPLY followed by ACK.
Having said that, I think this is still a bit safer to not loose
messages, 200us is close to polling.
> + if (i == SOF_IPC4_TX_BUSY_RETRIES)
> + dev_dbg(sdev->dev, "%s: TX still busy after %d retries\n",
> + __func__, i);
no need to print the __func_ it is added by the infra.
Can you add
dev_dbg(sdev->dev, "message sending delayed by %d loops for %#x|%#x\n",
i, ipc4_msg->primary, ipc4_msg->extension);
when the message got delayed due to EBUSY?
> if (ret) {
> dev_err_ratelimited(sdev->dev,
> "%s: ipc message send for %#x|%#x failed: %d\n",
--
Péter