Re: [PATCH] dmaengine: xilinx_dma: Fix CPU stall in xilinx_dma_poll_timeout

From: Gupta, Suraj

Date: Wed Apr 01 2026 - 07:32:20 EST




On 4/1/2026 1:57 PM, Alex Bereza wrote:
[You don't often get email from alex@bereza.email. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ]

Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.


On Wed Apr 1, 2026 at 7:23 AM CEST, Suraj Gupta wrote:

Rename XILINX_DMA_LOOP_COUNT to XILINX_DMA_POLL_TIMEOUT_US because the
former is incorrect. It is a timeout value for polling various register
bits in microseconds. It is not a loop count. Add a constant
XILINX_DMA_POLL_DELAY_US for delay_us value.

Please split this change in a new patch.

Ok, will send a v2.

Fixes: 7349a69cf312 ("iopoll: Do not use timekeeping in read_poll_timeout_atomic()")

This patch doesn't fixes anything in iopoll, please use correct fixes tag.

Ok, but I'm not sure what would be the correct fixes tag then? I though I need to reference
7349a69cf312 in fixes tag because this is the actual change that surfaced the CPU stall issue that I
want to fix in this driver. I'm fixing the call sites of xilinx_dma_poll_timeout but they were added
in different commits. Should I add all of them? That would be the following then:

Fixes: 9495f2648287 ("dmaengine: xilinx_vdma: Use readl_poll_timeout instead of do while loop's")
Fixes: 676f9c26c330 ("dmaengine: xilinx: fix device_terminate_all() callback for AXI CDMA")

Three call sites with delay_us=0 were first introduced by 9495f2648287, then 676f9c26c330 added the
fourth call site when introducing xilinx_cdma_stop_transfer (probably copy paste from
xilinx_dma_stop_transfer). Would adding these two fixes tags be correct?

Hi, in addition to this patch I also have a question: what is the point
of atomically polling for the HALTED or IDLE bit in the stop_transfer
functions? Does device_terminate_all really need to be callable from
atomic context? If not, one could switch to polling non-atomically and
avoid burning CPU cycles.


dmaengine_terminate_async(), which directly calls device_terminate_all
can be called from atomic context.

Right, thanks! Just for my understanding: I still think there is potential for improvement, because
from my understanding it would be beneficial to do the waiting for the bits in the status register
and the freeing of descriptors in xilinx_dma_synchronize. Do I understand correctly that this is
currently not possible due to how the DMA engine API is structured? To make this possible I think
the deprecated dmaengine_terminate_all would have to be removed and all users of this API would have
to be adapted accordingly, correct? So this would be a patch of much larger scope than xilinx_dma
driver alone.

Yes, your understanding of xilinx_dma_synchronize()and proposed changes looks correct.

Regards,
Suraj