Re: [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions

In reply to: Danilo Krummrich: "Re: [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions"

From: Christian König

Date: Tue Feb 10 2026 - 05:47:33 EST

On 2/10/26 11:36, Danilo Krummrich wrote:
> On Tue Feb 10, 2026 at 11:15 AM CET, Alice Ryhl wrote:
>> One way you can see this is by looking at what we require of the
>> workqueue. For all this to work, it's pretty important that we never
>> schedule anything on the workqueue that's not signalling safe, since
>> otherwise you could have a deadlock where the workqueue executes some
>> random job calling kmalloc(GFP_KERNEL) and then blocks on our fence,
>> meaning that the VM_BIND job never gets scheduled since the workqueue
>> is never freed up. Deadlock.
>
> Yes, I also pointed this out multiple times in the past in the context of C GPU
> scheduler discussions. It really depends on the workqueue and how it is used.
>
> In the C GPU scheduler the driver can pass its own workqueue to the scheduler,
> which means that the driver has to ensure that at least one out of the
> wq->max_active works is free for the scheduler to make progress on the
> scheduler's run and free job work.
>
> Or in other words, there must be no more than wq->max_active - 1 works that
> execute code violating the DMA fence signalling rules.

*And* the workqueue must be created with WQ_MEM_RECLAIM, so that work items can still start under memory pressure and do not potentially cycle back into memory management to wait for a dma_fence to signal.

But apart from that your explanation is perfectly correct, yes.

Thanks,
Christian.

> This is also why the JobQ needs its own workqueue and relying on the system WQ
> is unsound.
>
> In case of an ordered workqueue, it is always a potential deadlock to schedule
> work that does non-atomic allocations or takes a lock that is used elsewhere for
> non-atomic allocations of course.
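
The conditions discussed in this thread can be written down as a small user-space model. This is only a sketch: struct wq_model and progress_guaranteed() are hypothetical names made up for illustration and are not kernel APIs. The mem_reclaim field merely stands in for "the workqueue was created with WQ_MEM_RECLAIM", and unsafe_in_flight is an assumed count of works that violate the DMA fence signalling rules.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these names exist in the kernel. */
struct wq_model {
	int max_active;       /* concurrency limit, like wq->max_active */
	bool mem_reclaim;     /* models a workqueue created with WQ_MEM_RECLAIM */
	int unsafe_in_flight; /* works that violate the fence signalling rules */
};

/*
 * Conservative check: forward progress of the scheduler's run and free
 * job work is only treated as guaranteed if
 *   - the queue can start work under memory pressure, and
 *   - at most max_active - 1 slots can be held by works that may block
 *     on a dma_fence, leaving one slot free for the scheduler.
 */
static bool progress_guaranteed(const struct wq_model *wq)
{
	assert(wq->max_active >= 1);
	assert(wq->unsafe_in_flight >= 0);

	if (!wq->mem_reclaim)
		return false;

	return wq->unsafe_in_flight <= wq->max_active - 1;
}
```

With max_active = 4, three unsafe works still leave one slot for the scheduler, while four do not. With max_active = 1, which is how this sketch models an ordered workqueue, the budget is zero, so a single work doing a non-atomic allocation already makes the check fail.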