Re: [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions

From: Christian König

Date: Tue Feb 17 2026 - 09:22:21 EST


On 2/17/26 15:09, Alice Ryhl wrote:
> On Tue, Feb 17, 2026 at 3:04 PM Philipp Stanner <phasta@xxxxxxxxxxx> wrote:
>>
>> On Tue, 2026-02-10 at 16:45 +0100, Christian König wrote:
>>> On 2/10/26 16:07, Alice Ryhl wrote:
>>>> On Tue, Feb 10, 2026 at 02:56:52PM +0100, Christian König wrote:
>>>>> On 2/10/26 14:49, Alice Ryhl wrote:
>>>>>> On Tue, Feb 10, 2026 at 02:26:31PM +0100, Boris Brezillon wrote:
>>>>>>> On Tue, 10 Feb 2026 13:15:31 +0000
>>>>>>> Alice Ryhl <aliceryhl@xxxxxxxxxx> wrote:
>>>>>>>
>>>>>>>> On Tue, Feb 10, 2026 at 01:36:17PM +0100, Boris Brezillon wrote:
>>>>>>>>> On Tue, 10 Feb 2026 10:15:04 +0000
>>>>>>>>> Alice Ryhl <aliceryhl@xxxxxxxxxx> wrote:
>>>>>>>>>
>>>>>>>>>> impl MustBeSignalled<'_> {
>>>>>>>>>>     /// Drivers generally should not use this one.
>>>>>>>>>>     fn i_promise_it_will_be_signalled(self) -> WillBeSignalled { ... }
>>>>>>>>>>
>>>>>>>>>>     /// One way to ensure the fence has been signalled is to signal it.
>>>>>>>>>>     fn signal_fence(self) -> WillBeSignalled {
>>>>>>>>>>         self.fence.signal();
>>>>>>>>>>         self.i_promise_it_will_be_signalled()
>>>>>>>>>>     }
>>>>>>>>>>
>>>>>>>>>>     /// Another way to ensure the fence will be signalled is to spawn a
>>>>>>>>>>     /// workqueue item that promises to signal it.
>>>>>>>>>>     fn transfer_to_wq(
>>>>>>>>>>         self,
>>>>>>>>>>         wq: &Workqueue,
>>>>>>>>>>         item: impl DmaFenceWorkItem,
>>>>>>>>>>     ) -> WillBeSignalled {
>>>>>>>>>>         // Briefly obtain the lock class of the wq to indicate to
>>>>>>>>>>         // lockdep that the signalling path "blocks" on arbitrary jobs
>>>>>>>>>>         // from this wq completing.
>>>>>>>>>>         bindings::lock_acquire(&wq.key);
>>>>>>>>>>         bindings::lock_release(&wq.key);
>>>>>>>>>>
>>>>>>>>>>         // Enqueue the job.
>>>>>>>>>>         wq.enqueue(item);
>>>>>>>>>>
>>>>>>>>>>         // The signature of DmaFenceWorkItem::run() promises to arrange
>>>>>>>>>>         // for the fence to be signalled.
>>>>>>>>>>         self.i_promise_it_will_be_signalled()
>>>>>>>>>>     }
>>>>>>>>>
>>>>>>>>> I guess what's still missing is some sort of `transfer_to_hw()`
>>>>>>>>> function and a way to flag the IRQ handler as taking over the
>>>>>>>>> fence signaling token.
>>>>>>>>
>>>>>>>> Yes, transfer to hardware needs to be another piece of logic similar to
>>>>>>>> transfer to wq. And I imagine there are many ways such a transfer to
>>>>>>>> hardware could work.
>>>>>>>>
>>>>>>>> Unless you have a timeout on it, in which case the WillBeSignalled is
>>>>>>>> satisfied by the timeout alone, and the signalling that
>>>>>>>> happens from the irq is just an opportunistic signal from outside the
>>>>>>>> dma fence signalling critical path.
>>>>>>>
>>>>>>> Yes and no. If it deadlocks in the completion WorkItem because of
>>>>>>> allocations (or any of the forbidden use cases), I think we want to
>>>>>>> catch that, because that's a sign fences are likely to end up with
>>>>>>> timeouts when they should have otherwise been signaled properly.
>>>>>>>
>>>>>>>> Well ... unless triggering timeouts can block on GFP_KERNEL
>>>>>>>> allocations...
>>>>>>>
>>>>>>> I mean, the timeout handler should also be considered a DMA-signalling
>>>>>>> path, and the same rules should apply to it.
>>>>>>
>>>>>> I guess that's fair. Even with a timeout you want both to be signalling
>>>>>> paths.
>>>>>>
>>>>>> I guess more generally, if a fence is signalled by mechanism A or B,
>>>>>> whichever happens first, you have the choice between:
>>>>>
>>>>> That doesn't happen in practice.
>>>>>
>>>>> For each fence you only have one signaling path you need to guarantee
>>>>> forward progress for.
>>>>>
>>>>> All other signaling paths are just opportunistic optimizations
>>>>> which *can* signal the fence, but there is no guarantee that they
>>>>> will.
>>>>>
>>>>> We used to have some exceptions to that, especially around aborting
>>>>> submissions, but those turned out to be a really bad idea as well.
>>>>>
>>>>> Thinking more about it, you should probably enforce that there is only
>>>>> one signaling path for each fence.
>>>>
>>>> I'm not really convinced by this.
>>>>
>>>> First, the timeout path must be a fence signalling path because the
>>>> reason you have a timeout in the first place is because the hw might
>>>> never signal the fence. So if the timeout path deadlocks on a
>>>> kmalloc(GFP_KERNEL) and the hw never comes around to wake you up, boom.
>>>
>>> Mhm, good point. On the other hand the timeout handling should probably be considered part of the normal signaling path.
>>
>>
>> Why would anyone want to allocate in a timeout path in the first place – especially for JobQueue?
>>
>> Timeout -> close the associated ring. Done.
>> JobQueue will signal the done_fences with -ECANCELED.
>>
>> What would the driver want to allocate in its timeout path, i.e. its timeout callback?
>
> Maybe you need an allocation to hold the struct delayed_work
> field that you use to enqueue the timeout?

And the workqueue where you schedule the delayed_work must have the reclaim bit (WQ_MEM_RECLAIM) set.

Otherwise it can happen that the workqueue finds all kthreads busy and tries to start a new one, e.g. by allocating a task structure...
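To make that concrete, a minimal kernel-side sketch of arming a fence timeout on a reclaim-capable workqueue (names like my_fence_timeout and "my-fence-timeout" are hypothetical, not from the patch; the workqueue API calls are the real ones):

```c
#include <linux/workqueue.h>

static struct workqueue_struct *timeout_wq;
static struct delayed_work timeout_work;

/* Hypothetical timeout handler; this runs on the fence signalling
 * path, so no GFP_KERNEL allocations and no locks taken with
 * allocations held under them. Close the ring / signal the fence
 * with an error here.
 */
static void my_fence_timeout(struct work_struct *work)
{
}

static int my_timeout_init(void)
{
	/* WQ_MEM_RECLAIM guarantees a rescuer thread, so queued work
	 * cannot get stuck behind a task-struct allocation when all
	 * kworkers are busy.
	 */
	timeout_wq = alloc_workqueue("my-fence-timeout",
				     WQ_MEM_RECLAIM | WQ_HIGHPRI, 0);
	if (!timeout_wq)
		return -ENOMEM;

	INIT_DELAYED_WORK(&timeout_work, my_fence_timeout);
	queue_delayed_work(timeout_wq, &timeout_work,
			   msecs_to_jiffies(500));
	return 0;
}
```

Without WQ_MEM_RECLAIM there is no rescuer, so under memory pressure even queueing the timeout work can depend on an allocation, which defeats the forward-progress guarantee the timeout is supposed to provide.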

You also potentially want device core dumps. Those usually use GFP_NOWAIT so that they can't cycle back and wait for some fence. The downside is that they can trivially fail under even light memory pressure.
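As a sketch of that GFP_NOWAIT trade-off (my_copy_hw_state() is a hypothetical driver helper; __vmalloc() and dev_coredumpv() are the real APIs):

```c
#include <linux/devcoredump.h>
#include <linux/vmalloc.h>

static void my_devcoredump(struct device *dev, size_t snap_len)
{
	/* GFP_NOWAIT: never sleep in reclaim, because reclaim could in
	 * turn wait on the very fence whose timeout we are handling.
	 */
	void *snap = __vmalloc(snap_len, GFP_NOWAIT);

	if (!snap)
		return; /* under memory pressure the dump is simply lost */

	my_copy_hw_state(snap, snap_len);

	/* dev_coredumpv() takes ownership of snap and vfree()s it when
	 * the coredump is read or times out.
	 */
	dev_coredumpv(dev, snap, snap_len, GFP_NOWAIT);
}
```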

Regards,
Christian.

>
> Alice