Re: [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions
From: Philipp Stanner
Date: Tue Feb 17 2026 - 09:29:01 EST
On Tue, 2026-02-17 at 15:22 +0100, Christian König wrote:
> On 2/17/26 15:09, Alice Ryhl wrote:
> > On Tue, Feb 17, 2026 at 3:04 PM Philipp Stanner <phasta@xxxxxxxxxxx> wrote:
> > > > > >
> > > > > >
[…]
> > > > > > Thinking more about it you should probably enforce that there is only
> > > > > > one signaling path for each fence signaling.
> > > > >
> > > > > I'm not really convinced by this.
> > > > >
> > > > > First, the timeout path must be a fence signalling path because the
> > > > > reason you have a timeout in the first place is because the hw might
> > > > > never signal the fence. So if the timeout path deadlocks on a
> > > > > kmalloc(GFP_KERNEL) and the hw never comes around to wake you up, boom.
> > > >
> > > > Mhm, good point. On the other hand the timeout handling should probably be considered part of the normal signaling path.
> > >
> > >
> > > Why would anyone want to allocate in a timeout path in the first place – especially for jobqueue?
> > >
> > > Timeout -> close the associated ring. Done.
> > > JobQueue will signal the done_fences with -ECANCELED.
> > >
> > > What would the driver want to allocate in its timeout path, i.e.: timeout callback.
> >
> > Maybe you need an allocation to hold the struct delayed_work_struct
> > field that you use to enqueue the timeout?
>
> And the workqueue were you schedule the delayed_work on must have the reclaim bit set.
>
> Otherwise it can be that the workqueue finds all kthreads busy and tries to start a new one, e.g. allocating task structure......
OK, maybe I'm lost, but what delayed_work?
The jobqueue's delayed work item gets either created on JQ::new() or in
jq.submit_job(). Why would anyone – that is: any driver – implement a
delayed work in its timeout callback?
That doesn't make sense.
JQ notifies the driver from its delayed_work through
timeout_callback(), and in that callback the driver closes the
associated firmware ring.
And it drops the JQ. So it is gone. A new JQ will get a new timeout
work item.
That's basically all the driver must ever do. Maybe some logging and
stuff.
With firmware scheduling it should really be that simple.
And signalling / notifying userspace gets done by jobqueue.
Right?
>
> You also potentially want device core dumps. Those usually use GFP_NOWAIT so that they can't cycle back and wait for some fence. The down side is that they can trivially fail under even light memory pressure.
Simply logging into dmesg should do the trick, shouldn't it?
P.