Re: [RFC PATCH 02/12] drm/dep: Add DRM dependency queue layer
From: Daniel Almeida
Date: Tue Mar 17 2026 - 08:10:31 EST
Matthew,
> I get it — you’re a Rust zealot. You can do this in C and enforce the
> rules quite well.
>
> RAII cannot describe ownership transfers of refs, nor can it express who
> owns what in multi-threaded components, as far as I know. Ref-tracking
> and ownership need to be explicit.
>
> I’m not going to reply to Rust vs C comments in this thread. If you want
> to talk about ownership, lifetimes, dma-fence enforcement, and teardown
> guarantees, sure.
>
> If you want to build on top of a component that’s been tested on a
> production driver, great — please join in. If you want to figure out all
> the pitfalls yourself, well… have fun.
>
> Matt
>
It is not about being a Rust zealot. I pointed out concrete issues in your
code. Every access to the queue has to go through a special function, because
the queue might already be gone by the time the access happens. How is this
not a problem?
+ * However, there is a secondary hazard: a worker can be queued while the
+ * queue is in a "zombie" state — refcount has already reached zero and async
+ * teardown is in flight, but the work item has not yet been disabled by
+ * free_work. To guard against this every worker uses
+ * drm_dep_queue_get_unless_zero() at entry; if the refcount is already zero
+ * the worker bails immediately without touching the queue state.
At various points you document hard requirements that exist only as comments.
Resource management is scattered all over the place, and parts of it are even
delegated to drivers, over which you have no control.
+ * Drivers that set %DRM_DEP_QUEUE_FLAGS_BYPASS_SUPPORTED and wish to
+ * serialise their own submit work against the bypass path must acquire this
+ * guard. Without it, a concurrent caller of drm_dep_job_push() could take
+ * the bypass path and call ops->run_job() inline between the driver's
+ * eligibility check and its corresponding action, producing a race.
How is this not a problem? Again, you’re not in control of driver code.
+ * If set, the driver is responsible for freeing the job. If NULL,
Same here.
Even if we take Rust out of the equation, how do you plan to solve these things? Or
do you consider them solved as is?
I worry that we will find ourselves again at XDC in yet another scheduler
workshop to address the issues that will invariably come up with your new
design in a few years.
> If you want to build on top of a component that’s been tested on a
> production driver, great — please join in. If you want to figure out all
> the pitfalls yourself, well… have fun.
Note that I didn’t show up with a low-effort “hey, how about we rewrite
this in Rust?”. Instead, I linked to an actual Rust implementation that I
spent weeks painstakingly debugging, not to mention the time it took to write
it. Again, I suggest that you guys have a look, like I did with your code. You
might find things you end up liking there.
— Daniel