Re: [PATCH v1 RESEND 4/4] drm/tyr: add GPU reset handling
From: Daniel Almeida
Date: Fri Apr 10 2026 - 09:04:44 EST
>>
>> I think that there are two things we're trying to implement correctly:
>>
>> 1) Deny access to a subset of the state while a reset is in progress
>> 2) Wait for anyone accessing 1) to finish before starting a reset
>>
>> IIUC, using Atomic<T> can solve 1 by bailing if the "reset in progress"
>> flag/variant is set, but I don't think it implements 2? One would have to
>> implement more logic to block until the state is not being actively used.
>>
>> Now, there are probably easier ways to solve this, but I propose that we do the
>> extra legwork to make this explicit and enforceable by the type system.
>>
>> How about introducing a r/w semaphore abstraction? It seems to correctly encode
>> the logic we want:
>>
>> a) multiple users can access the state if no reset is pending ("read" side)
>> b) the reset code can block until the state is no longer being accessed (the "write" side)
>>
>> In Tyr, this would roughly map to something like:
>>
>> struct TyrData {
>> reset_gate: RwSem<ActiveHwState>,
>> // other, always accessible members
>> }
>>
>> impl TyrData {
>> fn try_access(&self) -> Option<ReadGuard<'_, ActiveHwState>> {...}
>> }
>>
>> Where ActiveHwState contains the fw/mmu/sched blocks (these are not upstream
>> yet, Deborah has a series that will introduce the fw block that should land
>> soon) and perhaps more.
>>
>> Now, the reset logic would be roughly:
>>
>> fn reset_work(tdev: Arc<TyrDevice>) {
>> // Block until nobody else is accessing the hw, prevent others from
>> // initiating new accesses too.
>> let _guard = tdev.reset_gate.write();
>>
>> // pre_reset() all Resettable implementors
>>
>> ... reset
>>
>> // post_reset() all Resettable implementors
>> }
>>
>> Now, for every block that might touch a resource that would be unavailable
>> during a reset, we enforce a try_access() via the type system, and ensure that
>> the reset cannot start while the guard is alive. In particular, ioctls would
>> look like:
>>
>> fn ioctl_foo(...) {
>> let hw = tdev.reset_gate.try_access()?;
>> // resets are blocked while the guard is alive; there is no other way to access this state
>> }
>>
>> The code will not compile otherwise, so long as we keep the state in ActiveHwState, i.e.:
>> protected by the r/w sem.
>>
>> This looks like an improvement over Panthor, since Panthor relies on manually
>> canceling work that accesses hw state via cancel_work_sync(), and on gating new
>> work submissions on the "reset_in_progress" flag, i.e.:
>>
>> /**
>> * sched_queue_work() - Queue a scheduler work.
>> * @sched: Scheduler object.
>> * @wname: Work name.
>> *
>> * Conditionally queues a scheduler work if no reset is pending/in-progress.
>> */
>> #define sched_queue_work(sched, wname) \
>> do { \
>> if (!atomic_read(&(sched)->reset.in_progress) && \
>> !panthor_device_reset_is_pending((sched)->ptdev)) \
>> queue_work((sched)->wq, &(sched)->wname ## _work); \
>> } while (0)
>>
>>
>> Thoughts?
>
> I don't think a semaphore is the right answer. I think SRCU plus a
> counter makes more sense. (With a sentinel value reserved for "currently
> resetting".)
Can you expand a bit on why you think r/w semaphores are not the answer? They
seem to correctly encode the logic we want.
>
> When you begin using the hardware, you start an srcu critical region and
> read the counter. If the counter has the sentinel value, you know the
> hardware is resetting and you fail. Otherwise you record the counter and
> proceed.
>
> If at any point you release the srcu critical region and want to
> re-acquire it to continue the same ongoing work, then you must ensure
> that the counter still has the same value. This ensures that if the GPU
> is reset, then even if the reset has finished by the time you come back,
> you still fail because the counter has changed.
We don't want to "come back": anything that is in-flight must complete, i.e.:
the reset logic must wait for in-flight jobs, because the work has already been
dispatched to the hardware. We do want to forbid more work from being queued
while a reset is pending, though.
>
>
> Another option could be to rely on the existing Device unbind logic.
> Entirely remove the class device and run the full unbind procedure,
> cleaning up all devm work etc. Then once that has finished, probe the
> device again and start from scratch. After all, GPU reset does not have
> to be a cheap operation.
>
> Alice
What about all resources associated with this class device? Userspace would be
holding handles to resources that are effectively gone.
In any case, I'm not trying to push this r/w semaphore thing as "the right
thing", and if the "srcu + counter" idea turns out to be better, great! :)
The problem is what Boris said recently, though: there's a delay between the
moment we decide "we need a reset" and the reset worker actually running.
In the design I proposed, there would be a window of time in which:
1) the hardware is in an invalid state, and must be reset
2) the reset worker hasn't run yet, so the "write" lock has not been taken
3) due to 2, the submit() logic sees the hardware as available, and queues more work
4) the work from 3 now gets to execute even though the hardware is crashed
— Daniel