Re: [PATCH v1 RESEND 0/4] drm/tyr: implement GPU reset API

From: Onur Özkan

Date: Fri Apr 03 2026 - 08:40:52 EST


> This series adds GPU reset handling support for Tyr in a new module
> drivers/gpu/drm/tyr/reset.rs which encapsulates the low-level reset
> controller internals and exposes a ResetHandle API to the driver.
>
> The reset module owns reset state, queueing and execution ordering
> through OrderedQueue and handles duplicate/concurrent reset requests
> with a pending flag.
>
> Apart from the reset module, the first 3 patches:
>
> - Fixes a potential reset-complete stale state bug by clearing completed
> state before doing soft reset.
> - Adds Work::disable_sync() (wrapper of bindings::disable_work_sync).
> - Adds OrderedQueue support.
>
> Runtime tested on hardware by Deborah Brouwer (see [1]) and myself.
>
> [1]: https://gitlab.freedesktop.org/panfrost/linux/-/merge_requests/63#note_3364131
>
> Link: https://gitlab.freedesktop.org/panfrost/linux/-/issues/28
> ---
>
> Onur Özkan (4):
> drm/tyr: clear reset IRQ before soft reset
> rust: add Work::disable_sync
> rust: add ordered workqueue wrapper
> drm/tyr: add GPU reset handling
>
> drivers/gpu/drm/tyr/driver.rs | 38 +++----
> drivers/gpu/drm/tyr/reset.rs | 180 ++++++++++++++++++++++++++++++++++
> drivers/gpu/drm/tyr/tyr.rs | 1 +
> rust/helpers/workqueue.c | 6 ++
> rust/kernel/workqueue.rs | 62 ++++++++++++
> 5 files changed, 260 insertions(+), 27 deletions(-)
> create mode 100644 drivers/gpu/drm/tyr/reset.rs
>
>
> base-commit: 0ccc0dac94bf2f5c6eb3e9e7f1014cd9dddf009f
> --
> 2.51.2
>

Hi all,

Here is the current status of this work: I have 2 blockers that prevent me from moving forward.

1- GPU unplug API

On the existing C side, reset failure handling eventually needs to unplug the
device, and that path is part of the broader reset flow in:

- srctree/drivers/gpu/drm/panthor/panthor_device.c

This is part of [1] and as far as I understand, it is still work in progress. For Tyr,
I currently keep this as a placeholder (todo!("unplug the GPU")) in the reset path,
because I do not want to introduce temporary or partial unplug handling in this series
before the unplug design is settled.

[1]: https://gitlab.freedesktop.org/panfrost/linux/-/work_items/29

2- Design decisions for reset handling

The second blocker is the design around how a Resettable (a generic pre_reset()/post_reset() hook trait)
implementer should stop admitting new work, drain in-flight operations and recover after reset.

My current understanding is that the cleanest approach is to keep reset.rs responsible only for
reset orchestration:

- schedule reset work
- call pre_reset() hooks
- perform the hardware reset
- call post_reset() hooks
- propagate failure.

Then, each Resettable implementer should own its local recovery logic.
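
To make the split above concrete, here is a minimal userspace sketch of the
orchestration steps. It is not the code from this series: ResetError,
run_reset() and Recorder are hypothetical names, the hook signatures are
assumed, and running post_reset() in reverse order is just a choice made for
the sketch.

```rust
// Userspace sketch only, not Tyr code. Every name here is hypothetical.

#[derive(Debug, PartialEq)]
enum ResetError {
    /// A subsystem hook failed; carries the subsystem name.
    Hook(&'static str),
    /// The hardware reset itself failed.
    Hardware,
}

/// Hook pair implemented by each subsystem taking part in a reset.
trait Resettable {
    fn pre_reset(&mut self) -> Result<(), ResetError>;
    fn post_reset(&mut self) -> Result<(), ResetError>;
}

/// Orchestration only: hooks, hardware reset, hooks, error propagation.
fn run_reset(
    parts: &mut [&mut dyn Resettable],
    hw_reset: impl FnOnce() -> Result<(), ResetError>,
) -> Result<(), ResetError> {
    for part in parts.iter_mut() {
        part.pre_reset()?;
    }
    hw_reset()?;
    // Reverse order on the way back up is a choice made for this sketch.
    for part in parts.iter_mut().rev() {
        part.post_reset()?;
    }
    Ok(())
}

/// Toy implementer that records which hooks ran.
struct Recorder {
    name: &'static str,
    calls: Vec<&'static str>,
    fail_pre: bool,
}

impl Resettable for Recorder {
    fn pre_reset(&mut self) -> Result<(), ResetError> {
        if self.fail_pre {
            return Err(ResetError::Hook(self.name));
        }
        self.calls.push("pre");
        Ok(())
    }

    fn post_reset(&mut self) -> Result<(), ResetError> {
        self.calls.push("post");
        Ok(())
    }
}
```

The orchestrator knows nothing about what a hook does internally. It only
sequences the calls and returns the first error, at which point the caller
would decide what to do next (for Tyr, eventually the unplug path from
blocker 1).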

This is also how the existing C implementation is structured. The reset worker is centralized, but
recovery is implemented by the participating subsystems:

- srctree/drivers/gpu/drm/panthor/panthor_sched.c
- srctree/drivers/gpu/drm/panthor/panthor_fw.c
- srctree/drivers/gpu/drm/panthor/panthor_mmu.c

More specifically, the existing C side has hooks such as:

- panthor_sched_pre_reset() / panthor_sched_post_reset()
- panthor_fw_pre_reset() / panthor_fw_post_reset()
- panthor_mmu_pre_reset() / panthor_mmu_post_reset()

The reason I am leaning in the same direction for Tyr is that "stop new work", "drain" and "resume"
are not generic operations. They depend on the implementer.

Because of that, I think reset.rs should not have a global guard/checking API for all of this.
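
As a rough illustration of why I see these as implementer-specific, here is a
toy userspace sketch with two implementers whose hooks have nothing in common.
JobQueue and AddressTable are made-up types for this example only and do not
correspond to any Tyr or Panthor structure; the infallible hook signatures are
also an assumption to keep it short.

```rust
// Userspace sketch only, not Tyr code. Every name here is hypothetical.
use std::collections::VecDeque;

trait Resettable {
    fn pre_reset(&mut self);
    fn post_reset(&mut self);
}

/// Queue-like implementer: "stop" closes a local gate, "drain" cancels
/// whatever is still queued, "resume" reopens the gate.
struct JobQueue {
    accepting: bool,
    queued: VecDeque<u32>,
    cancelled: Vec<u32>,
}

impl JobQueue {
    fn new() -> Self {
        JobQueue { accepting: true, queued: VecDeque::new(), cancelled: Vec::new() }
    }

    /// Admission check lives here, not in a global reset guard.
    fn submit(&mut self, job: u32) -> Result<(), &'static str> {
        if !self.accepting {
            return Err("reset in progress");
        }
        self.queued.push_back(job);
        Ok(())
    }
}

impl Resettable for JobQueue {
    fn pre_reset(&mut self) {
        self.accepting = false;
        self.cancelled.extend(self.queued.drain(..));
    }

    fn post_reset(&mut self) {
        self.accepting = true;
    }
}

/// Table-like implementer: nothing to drain, but state programmed into the
/// hardware is lost and has to be marked stale, then restored.
struct AddressTable {
    programmed: bool,
}

impl Resettable for AddressTable {
    fn pre_reset(&mut self) {
        self.programmed = false;
    }

    fn post_reset(&mut self) {
        // Stand-in for re-programming the hardware after reset.
        self.programmed = true;
    }
}
```

A single guard type in reset.rs would have to cover both the "reject and
cancel" behaviour of the first type and the "invalidate and restore" behaviour
of the second, which is why I would rather leave that logic with each
implementer.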

Comments and suggestions are very welcome.

Regards,
Onur