Re: [PATCH v1 RESEND 4/4] drm/tyr: add GPU reset handling
From: Boris Brezillon
Date: Thu Mar 19 2026 - 07:08:55 EST
On Fri, 13 Mar 2026 12:16:44 +0300
Onur Özkan <work@xxxxxxxxxxxxx> wrote:
> +impl Controller {
> + /// Creates a [`Controller`] instance.
> + fn new(pdev: ARef<platform::Device>, iomem: Arc<Devres<IoMem>>) -> Result<Arc<Self>> {
> + let wq = workqueue::OrderedQueue::new(c"tyr-reset-wq", 0)?;
> +
> + Arc::pin_init(
> + try_pin_init!(Self {
> + pdev,
> + iomem,
> + pending: Atomic::new(false),
> + wq,
> + work <- kernel::new_work!("tyr::reset"),
> + }),
> + GFP_KERNEL,
> + )
> + }
> +
> + /// Processes one scheduled reset request.
> + ///
> + /// Panthor reference:
> + /// - drivers/gpu/drm/panthor/panthor_device.c::panthor_device_reset_work()
> + fn reset_work(self: &Arc<Self>) {
> + dev_info!(self.pdev.as_ref(), "GPU reset work is started.\n");
> +
> + // SAFETY: `Controller` is part of driver-private data and only exists
> + // while the platform device is bound.
> + let pdev = unsafe { self.pdev.as_ref().as_bound() };
> + if let Err(e) = run_reset(pdev, &self.iomem) {
> + dev_err!(self.pdev.as_ref(), "GPU reset failed: {:?}\n", e);
> + } else {
> + dev_info!(self.pdev.as_ref(), "GPU reset work is done.\n");
> + }
Unfortunately, the reset operation is not as simple as instructing the
GPU to reset, it's a complex synchronization process where you need to
try to gracefully put various components on hold before you reset, and
then resume those after the reset is effective. Of course, with what we
currently have in-tree, there's not much to suspend/resume, but I think
I'd prefer to design the thing so we can progressively add more
components without changing the reset logic too much.
I would probably start with a Resettable trait that has the
{pre,post}_reset() methods that exist in Panthor.
The other thing we need is a way for those components to know when a
reset is about to happen so they can postpone some actions they were
planning in order to not further delay the reset, or end up with
actions that fail because the HW is already unusable. Not too sure how
we want to handle that though. Panthor is currently sprinkled with
panthor_device_reset_is_pending() calls in key places, but that's still
very manual, maybe we can automate that a bit more in Tyr, dunno.
> +
> + self.pending.store(false, Release);
> + }
> +}