Re: [PATCH v1 RESEND 4/4] drm/tyr: add GPU reset handling

From: Daniel Almeida

Date: Fri Apr 03 2026 - 11:10:23 EST

> On 19 Mar 2026, at 08:08, Boris Brezillon <boris.brezillon@xxxxxxxxxxxxx> wrote:
>
> On Fri, 13 Mar 2026 12:16:44 +0300
> Onur Özkan <work@xxxxxxxxxxxxx> wrote:
>
>> +impl Controller {
>> + /// Creates a [`Controller`] instance.
>> + fn new(pdev: ARef<platform::Device>, iomem: Arc<Devres<IoMem>>) -> Result<Arc<Self>> {
>> + let wq = workqueue::OrderedQueue::new(c"tyr-reset-wq", 0)?;
>> +
>> + Arc::pin_init(
>> + try_pin_init!(Self {
>> + pdev,
>> + iomem,
>> + pending: Atomic::new(false),
>> + wq,
>> + work <- kernel::new_work!("tyr::reset"),
>> + }),
>> + GFP_KERNEL,
>> + )
>> + }
>> +
>> + /// Processes one scheduled reset request.
>> + ///
>> + /// Panthor reference:
>> + /// - drivers/gpu/drm/panthor/panthor_device.c::panthor_device_reset_work()
>> + fn reset_work(self: &Arc<Self>) {
>> + dev_info!(self.pdev.as_ref(), "GPU reset work is started.\n");
>> +
>> + // SAFETY: `Controller` is part of driver-private data and only exists
>> + // while the platform device is bound.
>> + let pdev = unsafe { self.pdev.as_ref().as_bound() };
>> + if let Err(e) = run_reset(pdev, &self.iomem) {
>> + dev_err!(self.pdev.as_ref(), "GPU reset failed: {:?}\n", e);
>> + } else {
>> + dev_info!(self.pdev.as_ref(), "GPU reset work is done.\n");
>> + }
>
> Unfortunately, the reset operation is not as simple as instructing the
> GPU to reset, it's a complex synchronization process where you need to
> try to gracefully put various components on hold before you reset, and
> then resume those after the reset is effective. Of course, with what we
> currently have in-tree, there's not much to suspend/resume, but I think
> I'd prefer to design the thing so we can progressively add more
> components without changing the reset logic too much.
>
> I would probably start with a Resettable trait that has the
> {pre,post}_reset() methods that exist in Panthor.
>
> The other thing we need is a way for those components to know when a
> reset is about to happen so they can postpone some actions they were
> planning in order to not further delay the reset, or end up with
> actions that fail because the HW is already unusable. Not too sure how
> we want to handle that though. Panthor is currently sprinkled with
> panthor_device_reset_is_pending() calls in key places, but that's still
> very manual, maybe we can automate that a bit more in Tyr, dunno.
>

We could have an enum where one of the variants is Resetting, and the other one
gives access to whatever state is not accessible while resets are in progress.

Something like

pub enum TyrData {
Active(ActiveTyrData),
ResetInProgress(ActiveTyrData)
}

fn access() -> Option<&ActiveTyrData> {
… // if the “ResetInProgress” variant is active, return None
}

>> +
>> + self.pending.store(false, Release);
>> + }
>> +}