[PATCH] accel/ivpu: Add handling of VPU_JSM_STATUS_MVNCI_CONTEXT_VIOLATION_HW

From: Jacek Lawrynowicz
Date: Tue Apr 08 2025 - 06:07:01 EST


From: Karol Wachowski <karol.wachowski@xxxxxxxxx>

commit dad945c27a42dfadddff1049cf5ae417209a8996 upstream.

Trigger recovery of the NPU upon receiving HW context violation from
the firmware. The context violation error is a fatal error that prevents
any subsequent jobs from being executed. Without this fix it is
necessary to reload the driver to restore the NPU operational state.

This is simplified version of upstream commit as the full implementation
would require all engine reset/resume logic to be backported.

Signed-off-by: Karol Wachowski <karol.wachowski@xxxxxxxxx>
Signed-off-by: Maciej Falkowski <maciej.falkowski@xxxxxxxxxxxxxxx>
Reviewed-by: Jacek Lawrynowicz <jacek.lawrynowicz@xxxxxxxxxxxxxxx>
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@xxxxxxxxxxxxxxx>
Link: https://patchwork.freedesktop.org/patch/msgid/20250107173238.381120-13-maciej.falkowski@xxxxxxxxxxxxxxx
Fixes: 0adff3b0ef12 ("accel/ivpu: Share NPU busy time in sysfs")
Cc: <stable@xxxxxxxxxxxxxxx> # v6.11+
---
drivers/accel/ivpu/ivpu_job.c | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/drivers/accel/ivpu/ivpu_job.c b/drivers/accel/ivpu/ivpu_job.c
index be2e2bf0f43f0..70b3676974407 100644
--- a/drivers/accel/ivpu/ivpu_job.c
+++ b/drivers/accel/ivpu/ivpu_job.c
@@ -482,6 +482,8 @@ static struct ivpu_job *ivpu_job_remove_from_submitted_jobs(struct ivpu_device *
return job;
}

+#define VPU_JSM_STATUS_MVNCI_CONTEXT_VIOLATION_HW 0xEU
+
static int ivpu_job_signal_and_destroy(struct ivpu_device *vdev, u32 job_id, u32 job_status)
{
struct ivpu_job *job;
@@ -490,6 +492,9 @@ static int ivpu_job_signal_and_destroy(struct ivpu_device *vdev, u32 job_id, u32
if (!job)
return -ENOENT;

+ if (job_status == VPU_JSM_STATUS_MVNCI_CONTEXT_VIOLATION_HW)
+ ivpu_pm_trigger_recovery(vdev, "HW context violation");
+
if (job->file_priv->has_mmu_faults)
job_status = DRM_IVPU_JOB_STATUS_ABORTED;

--
2.45.1