Re: [PATCH] accel/ivpu: Add handling of VPU_JSM_STATUS_MVNCI_CONTEXT_VIOLATION_HW
From: Jacek Lawrynowicz
Date: Thu Apr 10 2025 - 03:49:57 EST
Hi,
This is an important patch for the Intel NPU.
Is anything missing that prevents it from being included in stable?
Regards,
Jacek
On 4/8/2025 11:57 AM, Jacek Lawrynowicz wrote:
> From: Karol Wachowski <karol.wachowski@xxxxxxxxx>
>
> commit dad945c27a42dfadddff1049cf5ae417209a8996 upstream.
>
> Trigger recovery of the NPU upon receiving a HW context violation from
> the firmware. The context violation error is a fatal error that prevents
> any subsequent jobs from being executed. Without this fix it is
> necessary to reload the driver to restore the NPU operational state.
>
> This is a simplified version of the upstream commit, as the full
> implementation would require all engine reset/resume logic to be backported.
>
> Signed-off-by: Karol Wachowski <karol.wachowski@xxxxxxxxx>
> Signed-off-by: Maciej Falkowski <maciej.falkowski@xxxxxxxxxxxxxxx>
> Reviewed-by: Jacek Lawrynowicz <jacek.lawrynowicz@xxxxxxxxxxxxxxx>
> Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@xxxxxxxxxxxxxxx>
> Link: https://patchwork.freedesktop.org/patch/msgid/20250107173238.381120-13-maciej.falkowski@xxxxxxxxxxxxxxx
> Fixes: 0adff3b0ef12 ("accel/ivpu: Share NPU busy time in sysfs")
> Cc: <stable@xxxxxxxxxxxxxxx> # v6.11+
> ---
> drivers/accel/ivpu/ivpu_job.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/drivers/accel/ivpu/ivpu_job.c b/drivers/accel/ivpu/ivpu_job.c
> index be2e2bf0f43f0..70b3676974407 100644
> --- a/drivers/accel/ivpu/ivpu_job.c
> +++ b/drivers/accel/ivpu/ivpu_job.c
> @@ -482,6 +482,8 @@ static struct ivpu_job *ivpu_job_remove_from_submitted_jobs(struct ivpu_device *
>  	return job;
>  }
>
> +#define VPU_JSM_STATUS_MVNCI_CONTEXT_VIOLATION_HW 0xEU
> +
>  static int ivpu_job_signal_and_destroy(struct ivpu_device *vdev, u32 job_id, u32 job_status)
>  {
>  	struct ivpu_job *job;
> @@ -490,6 +492,9 @@ static int ivpu_job_signal_and_destroy(struct ivpu_device *vdev, u32 job_id, u32
>  	if (!job)
>  		return -ENOENT;
>
> +	if (job_status == VPU_JSM_STATUS_MVNCI_CONTEXT_VIOLATION_HW)
> +		ivpu_pm_trigger_recovery(vdev, "HW context violation");
> +
>  	if (job->file_priv->has_mmu_faults)
>  		job_status = DRM_IVPU_JOB_STATUS_ABORTED;
>
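
For anyone reading this without the driver tree at hand, the control flow
added by the quoted patch can be sketched in standalone userspace C. This is
only an illustration, not driver code: struct mock_dev,
mock_trigger_recovery() and mock_job_signal() are hypothetical stand-ins for
struct ivpu_device, ivpu_pm_trigger_recovery() and the relevant part of
ivpu_job_signal_and_destroy(). Only the 0xE status value and the comparison
itself are taken from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Status value as defined in the quoted patch. */
#define VPU_JSM_STATUS_MVNCI_CONTEXT_VIOLATION_HW 0xEU

/* Hypothetical stand-in for struct ivpu_device: counts recovery requests. */
struct mock_dev {
	unsigned int recovery_requests;
};

/* Hypothetical stand-in for ivpu_pm_trigger_recovery(). */
static void mock_trigger_recovery(struct mock_dev *dev, const char *reason)
{
	dev->recovery_requests++;
	fprintf(stderr, "recovery requested: %s\n", reason);
}

/*
 * Mirrors only the check the patch adds: a HW context violation reported
 * for a job requests device recovery; any other status passes through.
 */
static uint32_t mock_job_signal(struct mock_dev *dev, uint32_t job_status)
{
	assert(dev != NULL);

	if (job_status == VPU_JSM_STATUS_MVNCI_CONTEXT_VIOLATION_HW)
		mock_trigger_recovery(dev, "HW context violation");

	return job_status;
}
```

In this mock, recovery is just a counter; in the driver the real call is what
restores the NPU without reloading the module, as the commit message
describes.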