Re: [PATCH v2] drm/panfrost:report the full raw fault information instead

From: Chunyou Tang
Date: Mon Jun 21 2021 - 21:56:09 EST


Hi Steve,
I will send a new patch with suitable subject/commit message.
But I send a V3 or a new patch?

I met a bug about the GPU,I have no idea about how to fix it,
If you can give me some suggestion,it is perfect.

You can see such kernel log:

Jun 20 10:20:13 icube kernel: [ 774.566760] mvp_gpu 0000:05:00.0: GPU
Fault 0x00000088 (SHAREABILITY_FAULT) at 0x000000000310fd00 Jun 20
10:20:13 icube kernel: [ 774.566764] mvp_gpu 0000:05:00.0: There were
multiple GPU faults - some have not been reported Jun 20 10:20:13 icube
kernel: [ 774.667542] mvp_gpu 0000:05:00.0: AS_ACTIVE bit stuck Jun 20
10:20:13 icube kernel: [ 774.767900] mvp_gpu 0000:05:00.0: AS_ACTIVE
bit stuck Jun 20 10:20:13 icube kernel: [ 774.868546] mvp_gpu
0000:05:00.0: AS_ACTIVE bit stuck Jun 20 10:20:13 icube kernel:
[ 774.968910] mvp_gpu 0000:05:00.0: AS_ACTIVE bit stuck Jun 20
10:20:13 icube kernel: [ 775.069251] mvp_gpu 0000:05:00.0: AS_ACTIVE
bit stuck Jun 20 10:20:22 icube kernel: [ 783.693971] mvp_gpu
0000:05:00.0: gpu sched timeout, js=1, config=0x7300, status=0x8,
head=0x362c900, tail=0x362c100, sched_job=000000003252fb84

In
https://lore.kernel.org/dri-devel/20200510165538.19720-1-peron.clem@xxxxxxxxx/
there had a same bug like mine,and I found you at the mail list,I don't
know how it fixed?

I need your help!

thinks very much!

Chunyou

ÓÚ Mon, 21 Jun 2021 11:45:20 +0100
Steven Price <steven.price@xxxxxxx> дµÀ:

> On 19/06/2021 04:18, Chunyou Tang wrote:
> > Hi Steve,
> > 1,Now I know how to write the subject
> > 2,the low 8 bits is the exception type in spec.
> >
> > and you can see prnfrost_exception_name()
> >
> > switch (exception_code) {
> > /* Non-Fault Status code */
> > case 0x00: return "NOT_STARTED/IDLE/OK";
> > case 0x01: return "DONE";
> > case 0x02: return "INTERRUPTED";
> > case 0x03: return "STOPPED";
> > case 0x04: return "TERMINATED";
> > case 0x08: return "ACTIVE";
> > ........
> > ........
> > case 0xD8: return "ACCESS_FLAG";
> > case 0xD9 ... 0xDF: return "ACCESS_FLAG";
> > case 0xE0 ... 0xE7: return "ADDRESS_SIZE_FAULT";
> > case 0xE8 ... 0xEF: return "MEMORY_ATTRIBUTES_FAULT";
> > }
> > return "UNKNOWN";
> > }
> >
> > the exception_code in case is only 8 bits,so if fault_status
> > in panfrost_gpu_irq_handler() don't & 0xFF,it can't get correct
> > exception reason,it will be always UNKNOWN.
>
> Yes, I'm happy with the change - I just need a patch that I can apply.
> At the moment this patch only changes the first '0x%08x' output rather
> than the call to panfrost_exception_name() as well. So we just need a
> patch which does:
>
> - fault_status & 0xFF, panfrost_exception_name(pfdev, fault_status),
> + fault_status, panfrost_exception_name(pfdev, fault_status & 0xFF),
>
> along with a suitable subject/commit message describing the change. If
> you can send me that I can apply it.
>
> Thanks,
>
> Steve
>
> PS. Sorry for going round in circles here - I'm trying to help you get
> setup so you'll be able to contribute patches easily in future. An
> important part of that is ensuring you can send a properly formatted
> patch to the list.
>
> PPS. I'm still not receiving your emails directly. I don't think it's
> a problem at my end because I'm receiving other emails, but if you can
> somehow fix the problem you're likely to receive a faster response.
>
> > ÓÚ Fri, 18 Jun 2021 13:43:24 +0100
> > Steven Price <steven.price@xxxxxxx> дµÀ:
> >
> >> On 17/06/2021 07:20, ChunyouTang wrote:
> >>> From: ChunyouTang <tangchunyou@xxxxxxxxxxxx>
> >>>
> >>> of the low 8 bits.
> >>
> >> Please don't split the subject like this. The first line of the
> >> commit should be a (very short) summary of the patch. Then a blank
> >> line and then a longer description of what the purpose of the
> >> patch is and why it's needed.
> >>
> >> Also you previously had this as part of a series (the first part
> >> adding the "& 0xFF" in the panfrost_exception_name() call). I'm not
> >> sure we need two patches for the single line, but as it stands this
> >> patch doesn't apply.
> >>
> >> Also I'm still not receiving any emails from you directly (only via
> >> the list), so it's possible I might have missed something you sent.
> >>
> >> Steve
> >>
> >>>
> >>> Signed-off-by: ChunyouTang <tangchunyou@xxxxxxxxxxxx>
> >>> ---
> >>> drivers/gpu/drm/panfrost/panfrost_gpu.c | 2 +-
> >>> 1 file changed, 1 insertion(+), 1 deletion(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/panfrost/panfrost_gpu.c
> >>> b/drivers/gpu/drm/panfrost/panfrost_gpu.c index
> >>> 1fffb6a0b24f..d2d287bbf4e7 100644 ---
> >>> a/drivers/gpu/drm/panfrost/panfrost_gpu.c +++
> >>> b/drivers/gpu/drm/panfrost/panfrost_gpu.c @@ -33,7 +33,7 @@ static
> >>> irqreturn_t panfrost_gpu_irq_handler(int irq, void *data) address
> >>> |= gpu_read(pfdev, GPU_FAULT_ADDRESS_LO);
> >>> dev_warn(pfdev->dev, "GPU Fault 0x%08x (%s) at
> >>> 0x%016llx\n",
> >>> - fault_status & 0xFF,
> >>> panfrost_exception_name(pfdev, fault_status & 0xFF),
> >>> + fault_status,
> >>> panfrost_exception_name(pfdev, fault_status & 0xFF), address);
> >>>
> >>> if (state & GPU_IRQ_MULTIPLE_FAULT)
> >>>
> >
> >