Re: nouveau PUSHBUFFER_ERR on 5.9.0-rc2-next-20200824
From: Alexander Kapshuk
Date: Mon Aug 31 2020 - 01:34:06 EST
On Mon, Aug 31, 2020 at 7:30 AM Ben Skeggs <skeggsb@xxxxxxxxx> wrote:
>
> On Tue, 25 Aug 2020 at 17:21, Alexander Kapshuk
> <alexander.kapshuk@xxxxxxxxx> wrote:
> >
> > Since upgrading to linux-next based on 5.9.0-rc1 and 5.9.0-rc2 I have
> > had my mouse pointer disappear soon after logging in, and I have
> > observed the system freezing temporarily when clicking on objects and
> > when typing text.
> > I have also found records of push buffer errors in dmesg output:
> > [ 6625.450394] nouveau 0000:01:00.0: disp: ERROR 1 [PUSHBUFFER_ERR] 02 [] chid 0 mthd 0000 data 00000400
> Hey,
>
> Yeah, I'm aware of this. Lyude and I have both seen it, but it's been
> very painful to track down what's actually causing it so far. It
> likely is the commit you mentioned that's at fault, and I'm still
> working to find a proper solution before I revert it.
>
> Ben.
>
> >
> > I tried setting CONFIG_NOUVEAU_DEBUG=5 (tracing) to try and collect
> > further debug info, but nothing stood out.
> >
> > The error message in question comes from nv50_disp_intr_error in
> > drivers/gpu/drm/nouveau/nvkm/engine/disp/nv50.c:613,645.
> > And nv50_disp_intr_error is called from nv50_disp_intr in the
> > following while block:
> > drivers/gpu/drm/nouveau/nvkm/engine/disp/nv50.c:647,658
> > void
> > nv50_disp_intr(struct nv50_disp *disp)
> > {
> > 	struct nvkm_device *device = disp->base.engine.subdev.device;
> > 	u32 intr0 = nvkm_rd32(device, 0x610020);
> > 	u32 intr1 = nvkm_rd32(device, 0x610024);
> >
> > 	while (intr0 & 0x001f0000) {
> > 		u32 chid = __ffs(intr0 & 0x001f0000) - 16;
> > 		nv50_disp_intr_error(disp, chid);
> > 		intr0 &= ~(0x00010000 << chid);
> > 	}
> > 	...
> > }
> >
> > Could this be in any way related to this series of commits?
> > commit 0a96099691c8cd1ac0744ef30b6846869dc2b566
> > Author: Ben Skeggs <bskeggs@xxxxxxxxxx>
> > Date: Tue Jul 21 11:34:07 2020 +1000
> >
> > drm/nouveau/kms/nv50-: implement proper push buffer control logic
> >
> > We had a, what was supposed to be temporary, hack in the KMS code where we'd
> > completely drain an EVO/NVD channel's push buffer when wrapping to the start
> > again, instead of treating it as a ring buffer.
> >
> > Let's fix that, finally.
> >
> > Signed-off-by: Ben Skeggs <bskeggs@xxxxxxxxxx>
> >
> > Here are my GPU details:
> > 01:00.0 VGA compatible controller: NVIDIA Corporation GT216 [GeForce 210] (rev a1)
> > Subsystem: Micro-Star International Co., Ltd. [MSI] Device 8a93
> > Kernel driver in use: nouveau
> >
> > The last linux-next kernel I built where the problem reported does not
> > manifest itself is 5.8.0-rc6-next-20200720.
> >
> > I would appreciate being given any pointers on how to further debug this.
> > Or is git bisect the only way to proceed with this?
> >
> > Thanks.
Thanks a lot for getting back to me about this.
Please let me know if there's anything else I can do to help track this down.
Alexander.
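
For reference, the bit handling in the quoted nv50_disp_intr loop can be
sketched in userspace as below. This is only an illustration, not driver
code: pending_error_chids and lowest_set_bit are hypothetical names, and
__builtin_ctz (a GCC/Clang builtin) is assumed here as a stand-in for the
kernel's __ffs().

```c
#include <stddef.h>
#include <stdint.h>

/* Bits 16..20 of the interrupt word, as masked by 0x001f0000 in the quoted loop. */
#define ERR_MASK 0x001f0000u

/*
 * Userspace stand-in for the kernel's __ffs(): index of the lowest set bit.
 * As with __ffs(), the result is undefined for word == 0, so callers must
 * check for a non-zero word first. __builtin_ctz is a GCC/Clang builtin.
 */
static unsigned lowest_set_bit(uint32_t word)
{
	return (unsigned)__builtin_ctz(word);
}

/*
 * Hypothetical helper: collect the channel IDs flagged in intr0, lowest
 * first, mirroring the while loop in nv50_disp_intr. Stores at most max
 * IDs in chids and returns the number stored.
 */
static size_t pending_error_chids(uint32_t intr0, unsigned *chids, size_t max)
{
	size_t n = 0;

	while ((intr0 & ERR_MASK) && n < max) {
		unsigned chid = lowest_set_bit(intr0 & ERR_MASK) - 16;

		chids[n++] = chid;
		intr0 &= ~(0x00010000u << chid);	/* clear the bit just handled */
	}
	return n;
}
```

Under these assumptions, an intr0 value with only bit 16 set (0x00010000)
yields a single channel ID of 0, which would be consistent with the
"chid 0" in the dmesg line if that is the value nv50_disp_intr_error
reports.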