Re: 3.18-rc regression: drm/nouveau: use shared fences for readable objects

From: Tobias Klausmann
Date: Wed Nov 19 2014 - 10:29:53 EST


On 19.11.2014 09:10, Maarten Lankhorst wrote:
Hey,

On 19-11-14 07:43, Michael Marineau wrote:
On 3.18-rc kernels I have been intermittently experiencing GPU
lockups shortly after startup, accompanied by one or both of the
following errors:

nouveau E[ PFIFO][0000:01:00.0] read fault at 0x000734a000 [PTE]
from PBDMA0/HOST_CPU on channel 0x007faa3000 [unknown]
nouveau E[ DRM] GPU lockup - switching to software fbcon

I was able to trace the issue with bisect to commit
809e9447b92ffe1346b2d6ec390e212d5307f61c "drm/nouveau: use shared
fences for readable objects". The lockups appear to have cleared up
since reverting that and a few related followup commits:

809e9447: "drm/nouveau: use shared fences for readable objects"
055dffdf: "drm/nouveau: bump driver patchlevel to 1.2.1"
e3be4c23: "drm/nouveau: specify if interruptible wait is desired in
nouveau_fence_sync"
15a996bb: "drm/nouveau: assign fence_chan->name correctly"

Weird. I'm not sure yet what causes it.

http://cgit.freedesktop.org/~mlankhorst/linux/commit/?h=fixed-fences-for-bisect&id=86be4f216bbb9ea3339843a5658d4c21162c7ee2

On the EDITED patch from fixed-fences-for-bisect, can you do the following:

In nouveau/nv84_fence.c function nv84_fence_context_new, remove

fctx->base.sequence = nv84_fence_read(chan);

and add back

nouveau_bo_wr32(priv->bo, chan->chid * 16/4, 0x00000000);

If that fails you should compile your kernel with trace events, to get some debugging info from the fences. I'll post debugging info if this does not fix it.
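For readers following along, the difference between the two lines can be sketched as a toy user-space model. This is not the driver code: every name below is hypothetical, the 16-byte per-channel slot is inferred only from the "chan->chid * 16/4" index, and the assumption that the sequence counter starts at zero after the reset is mine.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy model; names are illustrative, not the nouveau API. */
#define MODEL_CHANNELS 8u
#define MODEL_SLOT_BYTES 16u /* per-channel stride inferred from "chid * 16/4" */

struct model_fence_bo {
    uint32_t words[MODEL_CHANNELS * MODEL_SLOT_BYTES / 4u];
};

/* Word index of a channel's sequence slot: byte offset divided by 4. */
static uint32_t model_slot_index(uint32_t chid)
{
    assert(chid < MODEL_CHANNELS);
    return chid * MODEL_SLOT_BYTES / 4u;
}

/* Line to remove: leave the slot alone and start the sequence
 * counter from whatever value the slot currently holds. */
static uint32_t model_ctx_new_inherit(const struct model_fence_bo *bo,
                                      uint32_t chid)
{
    return bo->words[model_slot_index(chid)];
}

/* Line to add back: zero the slot. Assumption: the sequence
 * counter of the new context then starts at 0 as well. */
static uint32_t model_ctx_new_reset(struct model_fence_bo *bo, uint32_t chid)
{
    bo->words[model_slot_index(chid)] = 0x00000000;
    return 0;
}
```

In this model a stale slot value is carried into the new context by the first variant and discarded by the second, which is the behavioural difference the test above is meant to isolate.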

~Maarten

Hey,
as mentioned on IRC, the new fencing hangs my GPU for a while as well (nve7).
Bisected back to 86be4f216bbb9ea3339843a5658d4c21162c7ee2, EDITED, from the fixed-fences-for-bisect branch mentioned above.

The original bisect on Linus' branch brought me to:
29ba89b2371d466ca68973525816cf10debc2655
drm/nouveau: rework to new fence interface

Michael, if you are going to bisect the "fixed-fences-for-bisect" branch and come anywhere near that commit, please take a closer look at whether or not it triggers the GPU hangs for you!

Tobias