Re: radeon 0000:1d:00.0: GPU lockup (current fence id 0x00000000010da43f last fence id 0x00000000010da52d on ring 0)
From: Huang Rui
Date: Wed Jun 06 2018 - 04:03:43 EST
On Tue, Jun 05, 2018 at 04:44:04PM +0200, Borislav Petkov wrote:
> Hi guys,
>
> X just froze here on top of 4.17-rc7+ tip/master (kernel is from last
> week) with the splat at the end.
>
> Box is a x470 chipset with Ryzen 2700X.
>
> GPU gets detected as
>
> [ 7.440971] [drm] radeon kernel modesetting enabled.
> [ 7.441220] [drm] initializing kernel modesetting (RV635 0x1002:0x9598 0x1043:0x01DA 0x00).
> [ 7.441328] ATOM BIOS: 9598.10.88.0.3.AS05
> [ 7.441395] radeon 0000:1d:00.0: VRAM: 512M 0x0000000000000000 - 0x000000001FFFFFFF (512M used)
> [ 7.441464] radeon 0000:1d:00.0: GTT: 512M 0x0000000020000000 - 0x000000003FFFFFFF
> [ 7.441531] [drm] Detected VRAM RAM=512M, BAR=256M
> [ 7.441588] [drm] RAM width 128bits DDR
> [ 7.441690] [TTM] Zone kernel: Available graphics memory: 16462214 kiB
> [ 7.441751] [TTM] Zone dma32: Available graphics memory: 2097152 kiB
> [ 7.441811] [TTM] Initializing pool allocator
> [ 7.441868] [TTM] Initializing DMA pool allocator
> [ 7.441934] [drm] radeon: 512M of VRAM memory ready
> [ 7.441990] [drm] radeon: 512M of GTT memory ready.
> [ 7.442050] [drm] Loading RV635 Microcode
> [ 7.442865] [drm] Internal thermal controller without fan control
> [ 7.442940] [drm] radeon: power management initialized
> [ 7.443222] [drm] GART: num cpu pages 131072, num gpu pages 131072
> [ 7.443487] [drm] enabling PCIE gen 2 link speeds, disable with radeon.pcie_gen2=0
> [ 7.477319] [drm] PCIE GART of 512M enabled (table at 0x0000000000142000).
> [ 7.477400] radeon 0000:1d:00.0: WB enabled
> [ 7.477455] radeon 0000:1d:00.0: fence driver on ring 0 use gpu addr 0x0000000020000c00 and cpu addr 0x (ptrval)
> [ 7.477708] radeon 0000:1d:00.0: fence driver on ring 5 use gpu addr 0x00000000000521d0 and cpu addr 0x (ptrval)
> [ 7.477778] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
> [ 7.477836] [drm] Driver supports precise vblank timestamp query.
> [ 7.477896] radeon 0000:1d:00.0: radeon: MSI limited to 32-bit
> [ 7.477990] radeon 0000:1d:00.0: radeon: using MSI.
> [ 7.478062] [drm] radeon: irq initialized.
> [ 7.509056] [drm] ring test on 0 succeeded in 0 usecs
> [ 7.683793] [drm] ring test on 5 succeeded in 1 usecs
> [ 7.683853] [drm] UVD initialized successfully.
> [ 7.684009] [drm] ib test on ring 0 succeeded in 0 usecs
> [ 8.348466] [drm] ib test on ring 5 succeeded
> [ 8.348921] [drm] Radeon Display Connectors
> [ 8.348978] [drm] Connector 0:
> [ 8.349031] [drm] DVI-I-1
> [ 8.349082] [drm] HPD1
> [ 8.349135] [drm] DDC: 0x7e50 0x7e50 0x7e54 0x7e54 0x7e58 0x7e58 0x7e5c 0x7e5c
> [ 8.349200] [drm] Encoders:
> [ 8.349252] [drm] DFP1: INTERNAL_UNIPHY
> [ 8.349308] [drm] CRT2: INTERNAL_KLDSCP_DAC2
> [ 8.349364] [drm] Connector 1:
> [ 8.349416] [drm] DIN-1
> [ 8.349467] [drm] Encoders:
> [ 8.349520] [drm] TV1: INTERNAL_KLDSCP_DAC2
> [ 8.349576] [drm] Connector 2:
> [ 8.349628] [drm] DVI-I-2
> [ 8.349680] [drm] HPD2
> [ 8.349732] [drm] DDC: 0x7e40 0x7e40 0x7e44 0x7e44 0x7e48 0x7e48 0x7e4c 0x7e4c
> [ 8.349797] [drm] Encoders:
> [ 8.349849] [drm] CRT1: INTERNAL_KLDSCP_DAC1
> [ 8.349905] [drm] DFP2: INTERNAL_KLDSCP_LVTMA
> [ 8.430521] [drm] fb mappable at 0xE0243000
> [ 8.430575] [drm] vram apper at 0xE0000000
> [ 8.431194] [drm] size 9216000
> [ 8.431245] [drm] fb depth is 24
> [ 8.431295] [drm] pitch is 7680
> [ 8.431406] fbcon: radeondrmfb (fb0) is primary device
> [ 8.496928] Console: switching to colour frame buffer device 240x75
> [ 8.501851] radeon 0000:1d:00.0: fb0: radeondrmfb frame buffer device
> [ 8.520179] [drm] Initialized radeon 2.50.0 20080528 for 0000:1d:00.0 on minor 0
>
> in the PCIe slot with two monitors connected to it. radeon firmware is
>
> Version: 20170823-1
>
> What practically happened is X froze and got restarted after the GPU
> reset. It seems to be ok now, as I'm typing in it.
>
> Thoughts?
>
> [197439.022249] Restarting tasks ... done.
> [197439.024043] PM: hibernation exit
> [197439.058296] r8169 0000:18:00.0 eth0: link up
> [200941.240184] perf: interrupt took too long (2507 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
> [221973.686894] radeon 0000:1d:00.0: ring 0 stalled for more than 10176msec
> [221973.686900] radeon 0000:1d:00.0: GPU lockup (current fence id 0x00000000010da43f last fence id 0x00000000010da52d on ring 0)
> [221973.686929] radeon 0000:1d:00.0: failed to get a new IB (-35)
> [221973.686950] [drm:radeon_cs_ioctl [radeon]] *ERROR* Failed to get ib !
> [221973.693971] radeon 0000:1d:00.0: Saved 7609 dwords of commands on ring 0.
> [221973.693985] radeon 0000:1d:00.0: GPU softreset: 0x00000008
> [221973.693988] radeon 0000:1d:00.0: R_008010_GRBM_STATUS = 0xA0001030
> [221973.693990] radeon 0000:1d:00.0: R_008014_GRBM_STATUS2 = 0x00000003
> [221973.693992] radeon 0000:1d:00.0: R_000E50_SRBM_STATUS = 0x200010C0
> [221973.693994] radeon 0000:1d:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000
> [221973.693996] radeon 0000:1d:00.0: R_008678_CP_STALLED_STAT2 = 0x00000000
> [221973.693998] radeon 0000:1d:00.0: R_00867C_CP_BUSY_STAT = 0x00000006
> [221973.694000] radeon 0000:1d:00.0: R_008680_CP_STAT = 0x80000645
> [221973.694002] radeon 0000:1d:00.0: R_00D034_DMA_STATUS_REG = 0x44C83D57
> [221973.768483] radeon 0000:1d:00.0: R_008020_GRBM_SOFT_RESET=0x00004001
> [221973.768541] radeon 0000:1d:00.0: SRBM_SOFT_RESET=0x00000100
> [221973.770637] radeon 0000:1d:00.0: R_008010_GRBM_STATUS = 0xA0003030
> [221973.770643] radeon 0000:1d:00.0: R_008014_GRBM_STATUS2 = 0x00000003
> [221973.770646] radeon 0000:1d:00.0: R_000E50_SRBM_STATUS = 0x200080C0
> [221973.770648] radeon 0000:1d:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000
> [221973.770650] radeon 0000:1d:00.0: R_008678_CP_STALLED_STAT2 = 0x00000000
> [221973.770652] radeon 0000:1d:00.0: R_00867C_CP_BUSY_STAT = 0x00000000
> [221973.770654] radeon 0000:1d:00.0: R_008680_CP_STAT = 0x80100000
> [221973.770656] radeon 0000:1d:00.0: R_00D034_DMA_STATUS_REG = 0x44C83D57
> [221973.770664] radeon 0000:1d:00.0: GPU reset succeeded, trying to resume
> [221973.786437] [drm] PCIE gen 2 link speeds already enabled
> [221973.788725] [drm] PCIE GART of 512M enabled (table at 0x0000000000142000).
> [221973.788745] radeon 0000:1d:00.0: WB enabled
> [221973.788749] radeon 0000:1d:00.0: fence driver on ring 0 use gpu addr 0x0000000020000c00 and cpu addr 0x0000000063adc4ad
> [221973.788936] radeon 0000:1d:00.0: fence driver on ring 5 use gpu addr 0x00000000000521d0 and cpu addr 0x0000000088b51197
> [221973.819814] [drm] ring test on 0 succeeded in 0 usecs
> [221973.994512] [drm] ring test on 5 succeeded in 1 usecs
> [221973.994522] [drm] UVD initialized successfully.
> [221984.438892] radeon 0000:1d:00.0: ring 0 stalled for more than 10448msec
> [221984.438898] radeon 0000:1d:00.0: GPU lockup (current fence id 0x00000000010da440 last fence id 0x00000000010da52d on ring 0)
> [221984.450978] [drm:r600_ib_test [radeon]] *ERROR* radeon: fence wait failed (-35).
> [221984.451011] [drm:radeon_ib_ring_tests [radeon]] *ERROR* radeon: failed testing IB on GFX ring (-35).
>
The ring test on ring 0 passed, but the fence never came back from the IB
test. Is it possible that the page table got corrupted after the GPU reset?
Radeon is a legacy driver. Christian, can you comment on this?
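
For reference, the fence ids in the two "GPU lockup" messages above can be
compared with a small script. This is only a rough sketch: it assumes that
"current" is the last fence the GPU signaled and "last" is the last fence
that was emitted, and the names parse_lockups and LOG are made up for
illustration (they are not part of the driver or any tool).

```python
import re

# Matches the radeon "GPU lockup" message as it appears in the log above.
LOCKUP_RE = re.compile(
    r"GPU lockup \(current fence id (0x[0-9a-fA-F]+) "
    r"last fence id (0x[0-9a-fA-F]+) on ring (\d+)\)"
)


def parse_lockups(log_text):
    """Return (ring, current, last, pending) for each lockup line.

    pending = last - current, i.e. fences emitted but not yet signaled
    (assumption: that is what the two ids mean).
    """
    results = []
    for m in LOCKUP_RE.finditer(log_text):
        current = int(m.group(1), 16)
        last = int(m.group(2), 16)
        results.append((int(m.group(3)), current, last, last - current))
    return results


# The two lockup lines copied from the quoted dmesg output.
LOG = """\
[221973.686900] radeon 0000:1d:00.0: GPU lockup (current fence id 0x00000000010da43f last fence id 0x00000000010da52d on ring 0)
[221984.438898] radeon 0000:1d:00.0: GPU lockup (current fence id 0x00000000010da440 last fence id 0x00000000010da52d on ring 0)
"""

for ring, current, last, pending in parse_lockups(LOG):
    print(f"ring {ring}: current={current:#x} last={last:#x} pending={pending}")
```

With the two lines from this report, that gives 238 pending fences at the
first lockup and 237 at the second, so the current fence id advanced by only
one across the reset while the last fence id stayed the same.
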
Thanks,
Ray