Re: 5.11-rc1 TTM list corruption

From: David Woodhouse
Date: Wed Jan 06 2021 - 11:55:20 EST


On Tue, 2021-01-05 at 16:40 +0100, Christian König wrote:
> Am 05.01.21 um 13:20 schrieb Huang Rui:
> > On Tue, Jan 05, 2021 at 07:43:51PM +0800, Borislav Petkov wrote:
> > > On Tue, Jan 05, 2021 at 07:08:52PM +0800, Huang Rui wrote:
> > > > Ah, this asic is a bit old and still use radeon driver. So we didn't
> > > > reproduce it on amdgpu driver. I don't have such the old asic in my hand.
> > > > May we know whether this issue can be duplicated after SI which is used
> > > > amdgpu module (not sure whether you have recent APU or GPU)?
> > >
> > > The latest I have (I think it is the latest) is:
> > >
> > > [ 1.826102] [drm] initializing kernel modesetting (RENOIR 0x1002:0x1636 0x17AA:0x5099 0xD1).
> > >
> > > and so far that hasn't triggered it. Which makes sense because that
> > > thing uses amdgpu:
> > >
> > > [ 1.810260] [drm] amdgpu kernel modesetting enabled.
> >
> > Yes! Renoir is late enough for amdgpu kernel module. :-)
> > Please let us know if you still encounter the issue.
>
> Thanks for the hints guys. You need a rather specific configuration, but
> I can reproduce this now.
>
> Let's see what the problem is here.

FWIW I'm seeing it here on my workstation too.

[ 3.952102] [drm] radeon kernel modesetting enabled.
[ 3.952885] checking generic (90000000 300000) vs hw (90000000 10000000)
[ 3.952898] fb0: switching to radeondrmfb from EFI VGA
[ 3.953665] Console: switching to colour dummy device 80x25
[ 3.953696] radeon 0000:03:00.0: vgaarb: deactivate vga console
[ 3.953898] [drm] initializing kernel modesetting (CYPRESS 0x1002:0x6898 0x1462:0x8032 0x00).
[ 3.953940] resource sanity check: requesting [mem 0x000c0000-0x000dffff], which spans more than PCI Bus 0000:00 [mem 0x000c4000-0x000cbfff window]
[ 3.953945] caller pci_map_rom+0x6c/0x1b0 mapping multiple BARs
[ 3.953972] ATOM BIOS: 113
[ 3.954028] radeon 0000:03:00.0: VRAM: 1024M 0x0000000000000000 - 0x000000003FFFFFFF (1024M used)
[ 3.954032] radeon 0000:03:00.0: GTT: 1024M 0x0000000040000000 - 0x000000007FFFFFFF
[ 3.954037] [drm] Detected VRAM RAM=1024M, BAR=256M
[ 3.954039] [drm] RAM width 256bits DDR
[ 3.954087] [TTM] Zone kernel: Available graphics memory: 16389788 KiB
[ 3.954090] [TTM] Zone dma32: Available graphics memory: 2097152 KiB
[ 3.954105] [drm] radeon: 1024M of VRAM memory ready
[ 3.954107] [drm] radeon: 1024M of GTT memory ready.
[ 3.954114] [drm] Loading CYPRESS Microcode
[ 3.954168] [drm] Internal thermal controller with fan control
[ 3.954531] usb 3-1.1.1: New USB device found, idVendor=10d5, idProduct=1234, bcdDevice= 9.02
[ 3.954539] usb 3-1.1.1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[ 3.958098] hub 3-1.1.1:1.0: USB hub found
[ 3.959704] hub 3-1.1.1:1.0: 4 ports detected
[ 3.975098] [drm] radeon: dpm initialized
[ 3.975159] [drm] GART: num cpu pages 262144, num gpu pages 262144
[ 3.976074] [drm] enabling PCIE gen 2 link speeds, disable with radeon.pcie_gen2=0
[ 3.979669] igb 0000:01:00.0 eno0: renamed from eth0
[ 3.993789] [drm] PCIE GART of 1024M enabled (table at 0x000000000014C000).
[ 3.993912] radeon 0000:03:00.0: WB enabled
[ 3.993915] radeon 0000:03:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00
[ 3.993918] radeon 0000:03:00.0: fence driver on ring 3 use gpu addr 0x0000000040000c0c
[ 3.994359] radeon 0000:03:00.0: fence driver on ring 5 use gpu addr 0x000000000005c418
[ 3.994531] radeon 0000:03:00.0: radeon: MSI limited to 32-bit
[ 3.994563] radeon 0000:03:00.0: radeon: using MSI.
[ 3.994581] [drm] radeon: irq initialized.
[ 4.011086] [drm] ring test on 0 succeeded in 1 usecs
[ 4.011094] [drm] ring test on 3 succeeded in 2 usecs
[ 4.030666] EXT4-fs (md127): mounted filesystem with ordered data mode. Opts: (null)
[ 4.188159] [drm] ring test on 5 succeeded in 1 usecs
[ 4.188165] [drm] UVD initialized successfully.
[ 4.188326] [drm] ib test on ring 0 succeeded in 0 usecs
[ 4.188371] [drm] ib test on ring 3 succeeded in 0 usecs
...
[ 4.839982] [drm] ib test on ring 5 succeeded
[ 4.841079] [drm] Radeon Display Connectors
[ 4.841087] [drm] Connector 0:
[ 4.841090] [drm] DP-1
[ 4.841094] [drm] HPD4
[ 4.841097] [drm] DDC: 0x6430 0x6430 0x6434 0x6434 0x6438 0x6438 0x643c 0x643c
[ 4.841104] [drm] Encoders:
[ 4.841107] [drm] DFP1: INTERNAL_UNIPHY2
[ 4.841111] [drm] Connector 1:
[ 4.841114] [drm] HDMI-A-1
[ 4.841118] [drm] HPD5
[ 4.841120] [drm] DDC: 0x6460 0x6460 0x6464 0x6464 0x6468 0x6468 0x646c 0x646c
[ 4.841127] [drm] Encoders:
[ 4.841130] [drm] DFP2: INTERNAL_UNIPHY2
[ 4.841133] [drm] Connector 2:
[ 4.841136] [drm] DVI-I-1
[ 4.841139] [drm] HPD1
[ 4.841142] [drm] DDC: 0x6450 0x6450 0x6454 0x6454 0x6458 0x6458 0x645c 0x645c
[ 4.841149] [drm] Encoders:
[ 4.841151] [drm] DFP3: INTERNAL_UNIPHY1
[ 4.841155] [drm] CRT2: INTERNAL_KLDSCP_DAC2
[ 4.841159] [drm] Connector 3:
[ 4.841162] [drm] DVI-I-2
[ 4.841165] [drm] HPD6
[ 4.841168] [drm] DDC: 0x6470 0x6470 0x6474 0x6474 0x6478 0x6478 0x647c 0x647c
[ 4.841174] [drm] Encoders:
[ 4.841177] [drm] DFP4: INTERNAL_UNIPHY
[ 4.841180] [drm] CRT1: INTERNAL_KLDSCP_DAC1
[ 4.921539] [drm] fb mappable at 0x9034D000
[ 4.921547] [drm] vram apper at 0x90000000
[ 4.921549] [drm] size 9216000
[ 4.921552] [drm] fb depth is 24
[ 4.921555] [drm] pitch is 7680
[ 4.921680] fbcon: radeondrmfb (fb0) is primary device
[ 4.943121] Console: switching to colour frame buffer device 240x75
[ 4.950509] radeon 0000:03:00.0: [drm] fb0: radeondrmfb frame buffer device
[ 4.959011] [drm] Initialized radeon 2.50.0 20080528 for 0000:03:00.0 on minor 0


...

[27221.673320] list_del corruption. next->prev should be ffffffffc02e4e40, but was ffff98de96e40ed0
[27221.673355] ------------[ cut here ]------------
[27221.673357] kernel BUG at lib/list_debug.c:54!
[27221.673365] invalid opcode: 0000 [#1] SMP PTI
[27221.673370] CPU: 9 PID: 263 Comm: kswapd0 Tainted: G S I 5.10.0+ #701
[27221.673373] Hardware name: Intel Corporation S2600CW/S2600CW, BIOS SE5C610.86B.01.01.0008.021120151325 02/11/2015
[27221.673376] RIP: 0010:__list_del_entry_valid.cold+0x1d/0x47
[27221.673386] Code: c7 c7 08 b7 40 9d e8 77 3f fe ff 0f 0b 48 89 fe 48 c7 c7 98 b7 40 9d e8 66 3f fe ff 0f 0b 48 c7 c7 48 b8 40 9d e8 58 3f fe ff <0f> 0b 48 89 f2 48 89 fe 48 c7 c7 08 b8 40 9d e8 44 3f fe ff 0f 0b
[27221.673389] RSP: 0000:ffffac17007f3c20 EFLAGS: 00010286
[27221.673394] RAX: 0000000000000054 RBX: ffffffffc02e4e40 RCX: 0000000000000000
[27221.673396] RDX: ffff98e5df866ba0 RSI: ffff98e5df858ac0 RDI: ffff98e5df858ac0
[27221.673398] RBP: 0000000000000080 R08: 0000000000000000 R09: ffffac17007f3a58
[27221.673401] R10: ffffac17007f3a50 R11: ffffffff9d744ca8 R12: 0000000000000000
[27221.673403] R13: 0000000000000000 R14: 0000000000000084 R15: ffffffffc02e4ba0
[27221.673405] FS: 0000000000000000(0000) GS:ffff98e5df840000(0000) knlGS:0000000000000000
[27221.673408] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[27221.673411] CR2: 00000000004fea86 CR3: 000000079a9e4001 CR4: 00000000001726e0
[27221.673414] Call Trace:
[27221.673420] ttm_pool_shrink+0x53/0xb0 [ttm]
[27221.673433] ttm_pool_shrinker_scan+0xa/0x20 [ttm]
[27221.673440] do_shrink_slab+0x145/0x240
[27221.673447] shrink_slab+0x9c/0x280
[27221.673451] shrink_node+0x2c2/0x6f0
[27221.673456] balance_pgdat+0x2ff/0x620
[27221.673461] kswapd+0x1e6/0x360
[27221.673464] ? finish_wait+0x80/0x80
[27221.673471] ? balance_pgdat+0x620/0x620
[27221.673474] kthread+0x11b/0x140
[27221.673479] ? __kthread_bind_mask+0x60/0x60
[27221.673483] ret_from_fork+0x22/0x30
[27221.673491] Modules linked in: vhost_net vhost vhost_iotlb tap xt_MASQUERADE xt_conntrack xt_CHECKSUM ip6t_REJECT ipt_REJECT nf_nat_tftp nft_objref nf_conntrack_tftp nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_tables ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security tun bridge iptable_nat nf_nat nf_conntrack stp llc nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security rfkill ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter vfat fat intel_rapl_msr intel_rapl_common sb_edac snd_hda_codec_realtek x86_pkg_temp_thermal snd_hda_codec_generic intel_powerclamp ledtrig_audio snd_hda_codec_hdmi coretemp kvm_intel snd_hda_intel joydev snd_intel_dspcfg apple_mfi_fastcharge snd_hda_codec kvm snd_hda_core iTCO_wdt irqbypass intel_pmc_bxt snd_hwdep iTCO_vendor_support snd_seq ipmi_si rapl snd_seq_device ipmi_devintf intel_cstate
[27221.673569] snd_pcm mei_me intel_uncore i2c_i801 ipmi_msghandler pcspkr snd_timer i2c_smbus mei snd lpc_ich ioatdma soundcore acpi_power_meter acpi_pad auth_rpcgss binfmt_misc sunrpc ip_tables radeon uas usb_storage drm_ttm_helper ttm drm_kms_helper igb cec crct10dif_pclmul crc32_pclmul crc32c_intel dca drm raid0 ghash_clmulni_intel wmi i2c_algo_bit fuse ecryptfs
[27221.673609] ---[ end trace 98f04a1b0e5570b4 ]---
[27221.726254] RIP: 0010:__list_del_entry_valid.cold+0x1d/0x47
[27221.726277] Code: c7 c7 08 b7 40 9d e8 77 3f fe ff 0f 0b 48 89 fe 48 c7 c7 98 b7 40 9d e8 66 3f fe ff 0f 0b 48 c7 c7 48 b8 40 9d e8 58 3f fe ff <0f> 0b 48 89 f2 48 89 fe 48 c7 c7 08 b8 40 9d e8 44 3f fe ff 0f 0b
[27221.726281] RSP: 0000:ffffac17007f3c20 EFLAGS: 00010286
[27221.726284] RAX: 0000000000000054 RBX: ffffffffc02e4e40 RCX: 0000000000000000
[27221.726286] RDX: ffff98e5df866ba0 RSI: ffff98e5df858ac0 RDI: ffff98e5df858ac0
[27221.726288] RBP: 0000000000000080 R08: 0000000000000000 R09: ffffac17007f3a58
[27221.726290] R10: ffffac17007f3a50 R11: ffffffff9d744ca8 R12: 0000000000000000
[27221.726292] R13: 0000000000000000 R14: 0000000000000084 R15: ffffffffc02e4ba0
[27221.726294] FS: 0000000000000000(0000) GS:ffff98e5df840000(0000) knlGS:0000000000000000
[27221.726296] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[27221.726298] CR2: 00000000004fea86 CR3: 000000079a9e4001 CR4: 00000000001726e0

Attachment: smime.p7s
Description: S/MIME cryptographic signature