Re: [git pull] drm fixes for 6.1-rc1
From: Dave Airlie
Date: Sun Oct 16 2022 - 17:44:51 EST
On Sun, 16 Oct 2022 at 18:09, Arthur Marsh
<arthur.marsh@xxxxxxxxxxxxxxxx> wrote:
>
> From: Arthur Marsh <arthur.marsh@xxxxxxxxxxxxxxxx>
>
> Hi, the "drm fixes for 6.1-rc1" commit caused the amdgpu module to fail
> with my Cape Verde radeonsi card.
>
> I haven't been able to bisect the problem to an individual commit, but
> attach a dmesg extract below.
>
> I'm happy to supply any other configuration information and test patches.
>
Can you try reverting the commit below? It's the only thing I can spot
that might affect a card that old, since most changes in that request
were for display hw you don't have.
commit 312b4dc11d4f74bfe03ea25ffe04c1f2fdd13cb9
Author: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@xxxxxxx>
Date: Tue Oct 4 07:33:39 2022 -0700
drm/amdgpu: Fix VRAM BO swap issue
DRM buddy manager allocates the contiguous memory requests in
a single block or multiple blocks. So for the ttm move operation
(incase of low vram memory) we should consider all the blocks to
compute the total memory size which compared with the struct
ttm_resource num_pages in order to verify that the blocks are
contiguous for the eviction process.
v2: Added a Fixes tag
v3: Rewrite the code to save a bit of calculations and
variables (Christian)
Fixes: c9cad937c0c5 ("drm/amdgpu: add drm buddy support to amdgpu")
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@xxxxxxx>
Reviewed-by: Christian König <christian.koenig@xxxxxxx>
Signed-off-by: Alex Deucher <alexander.deucher@xxxxxxx>
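
For anyone trying to picture what that commit message describes, here is a
rough userspace sketch of the idea: walk every block of an allocation, check
that each one starts where the previous one ended, and compare the summed
size against num_pages. This is not the code from the patch. The struct
vram_block type, the blocks_are_contiguous() helper and the 4 KiB PAGE_SHIFT
are assumptions made up for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Hypothetical stand-in for one block handed out by the buddy allocator. */
struct vram_block {
	uint64_t start; /* byte offset into VRAM */
	uint64_t size;  /* length in bytes */
};

/*
 * Return true when the blocks form a single gap-free range whose total
 * size equals num_pages pages. Illustrative only, not the kernel code.
 */
static bool blocks_are_contiguous(const struct vram_block *blocks,
				  size_t count, uint64_t num_pages)
{
	uint64_t total = 0;
	uint64_t end;
	size_t i;

	assert(blocks != NULL || count == 0);

	if (count == 0)
		return false;

	end = blocks[0].start;
	for (i = 0; i < count; i++) {
		/* Each block must begin exactly where the previous one ended. */
		if (blocks[i].start != end)
			return false;
		end = blocks[i].start + blocks[i].size;
		total += blocks[i].size;
	}

	/* The summed block sizes must cover the whole resource. */
	return total == (num_pages << PAGE_SHIFT);
}
```

With two blocks at 0x0000 (8 KiB) and 0x2000 (4 KiB) and num_pages = 3 the
check passes. Moving the second block to 0x4000 leaves a gap, so it fails.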
Thanks,
Dave.
> Arthur.
>
> Linux version 6.0.0+ (root@am64) (gcc-12 (Debian 12.2.0-5) 12.2.0, GNU ld (GNU Binutils for Debian) 2.39) #5179 SMP PREEMPT_DYNAMIC Fri Oct 14 17:00:40 ACDT 2022
> Command line: BOOT_IMAGE=/vmlinuz-6.0.0+ root=UUID=39706f53-7c27-4310-b22a-36c7b042d1a1 ro single amdgpu.audio=1 amdgpu.si_support=1 radeon.si_support=0 page_owner=on amdgpu.gpu_recovery=1
> ...
>
> [drm] amdgpu kernel modesetting enabled.
> amdgpu 0000:01:00.0: vgaarb: deactivate vga console
> Console: switching to colour dummy device 80x25
> [drm] initializing kernel modesetting (VERDE 0x1002:0x682B 0x1458:0x22CA 0x87).
> [drm] register mmio base: 0xFE8C0000
> [drm] register mmio size: 262144
> [drm] add ip block number 0 <si_common>
> [drm] add ip block number 1 <gmc_v6_0>
> [drm] add ip block number 2 <si_ih>
> [drm] add ip block number 3 <gfx_v6_0>
> [drm] add ip block number 4 <si_dma>
> [drm] add ip block number 5 <si_dpm>
> [drm] add ip block number 6 <dce_v6_0>
> [drm] add ip block number 7 <uvd_v3_1>
> [drm] BIOS signature incorrect 5b 7
> resource sanity check: requesting [mem 0x000c0000-0x000dffff], which spans more than PCI Bus 0000:00 [mem 0x000d0000-0x000dffff window]
> caller pci_map_rom+0x68/0x1b0 mapping multiple BARs
> amdgpu 0000:01:00.0: No more image in the PCI ROM
> amdgpu 0000:01:00.0: amdgpu: Fetched VBIOS from ROM BAR
> amdgpu: ATOM BIOS: xxx-xxx-xxx
> amdgpu 0000:01:00.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported
> amdgpu 0000:01:00.0: amdgpu: PCIE atomic ops is not supported
> [drm] PCIE gen 2 link speeds already enabled
> [drm] vm size is 64 GB, 2 levels, block size is 10-bit, fragment size is 9-bit
> RTL8211B Gigabit Ethernet r8169-0-300:00: attached PHY driver (mii_bus:phy_addr=r8169-0-300:00, irq=MAC)
> r8169 0000:03:00.0 eth0: Link is Down
> amdgpu 0000:01:00.0: amdgpu: VRAM: 2048M 0x000000F400000000 - 0x000000F47FFFFFFF (2048M used)
> amdgpu 0000:01:00.0: amdgpu: GART: 1024M 0x000000FF00000000 - 0x000000FF3FFFFFFF
> [drm] Detected VRAM RAM=2048M, BAR=256M
> [drm] RAM width 128bits DDR3
> [drm] amdgpu: 2048M of VRAM memory ready
> [drm] amdgpu: 3979M of GTT memory ready.
> [drm] GART: num cpu pages 262144, num gpu pages 262144
> amdgpu 0000:01:00.0: amdgpu: PCIE GART of 1024M enabled (table at 0x000000F400A00000).
> [drm] Internal thermal controller with fan control
> [drm] amdgpu: dpm initialized
> [drm] AMDGPU Display Connectors
> [drm] Connector 0:
> [drm] HDMI-A-1
> [drm] HPD1
> [drm] DDC: 0x194c 0x194c 0x194d 0x194d 0x194e 0x194e 0x194f 0x194f
> [drm] Encoders:
> [drm] DFP1: INTERNAL_UNIPHY
> [drm] Connector 1:
> [drm] DVI-D-1
> [drm] HPD2
> [drm] DDC: 0x1950 0x1950 0x1951 0x1951 0x1952 0x1952 0x1953 0x1953
> [drm] Encoders:
> [drm] DFP2: INTERNAL_UNIPHY
> [drm] Connector 2:
> [drm] VGA-1
> [drm] DDC: 0x1970 0x1970 0x1971 0x1971 0x1972 0x1972 0x1973 0x1973
> [drm] Encoders:
> [drm] CRT1: INTERNAL_KLDSCP_DAC1
> [drm] Found UVD firmware Version: 64.0 Family ID: 13
> amdgpu: Move buffer fallback to memcpy unavailable
> [drm:amdgpu_device_init.cold [amdgpu]] *ERROR* sw_init of IP block <uvd_v3_1> failed -19
> amdgpu 0000:01:00.0: amdgpu: amdgpu_device_ip_init failed
> amdgpu 0000:01:00.0: amdgpu: Fatal error during GPU init
> amdgpu 0000:01:00.0: amdgpu: amdgpu: finishing device.
> BUG: kernel NULL pointer dereference, address: 0000000000000090
> #PF: supervisor write access in kernel mode
> #PF: error_code(0x0002) - not-present page
> PGD 0 P4D 0
> Oops: 0002 [#1] PREEMPT SMP NOPTI
> CPU: 3 PID: 447 Comm: udevd Not tainted 6.0.0+ #5179
> Hardware name: System manufacturer System Product Name/M3A78 PRO, BIOS 1701 01/27/2011
> RIP: 0010:drm_sched_fini+0x80/0xa0 [gpu_sched]
> Code: 76 83 0e c4 c6 85 8c 01 00 00 00 5b 5d 41 5c 41 5d c3 cc cc cc cc 4c 8d 63 f0 4c 89 e7 e8 08 99 8e c4 48 8b 03 48 39 d8 74 0f <c6> 80 90 00 00 00 01 48 8b 00 48 39 d8 75 f1 4c 89 e7 e8 c9 99 8e
> RSP: 0018:ffffbeb3c06bfbb8 EFLAGS: 00010213
> RAX: 0000000000000000 RBX: ffff99bae8269a98 RCX: ffff99bab703afc0
> RDX: 0000000000000001 RSI: ffff99bab703afe8 RDI: 0000000000000000
> RBP: ffff99bae82699f0 R08: ffffffff85cd0bc2 R09: 0000000000000010
> R10: 0000000000000035 R11: ffff99bb594806c0 R12: ffff99bae8269a88
> R13: ffff99bae82699f8 R14: ffff99bae82665e8 R15: 0000000000000000
> FS: 00007fd81fcd9840(0000) GS:ffff99bb67cc0000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000090 CR3: 0000000111822000 CR4: 00000000000006e0
> Call Trace:
> <TASK>
> amdgpu_fence_driver_sw_fini+0xc2/0xd0 [amdgpu]
> amdgpu_device_fini_sw+0x17/0x3c0 [amdgpu]
> amdgpu_driver_release_kms+0x12/0x30 [amdgpu]
> devm_drm_dev_init_release+0x4a/0x70 [drm]
> release_nodes+0x40/0xb0
> devres_release_all+0x89/0xc0
> device_unbind_cleanup+0xe/0x70
> really_probe+0x245/0x3a0
> ? pm_runtime_barrier+0x61/0xb0
> __driver_probe_device+0x78/0x170
> driver_probe_device+0x2d/0xb0
> __driver_attach+0xdc/0x1d0
> ? __device_attach_driver+0x100/0x100
> bus_for_each_dev+0x69/0xa0
> bus_add_driver+0x1d4/0x230
> ? _raw_spin_unlock+0x15/0x40
> driver_register+0x89/0xe0
> ? 0xffffffffc0c3b000
> do_one_initcall+0x44/0x200
> ? __kmem_cache_alloc_node+0x90/0x360
> ? kmalloc_trace+0x38/0xc0
> do_init_module+0x4a/0x1e0
> __do_sys_finit_module+0xb5/0x130
> do_syscall_64+0x3a/0x90
> entry_SYSCALL_64_after_hwframe+0x63/0xcd
> RIP: 0033:0x7fd81ff5b1b9
> Code: 08 44 89 e0 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 27 1c 0d 00 f7 d8 64 89 01 48
> RSP: 002b:00007ffc5b37cbb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
> RAX: ffffffffffffffda RBX: 000055e5f2f6a140 RCX: 00007fd81ff5b1b9
> RDX: 0000000000000000 RSI: 000055e5f2f67e30 RDI: 0000000000000017
> RBP: 000055e5f2f67e30 R08: 0000000000000000 R09: 000055e5f2f46700
> R10: 0000000000000017 R11: 0000000000000246 R12: 0000000000020000
> R13: 0000000000000000 R14: 000055e5f2f65b00 R15: 0000000000000024
> </TASK>
> Modules linked in: amdgpu(+) snd_emu10k1_synth snd_emux_synth snd_seq_midi_emul snd_seq_virmidi snd_seq_midi snd_seq_midi_event snd_seq wmi_bmof snd_emu10k1 edac_mce_amd gpu_sched drm_buddy video kvm_amd drm_ttm_helper ttm snd_util_mem drm_display_helper snd_ac97_codec ccp drm_kms_helper snd_hda_codec_hdmi rng_core ac97_bus snd_rawmidi snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hda_core snd_seq_device drm kvm snd_hwdep snd_pcm_oss snd_mixer_oss evdev serio_raw snd_pcm irqbypass i2c_algo_bit fb_sys_fops syscopyarea sysfillrect emu10k1_gp pcspkr gameport k10temp snd_timer sysimgblt snd acpi_cpufreq wmi soundcore button sp5100_tco asus_atk0110 ext4 crc16 mbcache jbd2 btrfs blake2b_generic xor raid6_pq zstd_compress libcrc32c crc32c_generic uas usb_storage sg sd_mod hid_generic t10_pi usbhid hid sr_mod cdrom crc64_rocksoft crc64 ata_generic ahci pata_atiixp libahci ohci_pci firewire_ohci libata firewire_core crc_itu_t xhci_pci scsi_mod ohci_hcd r8169 ehci_pci xhci_hcd
> realtek ehci_hcd mdio_devres i2c_piix4 scsi_common usbcore libphy usb_common
> CR2: 0000000000000090
> ---[ end trace 0000000000000000 ]---
> RIP: 0010:drm_sched_fini+0x80/0xa0 [gpu_sched]
> Code: 76 83 0e c4 c6 85 8c 01 00 00 00 5b 5d 41 5c 41 5d c3 cc cc cc cc 4c 8d 63 f0 4c 89 e7 e8 08 99 8e c4 48 8b 03 48 39 d8 74 0f <c6> 80 90 00 00 00 01 48 8b 00 48 39 d8 75 f1 4c 89 e7 e8 c9 99 8e
> RSP: 0018:ffffbeb3c06bfbb8 EFLAGS: 00010213
> RAX: 0000000000000000 RBX: ffff99bae8269a98 RCX: ffff99bab703afc0
> RDX: 0000000000000001 RSI: ffff99bab703afe8 RDI: 0000000000000000
> RBP: ffff99bae82699f0 R08: ffffffff85cd0bc2 R09: 0000000000000010
> R10: 0000000000000035 R11: ffff99bb594806c0 R12: ffff99bae8269a88
> R13: ffff99bae82699f8 R14: ffff99bae82665e8 R15: 0000000000000000
> FS: 00007fd81fcd9840(0000) GS:ffff99bb67cc0000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000090 CR3: 0000000111822000 CR4: 00000000000006e0
> note: udevd[447] exited with preempt_count 1
> udevd[433]: worker [447] terminated by signal 9 (Killed)
> udevd[433]: worker [447] failed while handling '/devices/pci0000:00/0000:00:02.0/0000:01:00.0'
> r8169 0000:03:00.0 eth0: Link is Up - 1Gbps/Full - flow control off
> IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
> Adding 4194300k swap on /dev/sda4. Priority:-2 extents:1 across:4194300k FS
> EXT4-fs (sda5): re-mounted. Quota mode: none.
> lp: driver loaded but no devices found
> ppdev: user-space parallel port driver
> it87: Found IT8716F chip at 0xe80, revision 3
> ACPI Warning: SystemIO range 0x0000000000000E85-0x0000000000000E86 conflicts with OpRegion 0x0000000000000E85-0x0000000000000E86 (\_SB.PCI0.SBRG.ASOC.HWRE) (20220331/utaddress-204)
> ACPI: OSL: Resource conflict; ACPI support missing from driver?
> BUG: unable to handle page fault for address: 00000000000065c0
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> PGD 0 P4D 0
> Oops: 0000 [#2] PREEMPT SMP NOPTI
> CPU: 2 PID: 55 Comm: kworker/2:1 Tainted: G D 6.0.0+ #5179
> Hardware name: System manufacturer System Product Name/M3A78 PRO, BIOS 1701 01/27/2011
> Workqueue: events output_poll_execute [drm_kms_helper]
> RIP: 0010:amdgpu_device_rreg.part.0+0x39/0x100 [amdgpu]
> Code: 6c 24 08 48 89 fb 4c 89 64 24 10 44 8d 24 b5 00 00 00 00 4c 3b a7 88 08 00 00 89 f5 73 70 83 e2 02 74 2f 4c 03 a3 90 08 00 00 <45> 8b 24 24 48 8b 43 08 0f b7 70 3e 66 90 44 89 e0 48 8b 1c 24 48
> RSP: 0018:ffffbeb3c0717c48 EFLAGS: 00010206
> RAX: 0000000000000000 RBX: ffff99bae8260000 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: 0000000000001970 RDI: ffff99bae8260000
> RBP: 0000000000001970 R08: ffffbeb3c0717e08 R09: 0000000000000000
> R10: 0000000000000018 R11: fefefefefefefeff R12: 00000000000065c0
> R13: ffffbeb3c0717d70 R14: 0000000000000000 R15: 000000010005e340
> FS: 0000000000000000(0000) GS:ffff99bb67c80000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00000000000065c0 CR3: 000000008980a000 CR4: 00000000000006e0
> Call Trace:
> <TASK>
> amdgpu_i2c_pre_xfer+0x163/0x180 [amdgpu]
> bit_xfer+0x36/0x530 [i2c_algo_bit]
> __i2c_transfer+0x185/0x550
> i2c_transfer+0xa2/0x110
> amdgpu_display_ddc_probe+0xbd/0x100 [amdgpu]
> amdgpu_connector_vga_detect+0x8e/0x200 [amdgpu]
> drm_helper_probe_detect_ctx+0x7b/0xd0 [drm_kms_helper]
> output_poll_execute+0x152/0x220 [drm_kms_helper]
> process_one_work+0x1ae/0x370
> worker_thread+0x4d/0x3b0
> ? rescuer_thread+0x380/0x380
> kthread+0xe3/0x110
> ? kthread_complete_and_exit+0x20/0x20
> ret_from_fork+0x22/0x30
> </TASK>
> Modules linked in: max6650 hwmon_vid parport_pc ppdev lp parport amdgpu(+) snd_emu10k1_synth snd_emux_synth snd_seq_midi_emul snd_seq_virmidi snd_seq_midi snd_seq_midi_event snd_seq wmi_bmof snd_emu10k1 edac_mce_amd gpu_sched drm_buddy video kvm_amd drm_ttm_helper ttm snd_util_mem drm_display_helper snd_ac97_codec ccp drm_kms_helper snd_hda_codec_hdmi rng_core ac97_bus snd_rawmidi snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hda_core snd_seq_device drm kvm snd_hwdep snd_pcm_oss snd_mixer_oss evdev serio_raw snd_pcm irqbypass i2c_algo_bit fb_sys_fops syscopyarea sysfillrect emu10k1_gp pcspkr gameport k10temp snd_timer sysimgblt snd acpi_cpufreq wmi soundcore button sp5100_tco asus_atk0110 ext4 crc16 mbcache jbd2 btrfs blake2b_generic xor raid6_pq zstd_compress libcrc32c crc32c_generic uas usb_storage sg sd_mod hid_generic t10_pi usbhid hid sr_mod cdrom crc64_rocksoft crc64 ata_generic ahci pata_atiixp libahci ohci_pci firewire_ohci libata firewire_core crc_itu_t xhci_pci
> scsi_mod ohci_hcd r8169 ehci_pci xhci_hcd realtek ehci_hcd mdio_devres i2c_piix4 scsi_common usbcore libphy usb_common
> CR2: 00000000000065c0
> ---[ end trace 0000000000000000 ]---
> RIP: 0010:drm_sched_fini+0x80/0xa0 [gpu_sched]
> Code: 76 83 0e c4 c6 85 8c 01 00 00 00 5b 5d 41 5c 41 5d c3 cc cc cc cc 4c 8d 63 f0 4c 89 e7 e8 08 99 8e c4 48 8b 03 48 39 d8 74 0f <c6> 80 90 00 00 00 01 48 8b 00 48 39 d8 75 f1 4c 89 e7 e8 c9 99 8e
> RSP: 0018:ffffbeb3c06bfbb8 EFLAGS: 00010213
> RAX: 0000000000000000 RBX: ffff99bae8269a98 RCX: ffff99bab703afc0
> RDX: 0000000000000001 RSI: ffff99bab703afe8 RDI: 0000000000000000
> RBP: ffff99bae82699f0 R08: ffffffff85cd0bc2 R09: 0000000000000010
> R10: 0000000000000035 R11: ffff99bb594806c0 R12: ffff99bae8269a88
> R13: ffff99bae82699f8 R14: ffff99bae82665e8 R15: 0000000000000000
> FS: 0000000000000000(0000) GS:ffff99bb67c80000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00000000000065c0 CR3: 000000008980a000 CR4: 00000000000006e0