next-20230726 and later - crash in radeon module during init
From: Valdis Klētnieks
Date: Thu Aug 10 2023 - 05:35:15 EST
I am seeing the following consistent crash at boot:
[ 61.211213][ T819] [drm] radeon kernel modesetting enabled.
[ 61.584870][ T819] vga_switcheroo: detected switching method \_SB_.PCI0.GFX0.ATPX handle
[ 61.667507][ T819] ATPX version 1, functions 0x00000033
[ 61.748228][ T819] general protection fault, probably for non-canonical address 0x54080068930549a0: 0000 [#1] PREEMPT SMP
[ 61.829840][ T819] CPU: 3 PID: 819 Comm: (udev-worker) Tainted: G I T 6.5.0-rc4-next-20230804 #58 5cce04b101a5bb4a6c0368bfff037f6f096b3d3e
[ 61.911411][ T819] Hardware name: Dell Inc. Inspiron 5559/052K07, BIOS 1.9.0 09/07/2020
[ 61.993285][ T819] RIP: 0010:strnlen+0x21/0x40
[ 62.074885][ T819] Code: 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 55 48 89 e5 48 8d 14 37 31 c0 48 85 f6 74 16 48 89 f8 eb 09 48 83 c0 01 48 39 c2 74 0e <80> 38 00 75 f2 48 29 f8 5d c3 cc cc cc cc 48 89 d0 5d 48 29 f8 c3
[ 62.156529][ T819] RSP: 0018:ffffa310419979b8 EFLAGS: 00010202
[ 62.318407][ T819] RAX: 54080068930549a0 RBX: ffffa31041997a20 RCX: 0000000000000000
[ 62.400015][ T819] RDX: 54080068930549b0 RSI: 0000000000000010 RDI: 54080068930549a0
[ 62.481624][ T819] RBP: ffffa310419979b8 R08: ffff937b85579990 R09: ffffa31041997ad8
[ 62.563644][ T819] R10: ffff937b86ddae00 R11: 0000000000000000 R12: 54080068930549a0
[ 62.645194][ T819] R13: ffff937b814291b8 R14: 0000000000000001 R15: ffffa31041997b81
[ 62.726753][ T819] FS: 00007efd50479600(0000) GS:ffff937ef2e00000(0000) knlGS:0000000000000000
[ 62.808312][ T819] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 62.889830][ T819] CR2: 00007f125d30ee70 CR3: 0000000105644002 CR4: 00000000003706e0
[ 62.971390][ T819] Call Trace:
[ 63.052954][ T819] <TASK>
[ 63.134501][ T819] ? show_regs+0x64/0x70
[ 63.216058][ T819] ? die_addr+0x36/0x90
[ 63.297594][ T819] ? exc_general_protection+0x1c1/0x440
[ 63.379112][ T819] ? asm_exc_general_protection+0x2b/0x30
[ 63.460650][ T819] ? strnlen+0x21/0x40
[ 63.542209][ T819] set_dev_info+0x40/0x170
[ 63.623762][ T819] dev_printk_emit+0xa8/0xe0
[ 63.705308][ T819] __dev_printk+0x34/0x80
[ 63.786806][ T819] _dev_info+0x7a/0xa0
[ 63.868304][ T819] radeon_atpx_validate.constprop.0.isra.0+0xbc/0x100 [radeon f030e9a708043a486415a94978106b28cd7cb9a2]
[ 63.949952][ T819] radeon_atpx_detect+0x17b/0x190 [radeon f030e9a708043a486415a94978106b28cd7cb9a2]
[ 64.031547][ T819] ? __pfx_radeon_module_init+0x10/0x10 [radeon f030e9a708043a486415a94978106b28cd7cb9a2]
[ 64.113102][ T819] radeon_register_atpx_handler+0xd/0x30 [radeon f030e9a708043a486415a94978106b28cd7cb9a2]
[ 64.194721][ T819] radeon_module_init+0x84/0xff0 [radeon f030e9a708043a486415a94978106b28cd7cb9a2]
[ 64.276365][ T819] do_one_initcall+0x86/0x380
[ 64.357865][ T819] do_init_module+0x63/0x220
[ 64.439342][ T819] load_module+0x99d/0xa90
Some quick digging indicates the most likely culprit is:
commit cbd0606e6a776bf2ba10d4a6957bb7628c0da947
Author: Srinivasan Shanmugam <srinivasan.shanmugam@xxxxxxx>
Date: Thu Jul 20 15:39:24 2023 +0530
drm/radeon: Prefer dev_* variant over printk
Changed from pr_err/info to dev_* variants so that
we get better debug info when there are multiple GPUs
in the system.
Looks like this is the failure point because 'dev' is trashed:
+ dev_info(dev, "ATPX Hybrid Graphics\n");
But I admit I don't know the ACPI stuff well enough to see what, if
anything, is wrong with this:
+ struct acpi_device *adev = container_of(atpx->handle, struct acpi_device, handle);
+ struct device *dev = &adev->dev;
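One thing that may be worth checking, shown here as a userspace sketch rather than kernel code: container_of() expects the address of the member embedded in the parent struct. If atpx->handle only holds a copy of the opaque acpi_handle value (an assumption on my part, not verified against the driver), then the macro would subtract the member offset from an unrelated address and produce a bogus 'dev'. All fake_* names below are hypothetical stand-ins, not the real kernel structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
	((type *)((uintptr_t)(ptr) - offsetof(type, member)))

/* Hypothetical stand-ins; NOT the real ACPI or radeon structures. */
typedef void *fake_handle;

struct fake_device {
	const char *name;
};

struct fake_acpi_device {
	int id;
	fake_handle handle;        /* opaque value pointing somewhere else */
	struct fake_device dev;
};

struct fake_atpx {
	fake_handle handle;        /* assumed: a copy of the handle value */
};

/* The object the opaque handle points at (think: a namespace node). */
static void *fake_namespace_node;

static struct fake_acpi_device fake_adev = {
	.id = 1,
	.handle = &fake_namespace_node,
	.dev = { .name = "GFX0" },
};

/* Returns 1 when the ADDRESS of the embedded member recovers the parent. */
int recovers_parent_from_member_address(void)
{
	struct fake_acpi_device *p =
		container_of(&fake_adev.handle, struct fake_acpi_device, handle);
	return p == &fake_adev;
}

/* Returns 1 when a COPY of the handle value recovers the parent. */
int recovers_parent_from_handle_copy(void)
{
	struct fake_atpx atpx = { .handle = fake_adev.handle };
	/* The result is only compared, never dereferenced. */
	struct fake_acpi_device *p =
		container_of(atpx.handle, struct fake_acpi_device, handle);
	return p == &fake_adev;
}

/* Runs both cases; aborts if either behaves unexpectedly. */
void run_demo(void)
{
	assert(recovers_parent_from_member_address() == 1);
	assert(recovers_parent_from_handle_copy() == 0);
}
```

In the sketch only the first form gets back to the parent struct. The second yields a pointer just below wherever the handle happens to point, which in the kernel would mean dev_info() reading a device name from unrelated memory.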
Any ideas?