Re: [REGRESSION] soft lockup on boot starting with kernel 6.10 / commit 5186ba33234c9a90833f7c93ce7de80e25fac6f5

From: Borislav Petkov
Date: Mon Sep 09 2024 - 04:06:49 EST


On Sun, Sep 08, 2024 at 11:53:56PM -0700, Hugues Bruant wrote:
> Hi,
>
> I have discovered a 100% reliable soft lockup on boot on my laptop:
> Purism Librem 14, Intel Core i7-10710U, 48 GB RAM, Samsung Evo Plus 970
> SSD, CoreBoot BIOS, grub bootloader, Arch Linux.
>
> The last working release is kernel 6.9.10; every release from 6.10
> onwards reliably exhibits the issue, which, based on journalctl logs,
> seems to be triggered somewhere in systemd-udev:
> https://gitlab.archlinux.org/-/project/42594/uploads/04583baf22189a0a8bb2f8773096e013/lockup.log
>
> Bisect points to commit 5186ba33234c9a90833f7c93ce7de80e25fac6f5

That's a merge commit, which means the bisection likely went in the wrong
direction.
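
A quick way to confirm that a bisection result is a merge is to count its
parents. The following is only a minimal sketch: the helper names
parent_count and is_merge_commit are made up for illustration, and it assumes
git is installed and that it is run against a checkout of the kernel tree.

```python
import subprocess


def parent_count(rev_list_line: str) -> int:
    """Count parents in one line of `git rev-list --parents` output.

    The line has the form "<commit> <parent1> [<parent2> ...]".
    """
    fields = rev_list_line.split()
    return max(len(fields) - 1, 0)


def is_merge_commit(sha: str, repo: str = ".") -> bool:
    """Return True if `sha` has more than one parent, i.e. it is a merge.

    Hypothetical helper: shells out to git in the repository at `repo`.
    """
    out = subprocess.run(
        ["git", "-C", repo, "rev-list", "--parents", "-n", "1", sha],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return parent_count(out) > 1
```

Calling is_merge_commit() with the commit ID that the bisection reported, from
inside the kernel checkout, returns True when that commit has two or more
parents.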

Looking at your log, the first warning is in framebuffer_coreboot: its probe
path tries to register a platform device, simple-framebuffer.0, that already
exists in sysfs.

Adding the relevant people for that:

Aug 20 20:29:36 luna kernel: sysfs: cannot create duplicate filename '/bus/platform/devices/simple-framebuffer.0'
Aug 20 20:29:36 luna kernel: CPU: 5 PID: 571 Comm: (udev-worker) Tainted: G OE 6.10.6-arch1-1 #1 703d152c24f1971e36f16e505405e456fc9e23f8
Aug 20 20:29:36 luna kernel: Hardware name: Purism Librem 14/Librem 14, BIOS 4.14-Purism-1 06/18/2021
Aug 20 20:29:36 luna kernel: Call Trace:
Aug 20 20:29:36 luna kernel: <TASK>
Aug 20 20:29:36 luna kernel: dump_stack_lvl+0x5d/0x80
Aug 20 20:29:36 luna kernel: sysfs_warn_dup.cold+0x17/0x23
Aug 20 20:29:36 luna kernel: sysfs_do_create_link_sd+0xcf/0xe0
Aug 20 20:29:36 luna kernel: bus_add_device+0x6b/0x130
Aug 20 20:29:36 luna kernel: device_add+0x3b3/0x870
Aug 20 20:29:36 luna kernel: platform_device_add+0xed/0x250
Aug 20 20:29:36 luna kernel: platform_device_register_full+0xbb/0x140
Aug 20 20:29:36 luna kernel: platform_device_register_resndata.constprop.0+0x54/0x80 [framebuffer_coreboot a587d2fc243ebaa0205c3badd33442a004d284e0]
Aug 20 20:29:36 luna kernel: framebuffer_probe+0x165/0x1b0 [framebuffer_coreboot a587d2fc243ebaa0205c3badd33442a004d284e0]
Aug 20 20:29:36 luna kernel: really_probe+0xdb/0x340
Aug 20 20:29:36 luna kernel: ? pm_runtime_barrier+0x54/0x90
Aug 20 20:29:36 luna kernel: ? __pfx___driver_attach+0x10/0x10
Aug 20 20:29:36 luna kernel: __driver_probe_device+0x78/0x110
Aug 20 20:29:36 luna kernel: driver_probe_device+0x1f/0xa0
Aug 20 20:29:36 luna kernel: __driver_attach+0xba/0x1c0
Aug 20 20:29:36 luna kernel: bus_for_each_dev+0x8c/0xe0
Aug 20 20:29:36 luna kernel: bus_add_driver+0x112/0x1f0
Aug 20 20:29:36 luna kernel: driver_register+0x72/0xd0
Aug 20 20:29:36 luna kernel: ? __pfx_framebuffer_driver_init+0x10/0x10 [framebuffer_coreboot a587d2fc243ebaa0205c3badd33442a004d284e0]
Aug 20 20:29:36 luna kernel: do_one_initcall+0x58/0x310
Aug 20 20:29:36 luna kernel: do_init_module+0x60/0x220
Aug 20 20:29:36 luna kernel: init_module_from_file+0x89/0xe0
Aug 20 20:29:36 luna kernel: idempotent_init_module+0x121/0x320
Aug 20 20:29:36 luna kernel: __x64_sys_finit_module+0x5e/0xb0
Aug 20 20:29:36 luna kernel: do_syscall_64+0x82/0x190
Aug 20 20:29:36 luna kernel: ? __do_sys_newfstatat+0x3c/0x80
Aug 20 20:29:36 luna kernel: ? syscall_exit_to_user_mode+0x72/0x200
Aug 20 20:29:36 luna kernel: ? do_syscall_64+0x8e/0x190
Aug 20 20:29:36 luna kernel: ? do_sys_openat2+0x9c/0xe0
Aug 20 20:29:36 luna kernel: ? syscall_exit_to_user_mode+0x72/0x200
Aug 20 20:29:36 luna kernel: ? do_syscall_64+0x8e/0x190
Aug 20 20:29:36 luna kernel: ? clear_bhb_loop+0x25/0x80
Aug 20 20:29:36 luna kernel: ? clear_bhb_loop+0x25/0x80
Aug 20 20:29:36 luna kernel: ? clear_bhb_loop+0x25/0x80
Aug 20 20:29:36 luna kernel: entry_SYSCALL_64_after_hwframe+0x76/0x7e
Aug 20 20:29:36 luna kernel: RIP: 0033:0x7b1bee2f81fd

The real issue, however, is in i915.

You also have out-of-tree modules loaded (acpi_call and librem_ec_acpi are
flagged OE in the module list below). Try reproducing it without them.
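
To see which modules are out-of-tree, look at the flags in parentheses on the
"Modules linked in:" line of the oops: "O" marks an out-of-tree module and "E"
an unsigned one, while "+" only means the module was still loading. A rough
sketch of pulling those names out of a pasted log line follows; the function
name out_of_tree_modules is hypothetical and the parsing assumes the usual
space-separated "name(flags)" layout.

```python
import re
from typing import List

# One entry of the module list, e.g. "i915(+)", "acpi_call(OE)" or "nvme".
_MODULE_RE = re.compile(r"^(?P<name>[^\s()]+)(?:\((?P<flags>[^)]*)\))?$")


def out_of_tree_modules(log_line: str) -> List[str]:
    """Return names of modules whose flags contain "O" (out-of-tree)."""
    # Drop the journal prefix and the "Modules linked in:" label if present.
    _, sep, rest = log_line.partition("Modules linked in:")
    body = rest if sep else log_line

    names = []
    for token in body.split():
        match = _MODULE_RE.match(token)
        if match and "O" in (match.group("flags") or ""):
            names.append(match.group("name"))
    return names
```

Run against the module list in this report, it should pick out the entries
marked (OE) and skip the ones marked only (+).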

Adding i915 people too.

Aug 20 20:29:37 luna kernel: resource: Trying to free nonexistent resource <0x00000000a0000000-0x00000000a0257fff>
Aug 20 20:29:37 luna kernel: BUG: unable to handle page fault for address: 0000000300000031
Aug 20 20:29:37 luna kernel: #PF: supervisor read access in kernel mode
Aug 20 20:29:37 luna kernel: #PF: error_code(0x0000) - not-present page
Aug 20 20:29:37 luna kernel: PGD 0 P4D 0
Aug 20 20:29:37 luna kernel: Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
Aug 20 20:29:37 luna kernel: CPU: 9 PID: 552 Comm: (udev-worker) Tainted: G OE 6.10.6-arch1-1 #1 703d152c24f1971e36f16e505405e456fc9e23f8
Aug 20 20:29:37 luna kernel: Hardware name: Purism Librem 14/Librem 14, BIOS 4.14-Purism-1 06/18/2021
Aug 20 20:29:37 luna kernel: RIP: 0010:__release_resource+0x34/0xb0
Aug 20 20:29:37 luna kernel: Code: 8d 50 38 48 8b 40 38 48 85 c0 75 27 eb 6a 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 50 30 <48> 8b 40 30 48 85 c0 74 45 48 39 c7 75 ee 40 84 f6 75 45 48 8b 4f
Aug 20 20:29:37 luna kernel: RSP: 0018:ffffb30dc207f930 EFLAGS: 00010296
Aug 20 20:29:37 luna kernel: RAX: 0000000300000001 RBX: ffff8fa34616e900 RCX: ffff8fa3424aac50
Aug 20 20:29:37 luna kernel: RDX: 0000000300000031 RSI: 0000000000000001 RDI: ffff8fa34616e900
Aug 20 20:29:37 luna kernel: RBP: ffff8fa3460e1400 R08: ffff8fa3424a97b8 R09: 0000000000000000
Aug 20 20:29:37 luna kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff8fa341671000
Aug 20 20:29:37 luna kernel: R13: 0000000000000000 R14: ffff8fa3416710c8 R15: ffff8fa341671000
Aug 20 20:29:37 luna kernel: FS: 00007b1bee0eb880(0000) GS:ffff8fae6e480000(0000) knlGS:0000000000000000
Aug 20 20:29:37 luna kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 20 20:29:37 luna kernel: CR2: 0000000300000031 CR3: 0000000103924002 CR4: 00000000003706f0
Aug 20 20:29:37 luna kernel: Call Trace:
Aug 20 20:29:37 luna kernel: <TASK>
Aug 20 20:29:37 luna kernel: ? __die_body.cold+0x19/0x27
Aug 20 20:29:37 luna kernel: ? page_fault_oops+0x15a/0x2d0
Aug 20 20:29:37 luna kernel: ? exc_page_fault+0x81/0x190
Aug 20 20:29:37 luna kernel: ? asm_exc_page_fault+0x26/0x30
Aug 20 20:29:37 luna kernel: ? __release_resource+0x34/0xb0
Aug 20 20:29:37 luna kernel: release_resource+0x26/0x40
Aug 20 20:29:37 luna kernel: platform_device_del+0x51/0x90
Aug 20 20:29:37 luna kernel: platform_device_unregister+0x12/0x30
Aug 20 20:29:37 luna kernel: sysfb_disable+0x2f/0x80
Aug 20 20:29:37 luna kernel: aperture_remove_conflicting_pci_devices+0x8c/0xa0
Aug 20 20:29:37 luna kernel: i915_driver_probe+0x7c8/0xac0 [i915 6caac5d02e3122d822ca0c852e7e5ed826a3aaea]
Aug 20 20:29:37 luna kernel: local_pci_probe+0x42/0x90
Aug 20 20:29:37 luna kernel: pci_device_probe+0xbd/0x290
Aug 20 20:29:37 luna kernel: ? sysfs_do_create_link_sd+0x6e/0xe0
Aug 20 20:29:37 luna kernel: really_probe+0xdb/0x340
Aug 20 20:29:37 luna kernel: ? pm_runtime_barrier+0x54/0x90
Aug 20 20:29:37 luna kernel: ? __pfx___driver_attach+0x10/0x10
Aug 20 20:29:37 luna kernel: __driver_probe_device+0x78/0x110
Aug 20 20:29:37 luna kernel: driver_probe_device+0x1f/0xa0
Aug 20 20:29:37 luna kernel: __driver_attach+0xba/0x1c0
Aug 20 20:29:37 luna kernel: bus_for_each_dev+0x8c/0xe0
Aug 20 20:29:37 luna kernel: bus_add_driver+0x112/0x1f0
Aug 20 20:29:37 luna kernel: driver_register+0x72/0xd0
Aug 20 20:29:37 luna kernel: i915_init+0x23/0x90 [i915 6caac5d02e3122d822ca0c852e7e5ed826a3aaea]
Aug 20 20:29:37 luna kernel: ? __pfx_i915_init+0x10/0x10 [i915 6caac5d02e3122d822ca0c852e7e5ed826a3aaea]
Aug 20 20:29:37 luna kernel: do_one_initcall+0x58/0x310
Aug 20 20:29:37 luna kernel: do_init_module+0x60/0x220
Aug 20 20:29:37 luna kernel: init_module_from_file+0x89/0xe0
Aug 20 20:29:37 luna kernel: idempotent_init_module+0x121/0x320
Aug 20 20:29:37 luna kernel: __x64_sys_finit_module+0x5e/0xb0
Aug 20 20:29:37 luna kernel: do_syscall_64+0x82/0x190
Aug 20 20:29:37 luna kernel: ? switch_fpu_return+0x4e/0xd0
Aug 20 20:29:37 luna kernel: ? syscall_exit_to_user_mode+0x72/0x200
Aug 20 20:29:37 luna kernel: ? do_syscall_64+0x8e/0x190
Aug 20 20:29:37 luna kernel: ? syscall_exit_to_user_mode+0x72/0x200
Aug 20 20:29:37 luna kernel: ? do_syscall_64+0x8e/0x190
Aug 20 20:29:37 luna kernel: ? clear_bhb_loop+0x25/0x80
Aug 20 20:29:37 luna kernel: ? clear_bhb_loop+0x25/0x80
Aug 20 20:29:37 luna kernel: ? clear_bhb_loop+0x25/0x80
Aug 20 20:29:37 luna kernel: entry_SYSCALL_64_after_hwframe+0x76/0x7e
Aug 20 20:29:37 luna kernel: RIP: 0033:0x7b1bee2f81fd
Aug 20 20:29:37 luna kernel: Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e3 fa 0c 00 f7 d8 64 89 01 48
Aug 20 20:29:37 luna kernel: RSP: 002b:00007ffe062c2ac8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
Aug 20 20:29:37 luna kernel: RAX: ffffffffffffffda RBX: 000056171c8d0a00 RCX: 00007b1bee2f81fd
Aug 20 20:29:37 luna kernel: RDX: 0000000000000004 RSI: 00007b1bee0e5061 RDI: 0000000000000026
Aug 20 20:29:37 luna kernel: RBP: 00007ffe062c2b80 R08: 0000000000000001 R09: 00007ffe062c2b10
Aug 20 20:29:37 luna kernel: R10: 0000000000000040 R11: 0000000000000246 R12: 00007b1bee0e5061
Aug 20 20:29:37 luna kernel: R13: 0000000000020000 R14: 000056171c8d18c0 R15: 000056171c8d31e0
Aug 20 20:29:37 luna kernel: </TASK>
Aug 20 20:29:37 luna kernel: Modules linked in: intel_powerclamp ath9k(+) snd_compress coretemp ac97_bus ath9k_common snd_pcm_dmaengine kvm_intel snd_hda_intel ath9k_hw joydev snd_intel_dspcfg mousedev ath snd_intel_sdw_acpi i915(+) kvm snd_hda_codec iTCO_wdt mac80211 snd_hda_core processor_thermal_device_pci_legacy intel_pmc_bxt snd_hwdep processor_thermal_device hid_multitouch ee1004 iTCO_vendor_support processor_thermal_wt_hint drm_buddy snd_pcm rapl processor_thermal_rfim hid_generic spi_nor r8169 i2c_i801 i2c_algo_bit libarc4 memconsole_coreboot processor_thermal_rapl snd_timer intel_cstate intel_rapl_msr framebuffer_coreboot memconsole cbmem intel_uncore snd intel_rapl_common realtek ttm i2c_smbus cfg80211 mtd processor_thermal_wt_req psmouse mdio_devres pcspkr soundcore i2c_mux processor_thermal_power_floor drm_display_helper intel_lpss_pci libphy processor_thermal_mbox intel_lpss cec rfkill int340x_thermal_zone intel_pmc_core i2c_hid_acpi idma64 intel_gtt intel_soc_dts_iosf intel_pch_thermal i2c_hid intel_vsec intel_hid video
Aug 20 20:29:37 luna kernel: pmt_telemetry pmt_class pinctrl_cannonlake wmi sparse_keymap coreboot_table mac_hid pkcs8_key_parser crypto_user loop acpi_call(OE) nfnetlink ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 uas usb_storage dm_crypt cbc encrypted_keys trusted asn1_encoder tee dm_mod crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic gf128mul ghash_clmulni_intel serio_raw sha512_ssse3 atkbd sha256_ssse3 sha1_ssse3 libps2 aesni_intel vivaldi_fmap nvme crypto_simd nvme_core spi_intel_pci cryptd xhci_pci spi_intel i8042 nvme_auth xhci_pci_renesas serio librem_ec_acpi(OE)
Aug 20 20:29:37 luna kernel: CR2: 0000000300000031
Aug 20 20:29:37 luna kernel: ---[ end trace 0000000000000000 ]---

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette