4.9.0+4.19.0: Bug in megasas driver (?), if booting from USB stick

From: Nico Schottelius
Date: Tue Jun 02 2020 - 12:57:52 EST



Hello kernel hackers,

I have this "funny" problem: if I netboot servers via the firmware in the
network card, the system comes up normally.

If I boot iPXE from a USB stick and then netboot, the megasas driver
fails to initialize (call trace below, full dmesg attached). The system
also hangs during the init for roughly 30 seconds.

Same kernel, same kernel parameters, same OS.
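
To narrow down what differs between the two boot paths, the dmesg output
of a working boot and a failing boot can be compared with the timestamps
stripped. Below is a minimal Python sketch of that idea. The file names,
function names and diff labels are hypothetical and only serve as an
illustration; it assumes both logs were saved as plain text.

```python
import difflib
import re
import sys

# Matches the leading "[  264.944087] " timestamp printed by dmesg.
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def strip_timestamps(lines):
    """Remove the dmesg timestamp prefix and trailing newline from each line."""
    return [TIMESTAMP.sub("", line.rstrip("\n")) for line in lines]


def diff_dmesg(good_lines, bad_lines, keyword=None):
    """Return a unified diff between two dmesg captures.

    If keyword is given, only lines containing it are compared,
    e.g. keyword="megaraid_sas" to focus on the RAID driver.
    """
    good = strip_timestamps(good_lines)
    bad = strip_timestamps(bad_lines)
    if keyword is not None:
        good = [line for line in good if keyword in line]
        bad = [line for line in bad if keyword in line]
    return list(
        difflib.unified_diff(
            good, bad, "nic-firmware-boot", "usb-ipxe-boot", lineterm=""
        )
    )


if __name__ == "__main__":
    # Hypothetical usage: python3 dmesgdiff.py dmesg-nic.txt dmesg-usb.txt
    with open(sys.argv[1]) as f_good, open(sys.argv[2]) as f_bad:
        for line in diff_dmesg(f_good.readlines(), f_bad.readlines(),
                               keyword="megaraid_sas"):
            print(line)
```

Filtering on "megaraid_sas" keeps the diff short; dropping the keyword
shows every difference, including PCI and IRQ setup lines that may
matter here.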

I can reproduce this on a Dell R815 and on a Dell R710 with 3.5" disk
slots, but not on a Dell R710 with 2.5" disk slots.

I can also reproduce this with 4.9.0-11-amd64 and 4.19.0-9-amd64
(current Debian Buster).

Does anyone have an idea what influence booting from a USB stick
can have on the megasas controller? These boxes have a PERC H700,
and I've seen the same behaviour with a PERC H800 as well.

I played with edd=off (which is required for some R710s), but that does
not fix the problem.

Any pointers are much appreciated.

Best regards,

Nico

Attachment: dmesg-r815
Description: Binary data



[ 264.944087] megaraid_sas 0000:05:00.0: Failed to init firmware
[ 264.946621] ------------[ cut here ]------------
[ 264.946677] WARNING: CPU: 0 PID: 827 at /build/linux-XzZAcJ/linux-4.9.189/kernel/irq/manage.c:1493 __free_irq+0xa2/0x280
[ 264.946776] Trying to free already-free IRQ 180
[ 264.946821] Modules linked in: sd_mod joydev uas usb_storage hid_generic usbhid hid sg sr_mod cdrom ipmi_devintf evdev dcdbas amd64_edac_mod edac_mce_amd edac_core ahci libahci kvm_amd kvm irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel ohci_pci megaraid_sas(+) ixgbe ghash_clmulni_intel ptp libata pps_core mdio mgag200 ohci_hcd aesni_intel aes_x86_64 ttm lrw drm_kms_helper ehci_pci gf128mul dca glue_helper bnx2 sp5100_tco ablk_helper psmouse ehci_hcd pcspkr serio_raw drm cryptd fam15h_power i2c_algo_bit k10temp i2c_piix4 usbcore ipmi_si scsi_mod usb_common shpchp ipmi_msghandler acpi_power_meter button
[ 264.947638] CPU: 0 PID: 827 Comm: kworker/0:2 Not tainted 4.9.0-11-amd64 #1 Debian 4.9.189-3
[ 264.947716] Hardware name: Dell Inc. PowerEdge R815/04Y8PT, BIOS 3.2.2 09/15/2014
[ 264.947791] Workqueue: events work_for_cpu_fn
[ 264.947839] 0000000000000000 ffffffff8ef353d4 ffffa83e7353bc30 0000000000000000
[ 264.947927] ffffffff8ec7a83b 00000000000000b4 ffffa83e7353bc88 ffff9d3bd2731200
[ 264.948013] 00000000000000b4 ffff9d3bd27312d4 0000000000000246 ffffffff8ec7a8bf
[ 264.948099] Call Trace:
[ 264.948136] [<ffffffff8ef353d4>] ? dump_stack+0x5c/0x78
[ 264.948190] [<ffffffff8ec7a83b>] ? __warn+0xcb/0xf0
[ 264.948242] [<ffffffff8ec7a8bf>] ? warn_slowpath_fmt+0x5f/0x80
[ 264.948313] [<ffffffffc081a9c8>] ? megasas_free_cmds+0x48/0x70 [megaraid_sas]
[ 264.948384] [<ffffffff8ecd6fe2>] ? __free_irq+0xa2/0x280
[ 264.948438] [<ffffffff8ecd7247>] ? free_irq+0x37/0x90
[ 264.948498] [<ffffffffc0814285>] ? megasas_destroy_irqs+0x45/0x80 [megaraid_sas]
[ 264.948578] [<ffffffffc081e915>] ? megasas_probe_one+0xa05/0x1d30 [megaraid_sas]
[ 264.948652] [<ffffffff8ec9a9e2>] ? __kthread_create_on_node+0x132/0x180
[ 264.948721] [<ffffffff8ecb0dd3>] ? check_preempt_wakeup+0x103/0x210
[ 264.948785] [<ffffffff8ef840b4>] ? local_pci_probe+0x44/0xa0
[ 264.948842] [<ffffffff8ec915f6>] ? work_for_cpu_fn+0x16/0x20
[ 264.948899] [<ffffffff8ec9486a>] ? process_one_work+0x18a/0x430
[ 264.948958] [<ffffffff8ec94cda>] ? worker_thread+0x1ca/0x490
[ 264.949016] [<ffffffff8ec94b10>] ? process_one_work+0x430/0x430
[ 264.949078] [<ffffffff8ec9abc9>] ? kthread+0xd9/0xf0
[ 264.949130] [<ffffffff8ec9aaf0>] ? kthread_park+0x60/0x60
[ 264.949188] [<ffffffff8f21c564>] ? ret_from_fork+0x44/0x70
[ 264.949243] ---[ end trace 277700523668533a ]---

