Re: NULL pointer dereference in qla24xx_abort_command, kernel 4.19.98 (Debian)
From: Ondrej Zary
Date: Thu Mar 19 2020 - 14:01:52 EST
On Monday 02 March 2020 23:26:08 Ondrej Zary wrote:
> On Thursday 27 February 2020 18:09:07 Ondrej Zary wrote:
> >
> > On Tuesday 25 February 2020 04:41:48 Bart Van Assche wrote:
> > > On 2020-02-24 00:20, Ondrej Zary wrote:
> > > > Looks like it's in some inlined function.
> > > >
> > > > /usr/src/linux-source-4.19# gdb /lib/modules/4.19.0-8-amd64/kernel/drivers/scsi/qla2xxx/qla2xxx.ko
> > > > GNU gdb (Debian 8.2.1-2+b3) 8.2.1
> > > > ...
> > > > Reading symbols from /lib/modules/4.19.0-8-amd64/kernel/drivers/scsi/qla2xxx/qla2xxx.ko...Reading symbols
> > > > from /usr/lib/debug//lib/modules/4.19.0-8-amd64/kernel/drivers/scsi/qla2xxx/qla2xxx.ko...done.
> > > > done.
> > > >
> > > > (gdb) list *(qla24xx_async_abort_cmd+0x1b)
> > > > 0xf88b is in qla24xx_async_abort_cmd (./arch/x86/include/asm/atomic.h:97).
> > > > 92 *
> > > > 93 * Atomically increments @v by 1.
> > > > 94 */
> > > > 95 static __always_inline void arch_atomic_inc(atomic_t *v)
> > > > 96 {
> > > > 97 asm volatile(LOCK_PREFIX "incl %0"
> > > > 98 : "+m" (v->counter) :: "memory");
> > > > 99 }
> > > > 100 #define arch_atomic_inc arch_atomic_inc
> > > >
> > > > [ ... ]
> > > >
> > > > (gdb) disassemble qla24xx_async_abort_cmd
> > > > Dump of assembler code for function qla24xx_async_abort_cmd:
> > > > 0x000000000000f870 <+0>: callq 0xf875 <qla24xx_async_abort_cmd+5>
> > > > 0x000000000000f875 <+5>: push %r15
> > > > 0x000000000000f877 <+7>: push %r14
> > > > 0x000000000000f879 <+9>: push %r13
> > > > 0x000000000000f87b <+11>: push %r12
> > > > 0x000000000000f87d <+13>: push %rbp
> > > > 0x000000000000f87e <+14>: push %rbx
> > > > 0x000000000000f87f <+15>: mov 0x28(%rdi),%r13
> > > > 0x000000000000f883 <+19>: mov 0x20(%rdi),%r15
> > > > 0x000000000000f887 <+23>: mov 0x48(%rdi),%r14
> > > > 0x000000000000f88b <+27>: lock incl 0x4(%r14)
> > > > 0x000000000000f890 <+32>: mfence
> > >
> > > Thanks, this is very helpful. I think the above means that the crash is
> > > triggered by the following code:
> > >
> > > sp = qla2xxx_get_qpair_sp(cmd_sp->qpair, cmd_sp->fcport,
> > > GFP_KERNEL);
> > >
> > > From the start of qla2xxx_get_qpair_sp():
> > >
> > > QLA_QPAIR_MARK_BUSY(qpair, bail);
> > >
> > > From qla_def.h:
> > >
> > > #define QLA_QPAIR_MARK_BUSY(__qpair, __bail) do { \
> > > atomic_inc(&__qpair->ref_count); \
> > > mb(); \
> > > if (__qpair->delete_in_progress) { \
> > > atomic_dec(&__qpair->ref_count); \
> > > __bail = 1; \
> > > } else { \
> > > __bail = 0; \
> > > } \
> > > } while (0)
> > >
> > > One of the changes between kernel version v4.9.210 and v4.19.98 is the
> > > following: "qla2xxx: Add multiple queue pair functionality". I think the
> > > above information means that the cmd_sp->qpair pointer is NULL. I will
> > > let QLogic recommend a solution.
> >
> > Thank you very much for the analysis.
> > Unfortunately, QLogic does not seem to care...
>
> Let's try to CC the people at Cavium that signed-off the commit.
No reply.
qla2xxx-upstream@xxxxxxxxxx address is dead:
Generating server: DC5-EXCH01.marvell.com
qla2xxx-upstream@xxxxxxxxxx
Remote Server returned '550 5.1.1 RESOLVER.ADR.RecipNotFound; not found'
Added some more CC addresses.
Yesterday it crashed again at the same place:
[2076301.849762] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
[2076301.850021] PGD 0 P4D 0
[2076301.850109] Oops: 0002 [#1] SMP PTI
[2076301.850219] CPU: 4 PID: 18992 Comm: kworker/u16:1 Not tainted 4.19.0-8-amd64 #1 Debian 4.19.98-1
[2076301.850478] Hardware name: Dell Inc. PowerEdge 2950/0JR815, BIOS 2.7.0 10/30/2010
[2076301.850720] Workqueue: scsi_tmf_4 scmd_eh_abort_handler [scsi_mod]
[2076301.850936] RIP: 0010:qla24xx_async_abort_cmd+0x1b/0x250 [qla2xxx]
[2076301.851130] Code: e9 19 ff ff ff 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 41 57 41 56 41 55 41 54 55 53 4c 8b 6f 28 4c 8b 7f 20 4c 8b 77 48 <f0> 41 ff 46 04 0f a
e f0 41 f6 46 24 04 74 17 f0 41 ff 4e 04 bd 02
[2076301.851663] RSP: 0018:ffffa10f8bbe7da8 EFLAGS: 00010293
[2076301.851820] RAX: 0000000000000800 RBX: ffff8ab8ddd197a8 RCX: 0000000000000070
[2076301.852036] RDX: ffff8ab8de4a8388 RSI: 0000000000000001 RDI: ffff8ab8799b8c40
[2076301.852253] RBP: ffff8ab8dc96c480 R08: ffffffffc03b7860 R09: 0000000000000000
[2076301.852469] R10: 8080808080808080 R11: 0000000000000010 R12: ffff8ab8dea00000
[2076301.852686] R13: ffff8ab8ddd197a8 R14: 0000000000000000 R15: ffff8ab8dd632000
[2076301.852902] FS: 0000000000000000(0000) GS:ffff8ab8e7b00000(0000) knlGS:0000000000000000
[2076301.853142] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2076301.853320] CR2: 0000000000000004 CR3: 00000002203dc000 CR4: 00000000000006e0
[2076301.853536] Call Trace:
[2076301.853632] qla24xx_abort_command+0x218/0x2d0 [qla2xxx]
[2076301.853799] ? __switch_to_asm+0x41/0x70
[2076301.853924] ? __switch_to_asm+0x35/0x70
[2076301.854056] qla2xxx_eh_abort+0x117/0x310 [qla2xxx]
[2076301.854209] scmd_eh_abort_handler+0x85/0x220 [scsi_mod]
[2076301.854375] process_one_work+0x1a7/0x3a0
[2076301.854506] worker_thread+0x30/0x390
[2076301.854628] ? create_worker+0x1a0/0x1a0
[2076301.854753] kthread+0x112/0x130
[2076301.854859] ? kthread_bind+0x30/0x30
[2076301.854980] ret_from_fork+0x35/0x40
[2076301.855095] Modules linked in: loop ipmi_ssif radeon coretemp ttm drm_kms_helper drm kvm i2c_algo_bit i5000_edac iTCO_wdt sg iTCO_vendor_support irqbypass evdev i5k_
amb serio_raw joydev ipmi_si rng_core pcc_cpufreq dcdbas pcspkr ipmi_devintf acpi_cpufreq ipmi_msghandler button ext4 crc16 mbcache jbd2 crc32c_generic fscrypto ecb crypt
o_simd cryptd glue_helper aes_x86_64 dm_service_time dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua uas usb_storage hid_generic usbhid hid sr_mod cdrom ses enc
losure sd_mod scsi_transport_sas ata_generic qla2xxx ata_piix nvme_fc ehci_pci nvme_fabrics libata uhci_hcd psmouse ehci_hcd nvme_core megaraid_sas usbcore scsi_transport
_fc lpc_ich mfd_core scsi_mod usb_common bnx2
[2076301.856887] CR2: 0000000000000004
[2076301.856999] ---[ end trace e9083db8fb76e126 ]---
[2076301.857151] RIP: 0010:qla24xx_async_abort_cmd+0x1b/0x250 [qla2xxx]
[2076301.857345] Code: e9 19 ff ff ff 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 41 57 41 56 41 55 41 54 55 53 4c 8b 6f 28 4c 8b 7f 20 4c 8b 77 48 <f0> 41 ff 46 04 0f a
e f0 41 f6 46 24 04 74 17 f0 41 ff 4e 04 bd 02
[2076301.857878] RSP: 0018:ffffa10f8bbe7da8 EFLAGS: 00010293
[2076301.858035] RAX: 0000000000000800 RBX: ffff8ab8ddd197a8 RCX: 0000000000000070
[2076301.858251] RDX: ffff8ab8de4a8388 RSI: 0000000000000001 RDI: ffff8ab8799b8c40
[2076301.858467] RBP: ffff8ab8dc96c480 R08: ffffffffc03b7860 R09: 0000000000000000
[2076301.869384] R10: 8080808080808080 R11: 0000000000000010 R12: ffff8ab8dea00000
[2076301.880412] R13: ffff8ab8ddd197a8 R14: 0000000000000000 R15: ffff8ab8dd632000
[2076301.891483] FS: 0000000000000000(0000) GS:ffff8ab8e7b00000(0000) knlGS:0000000000000000
[2076301.902490] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2076301.913344] CR2: 0000000000000004 CR3: 00000002203dc000 CR4: 00000000000006e0
[2077225.259348] mysqld[2155]: segfault at 0 ip 000056409366ad93 sp 00007fa049514450 error 6 in mysqld[564092eb2000+805000]
[2077225.270564] Code: c7 45 00 00 00 00 00 8b 7d cc 4c 89 e2 4c 89 f6 e8 62 81 84 ff 49 89 c7 49 39 c4 0f 84 f6 00 00 00 e8 e1 1c 00 00 41 8b 4d 00 <89> 08 85 c9 74 37 4
9 83 ff ff 0f 84 9d 00 00 00 f6 c3 06 75 28 4d
--
Ondrej Zary