Re: 4.15.14 crash with iscsi target and dvd

From: Bart Van Assche
Date: Sun Apr 01 2018 - 11:03:10 EST


On Sun, 2018-04-01 at 07:37 -0400, Wakko Warner wrote:
> Bart Van Assche wrote:
> > On Sat, 2018-03-31 at 18:12 -0400, Wakko Warner wrote:
> > > Richard Weinberger wrote:
> > > > On Sat, Mar 31, 2018 at 3:59 AM, Wakko Warner <wakko@xxxxxxxxxxxx> wrote:
> > > > > I reported this before but noone responded.
> > > >
> > > > Because you're sending only to LKML.
> > > > CC'ing storage folks.
> > >
> > > Thank you. I wasn't sure who I needed to send it to.
> >
> > Can you share the output of lsscsi? I would like to know whether or not you
> > are using a (S)ATA CDROM.
>
> From the target:
> [4:0:0:0] cd/dvd ATAPI iHAS224 B GL05 /dev/sr0
> [5:0:0:0] cd/dvd ATAPI iHAS422 8 4L11 /dev/sr1
> [6:0:0:0] cd/dvd PBDS DVD+-RW DH-16W1S 2D14 /dev/sr2
>
> From the initiator:
> [19:0:0:0] cd/dvd ATAPI iHAS224 B GL05 /dev/sr1
> [19:0:0:1] cd/dvd ATAPI iHAS422 8 4L11 /dev/sr2
> [19:0:0:2] cd/dvd PBDS DVD+-RW DH-16W1S 2D14 /dev/sr3
>
> I tested 4.14.32 last night with the same oops. 4.9.91 works fine.
> From the initiator, if I do cat /dev/sr1 > /dev/null it works. If I mount
> /dev/sr1 and then do find -type f | xargs cat > /dev/null the target
> crashes. I'm using the builtin iscsi target with pscsi. I can burn from
> the initiator with out problems. I'll test other kernels between 4.9 and
> 4.14.

(+Lee and Chris)

Hello Wakko,

Although I'm not sure that what I ran into is exactly the same as what you
ran into, there is definitely something wrong with what I encountered. What
I ran into with Linus' latest master branch indicates two issues - one in
the iSCSI initiator and one in the block layer:

scsi 3:0:0:1: Direct-Access LIO-ORG FILEIO 4.0 PQ: 0 ANSI: 5
sd 2:0:0:1: [sdd] Attached SCSI disk
sd 3:0:0:1: Warning! Received an indication that the LUN assignments on this
target have changed. The Linux SCSI layer does not automatical
sd 3:0:0:1: Attached scsi generic sg8 type 0
sd 3:0:0:1: [sdf] 128 512-byte logical blocks: (65.5 kB/64.0 KiB)
sd 3:0:0:1: [sdf] Write Protect is off
sd 3:0:0:1: [sdf] Mode Sense: 43 00 00 08
sd 3:0:0:1: [sdf] Write cache: disabled, read cache: enabled, doesn't
support DPO or FUA
iSCSI/iqn.1993-08.org.debian:01:3b68b1b3d2eb: Unsupported SCSI Opcode 0xa3,
sending CHECK_CONDITION.
sd 3:0:0:2: [sde] Attached SCSI disk
sd 3:0:0:1: [sdf] Attached SCSI disk

=====================================================
WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
4.16.0-rc7-dbg+ #3 Not tainted
-----------------------------------------------------
kworker/6:1H/155 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
(&(&session->frwd_lock)->rlock){+.-.}, at: [<000000007eb678ec>]
iscsi_eh_cmd_timed_out+0x6b/0x5a0 [libiscsi]

and this task is already holding:
(&(&q->__queue_lock)->rlock){-.-.}, at: [<000000003c5841ec>]
blk_timeout_work+0x45/0x220
which would create a new lock dependency:
(&(&q->__queue_lock)->rlock){-.-.} -> (&(&session->frwd_lock)->rlock){+.-.}

but this new dependency connects a HARDIRQ-irq-safe lock:
(&(&q->__queue_lock)->rlock){-.-.}

... which became HARDIRQ-irq-safe at:
_raw_spin_lock_irqsave+0x46/0x60
ata_qc_schedule_eh+0xb9/0x110 [libata]
ata_sff_hsm_move+0x114/0xb10 [libata]
__ata_sff_port_intr+0xec/0x1a0 [libata]
ata_bmdma_port_intr+0xef/0x160 [libata]
ata_bmdma_interrupt+0x160/0x2e0 [libata]
__handle_irq_event_percpu+0x72/0x460
handle_irq_event_percpu+0x66/0xd0
handle_irq_event+0x54/0x90
handle_edge_irq+0xe9/0x2d0
handle_irq+0x17b/0x210
do_IRQ+0x6c/0x140
ret_from_intr+0x0/0x1e
native_safe_halt+0x2/0x10
default_idle+0x1f/0x200
do_idle+0x1bc/0x1f0
cpu_startup_entry+0x19/0x20
start_kernel+0x47f/0x4a1
secondary_startup_64+0xa5/0xb0

to a HARDIRQ-irq-unsafe lock:
(&(&session->frwd_lock)->rlock){+.-.}

... which became HARDIRQ-irq-unsafe at:
...
_raw_spin_lock_bh+0x34/0x40
iscsi_conn_setup+0x239/0x320 [libiscsi]
iscsi_tcp_conn_setup+0x14/0x80 [libiscsi_tcp]
iscsi_sw_tcp_conn_create+0x1f/0x24a [iscsi_tcp]
iscsi_if_rx+0x13c9/0x20d0 [scsi_transport_iscsi]
netlink_unicast+0x299/0x330
netlink_sendmsg+0x435/0x580
___sys_sendmsg+0x447/0x4d0
__sys_sendmsg+0xad/0x120
do_syscall_64+0xf3/0x2c0
entry_SYSCALL_64_after_hwframe+0x42/0xb7

other info that might help us debug this:

Possible interrupt unsafe locking scenario:

CPU0 CPU1
---- ----
lock(&(&session->frwd_lock)->rlock);
local_irq_disable();
lock(&(&q->__queue_lock)->rlock);
lock(&(&session->frwd_lock)->rlock);
<Interrupt>
lock(&(&q->__queue_lock)->rlock);

*** DEADLOCK ***

3 locks held by kworker/6:1H/155:
#0: ((wq_completion)"kblockd"){+.+.}, at: [<00000000116fed84>]
process_one_work+0x387/0xa50
#1: ((work_completion)(&q->timeout_work)){+.+.}, at: [<00000000116fed84>]
process_one_work+0x387/0xa50
#2: (&(&q->__queue_lock)->rlock){-.-.}, at: [<000000003c5841ec>]
blk_timeout_work+0x45/0x220

the dependencies between HARDIRQ-irq-safe lock and the holding lock:
-> (&(&q->__queue_lock)->rlock){-.-.} ops: 5585 {
IN-HARDIRQ-W at:
_raw_spin_lock_irqsave+0x46/0x60
ata_qc_schedule_eh+0xb9/0x110 [libata]
ata_sff_hsm_move+0x114/0xb10 [libata]
__ata_sff_port_intr+0xec/0x1a0 [libata]
ata_bmdma_port_intr+0xef/0x160 [libata]
ata_bmdma_interrupt+0x160/0x2e0 [libata]
__handle_irq_event_percpu+0x72/0x460
handle_irq_event_percpu+0x66/0xd0
handle_irq_event+0x54/0x90
handle_edge_irq+0xe9/0x2d0
handle_irq+0x17b/0x210
do_IRQ+0x6c/0x140
ret_from_intr+0x0/0x1e
native_safe_halt+0x2/0x10
default_idle+0x1f/0x200
do_idle+0x1bc/0x1f0
cpu_startup_entry+0x19/0x20
start_kernel+0x47f/0x4a1
secondary_startup_64+0xa5/0xb0
IN-SOFTIRQ-W at:
_raw_spin_lock_irqsave+0x46/0x60
scsi_end_request+0x199/0x310 [scsi_mod]
scsi_io_completion+0x3be/0x980 [scsi_mod]
blk_done_softirq+0x177/0x1c0
__do_softirq+0x117/0x5f5
irq_exit+0xe8/0xf0
do_IRQ+0xb6/0x140
ret_from_intr+0x0/0x1e
native_safe_halt+0x2/0x10
default_idle+0x1f/0x200
do_idle+0x1bc/0x1f0
cpu_startup_entry+0x19/0x20
start_kernel+0x47f/0x4a1
secondary_startup_64+0xa5/0xb0
INITIAL USE at:
_raw_spin_lock_irq+0x3b/0x50
blkcg_init_queue+0x97/0x1c0
blk_alloc_queue_node+0x41d/0x4b0
blk_mq_init_queue+0x24/0x60
virtblk_probe+0x633/0xfef [virtio_blk]
virtio_dev_probe+0x259/0x380 [virtio]
driver_probe_device+0x469/0x680
__driver_attach+0xef/0x120
bus_for_each_dev+0xe4/0x130
bus_add_driver+0x24c/0x380
driver_register+0xc6/0x170
ata_generic_init_one+0x5b/0x260 [ata_generic]
do_one_initcall+0x79/0x1b7
do_init_module+0xde/0x32d
load_module+0x3964/0x47d0
SYSC_finit_module+0x176/0x1a0
do_syscall_64+0xf3/0x2c0
entry_SYSCALL_64_after_hwframe+0x42/0xb7
}
... key at: [<00000000da510125>] __key.53375+0x0/0x40
... acquired at:
_raw_spin_lock+0x2f/0x40
iscsi_eh_cmd_timed_out+0x6b/0x5a0 [libiscsi]
scsi_times_out+0xcc/0x3f0 [scsi_mod]
blk_rq_timed_out+0x2f/0x70
blk_timeout_work+0x1b0/0x220
process_one_work+0x446/0xa50
worker_thread+0x7b/0x6d0
kthread+0x1b7/0x1e0
ret_from_fork+0x24/0x30


the dependencies between the lock to be acquired
and HARDIRQ-irq-unsafe lock:
-> (&(&session->frwd_lock)->rlock){+.-.} ops: 1985 {
HARDIRQ-ON-W at:
_raw_spin_lock_bh+0x34/0x40
iscsi_conn_setup+0x239/0x320 [libiscsi]
iscsi_tcp_conn_setup+0x14/0x80 [libiscsi_tcp]
iscsi_sw_tcp_conn_create+0x1f/0x24a [iscsi_tcp]
iscsi_if_rx+0x13c9/0x20d0 [scsi_transport_iscsi]
netlink_unicast+0x299/0x330
netlink_sendmsg+0x435/0x580
___sys_sendmsg+0x447/0x4d0
__sys_sendmsg+0xad/0x120
do_syscall_64+0xf3/0x2c0
entry_SYSCALL_64_after_hwframe+0x42/0xb7
IN-SOFTIRQ-W at:
_raw_spin_lock_bh+0x34/0x40
iscsi_queuecommand+0xef/0x960 [libiscsi]
scsi_dispatch_cmd+0x1c7/0x550 [scsi_mod]
scsi_request_fn+0x823/0xaf0 [scsi_mod]
__blk_run_queue+0xc5/0x160
blk_run_queue+0x48/0x70
scsi_run_queue+0x475/0x5d0 [scsi_mod]
scsi_end_request+0x1cd/0x310 [scsi_mod]
scsi_io_completion+0x3be/0x980 [scsi_mod]
blk_done_softirq+0x177/0x1c0
__do_softirq+0x117/0x5f5
run_ksoftirqd+0x2e/0x50
smpboot_thread_fn+0x314/0x410
kthread+0x1b7/0x1e0
ret_from_fork+0x24/0x30
INITIAL USE at:
_raw_spin_lock_bh+0x34/0x40
iscsi_conn_setup+0x239/0x320 [libiscsi]
iscsi_tcp_conn_setup+0x14/0x80 [libiscsi_tcp]
iscsi_sw_tcp_conn_create+0x1f/0x24a [iscsi_tcp]
iscsi_if_rx+0x13c9/0x20d0 [scsi_transport_iscsi]
netlink_unicast+0x299/0x330
netlink_sendmsg+0x435/0x580
___sys_sendmsg+0x447/0x4d0
__sys_sendmsg+0xad/0x120
do_syscall_64+0xf3/0x2c0
entry_SYSCALL_64_after_hwframe+0x42/0xb7
}
... key at: [<00000000df8f47b5>] __key.68290+0x0/0xffffffffffff6300
[libiscsi]
... acquired at:
_raw_spin_lock+0x2f/0x40
iscsi_eh_cmd_timed_out+0x6b/0x5a0 [libiscsi]
scsi_times_out+0xcc/0x3f0 [scsi_mod]
blk_rq_timed_out+0x2f/0x70
blk_timeout_work+0x1b0/0x220
process_one_work+0x446/0xa50
worker_thread+0x7b/0x6d0
kthread+0x1b7/0x1e0
ret_from_fork+0x24/0x30


stack backtrace:
CPU: 6 PID: 155 Comm: kworker/6:1H Not tainted 4.16.0-rc7-dbg+ #3
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
Workqueue: kblockd blk_timeout_work
Call Trace:
dump_stack+0x85/0xc5
check_usage+0x6e7/0x700
? check_usage_forwards+0x220/0x220
? find_next_and_bit+0x51/0xe0
? cpumask_next_and+0x20/0x30
? find_busiest_group+0xc94/0x1010
? class_equal+0x11/0x20
? __bfs+0x62/0x2e0
? class_equal+0x11/0x20
? __bfs+0xfb/0x2e0
? __lock_acquire+0x17aa/0x1af0
__lock_acquire+0x17aa/0x1af0
? mark_lock+0xc7/0x770
? debug_check_no_locks_freed+0x1b0/0x1b0
? __lock_acquire+0x583/0x1af0
? mark_lock+0xc7/0x770
? lock_pin_lock+0x160/0x160
? debug_check_no_locks_freed+0x1b0/0x1b0
? lock_acquire+0xc9/0x260
lock_acquire+0xc9/0x260
? iscsi_eh_cmd_timed_out+0x6b/0x5a0 [libiscsi]
_raw_spin_lock+0x2f/0x40
? iscsi_eh_cmd_timed_out+0x6b/0x5a0 [libiscsi]
iscsi_eh_cmd_timed_out+0x6b/0x5a0 [libiscsi]
scsi_times_out+0xcc/0x3f0 [scsi_mod]
blk_rq_timed_out+0x2f/0x70
blk_timeout_work+0x1b0/0x220
process_one_work+0x446/0xa50
? pwq_dec_nr_in_flight+0x100/0x100
? worker_thread+0x177/0x6d0
worker_thread+0x7b/0x6d0
? process_one_work+0xa50/0xa50
kthread+0x1b7/0x1e0
? kthread_create_worker_on_cpu+0xc0/0xc0
ret_from_fork+0x24/0x30

[ ... ]

------------[ cut here ]------------
kernel BUG at block/blk-core.c:3267!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN
Modules linked in: sd_mod crc32c_generic target_core_pscsi
target_core_iblock target_core_file iscsi_target_mod target_core_mod brd
i2c_piix4 sg virtio_balloon i2c_core af_packet button ib_iser rdma_cm iw_cm
ib_cm ib_core configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi
ip_tables x_tables autofs4 hid_generic usbhid hid ext4 crc16 mbcache jbd2
sr_mod cdrom ata_generic pata_acpi virtio_blk virtio_net ata_piix xhci_pci
xhci_hcd libata virtio_pci usbcore scsi_mod virtio_ring intel_agp usb_common
intel_gtt virtio agpgart
CPU: 2 PID: 151 Comm: scsi_eh_1 Tainted: G W 4.16.0-rc7-dbg+
#3
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
RIP: 0010:__blk_end_request_all+0xda/0xe0
RSP: 0018:ffff88006192f980 EFLAGS: 00010002
sr 2:0:0:3: rejecting I/O to offline device
sr 3:0:0:3: rejecting I/O to offline device
RAX: 0000000000000001 RBX: ffff88006818e780 RCX: ffffffff814065a6
RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff88006818e838
RBP: 000000000000000a R08: 0000000000000000 R09: 0000000000000012
R10: ffff88006192f588 R11: 000000005e4786a3 R12: 0000000000000000
R13: 0000000000000000 R14: ffff880061280160 R15: 0000000000000001
FS: 0000000000000000(0000) GS:ffff88006d280000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f1fcc53f010 CR3: 00000000666fc000 CR4: 00000000000006e0
Call Trace:
blk_peek_request+0x1ff/0x5f0
scsi_request_fn+0x48/0xaf0 [scsi_mod]
? lock_acquire+0xc9/0x260
__blk_run_queue+0xc5/0x160
blk_run_queue+0x48/0x70
scsi_run_queue+0x475/0x5d0 [scsi_mod]
? scsi_io_completion+0x69e/0x980 [scsi_mod]
? sdev_evt_alloc+0x80/0x80 [scsi_mod]
? blk_queue_end_tag+0x10a/0x210
? __list_add_valid+0x29/0xa0
? do_raw_spin_unlock+0x91/0x120
scsi_io_completion+0x6a6/0x980 [scsi_mod]
? lock_downgrade+0x200/0x2b0
? scsi_end_request+0x310/0x310 [scsi_mod]
? scsi_device_unbusy+0x3b/0x60 [scsi_mod]
scsi_eh_flush_done_q+0x1a6/0x210 [scsi_mod]
ata_scsi_port_error_handler+0x794/0xb00 [libata]
ata_scsi_error+0x168/0x1a0 [libata]
? ata_scsi_port_error_handler+0xb00/0xb00 [libata]
? _raw_spin_unlock_irqrestore+0x59/0x70
scsi_error_handler+0x1b5/0xa40 [scsi_mod]
? scsi_eh_get_sense+0x3b0/0x3b0 [scsi_mod]
? __sched_text_start+0x8/0x8
? __wake_up_common+0x9c/0x230
? mark_held_locks+0x1c/0x90
? _raw_spin_unlock_irqrestore+0x59/0x70
? scsi_eh_get_sense+0x3b0/0x3b0 [scsi_mod]
kthread+0x1b7/0x1e0
? kthread_create_worker_on_cpu+0xc0/0xc0
ret_from_fork+0x24/0x30
Code: 85 c0 75 02 0f 0b 48 89 df e8 b3 99 eb ff 4c 8b 23 e9 6e ff ff ff 0f
0b eb 82 49 8d 7c 24 20 e8 9d 98 eb ff 45 8b 6c 24 20 eb 8c <0f> 0b 0f 1f 40
00 0f 1f 44 00 00 41 57 41 56 41 55 41 54 55 48
RIP: __blk_end_request_all+0xda/0xe0 RSP: ffff88006192f980
---[ end trace b9c2429e31acedb4 ]---