Re: [PATCH 0/2] target: Fix v4.19-rc active I/O shutdown deadlock

From: Nicholas A. Bellinger
Date: Wed Oct 10 2018 - 00:20:21 EST


On Wed, 2018-10-10 at 03:23 +0000, Nicholas A. Bellinger wrote:
> From: Nicholas Bellinger <nab@xxxxxxxxxxxxxxx>
>
> Hi MNC, MKP & Co,
>
> While testing v4.19-rc recently with simple backend I/O error injection
> (via delayed BIO completion), I was able to trigger an end-less loop
> deadlock with recent changes in commit 00d909a107:
>
> Author: Bart Van Assche <bart.vanassche@xxxxxxx>
> Date: Fri Jun 22 14:52:53 2018 -0700
>
> scsi: target: Make the session shutdown code also wait for commands that are being aborted
>
> It comes down to an incorrect assumption wrt signals during session
> shutdown plus active I/O quiesce, which triggers an endless loop
> immediately during session shutdown as se_session->sess_list_wq
> waits for outstanding backend I/O to complete.
>
> The easiest reproduction is with iser-target or simulation with plain
> old iscsi-target/TCP ports.

For reference, attached are two debug patches, plus instructions, to
trigger the endless loop deadlock regression on v4.19-rc.

1) Simulate iscsi-target via iscsit_transport->iscsi_wait_conn()

This makes iscsi-target/TCP follow the isert_wait_conn() code path. It uses
iscsit_transport->iscsi_wait_conn() during active I/O shutdown to invoke
target_wait_for_sess_cmds() with signals pending, the same way the existing
iser-target session shutdown logic does.

Useful for triggering the bug in a VM, without an RDMA-capable NIC.

2) Simulate IBLOCK WRITE delayed completion by 60 seconds

MNC likes to use scsi_debug for this, but I use BRD to add an arbitrary
completion delay.

-----------------------------------------------------------------------

So once a /sys/kernel/config/target/core/$IBLOCK_HBA/$IBLOCK_DEV/ backend
has been created + exported via /sys/kernel/config/target/iscsi/$IQN/$TPGT/,
issue a single block WRITE.

Once the WRITE completion is being delayed by IBLOCK, go ahead and send
'kill -SIGINT $PID' to the iscsi_trx kthread to trigger the usual iscsi/iser
session shutdown + reconnect for the connection with the outstanding
delayed I/O.

Once target_wait_for_sess_cmds() is called with signals pending, it will
immediately hit the endless loop and kill the machine.
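
For anyone who wants the failure mode spelled out without applying the
debug patches, here is a minimal userspace sketch. It is not the kernel
code. struct fake_session, fake_wait_interruptible() and wait_for_cmds()
are made-up names for illustration, the 180 second timeout is an assumed
value, and the sketch assumes the shutdown path retries an interruptible
wait until it returns a positive value. The spin cap exists only so the
model terminates.

```c
#include <stdbool.h>

#define ERESTARTSYS 512 /* value the kernel uses internally for this errno */

/* Hypothetical stand-in for a session, not the real struct se_session. */
struct fake_session {
	int outstanding_cmds;     /* commands still held by the backend */
	bool signal_pending;      /* models signal_pending(current) */
	unsigned long slept_secs; /* simulated time spent actually sleeping */
};

/*
 * Model of an interruptible wait with timeout:
 *   > 0           the condition (no outstanding commands) became true
 *   -ERESTARTSYS  a signal is pending, so return at once without sleeping
 *   0             slept for the full (assumed) 180 second timeout
 */
static long fake_wait_interruptible(struct fake_session *s)
{
	if (s->outstanding_cmds == 0)
		return 1;
	if (s->signal_pending)
		return -ERESTARTSYS;
	s->slept_secs += 180;
	return 0;
}

/*
 * Retry loop that treats "interrupted" the same as "timed out".
 * Returns the number of loop iterations, capped at max_spins.
 */
static unsigned long wait_for_cmds(struct fake_session *s,
				   unsigned long max_spins)
{
	unsigned long spins = 0;
	long ret;

	do {
		ret = fake_wait_interruptible(s);
		spins++;
	} while (ret <= 0 && spins < max_spins);

	return spins;
}
```

With one outstanding command and signal_pending set, wait_for_cmds() uses
up its whole spin budget while slept_secs stays at zero, i.e. it never
sleeps. In the kernel there is no spin budget, and the pending signal is
not cleared inside the loop, so the CPU stays in the loop for as long as
the backend I/O is outstanding.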