Re: 4.15.14 crash with iscsi target and dvd
From: Wakko Warner
Date: Thu Apr 05 2018 - 21:46:55 EST

Bart Van Assche wrote:
> On Sun, 2018-04-01 at 14:27 -0400, Wakko Warner wrote:
> > Wakko Warner wrote:
> > > Wakko Warner wrote:
> > > > I tested 4.14.32 last night with the same oops. 4.9.91 works fine.
> > > > From the initiator, if I do cat /dev/sr1 > /dev/null it works. If I mount
> > > > /dev/sr1 and then do find -type f | xargs cat > /dev/null the target
> > > > crashes. I'm using the builtin iscsi target with pscsi. I can burn from
> > > > the initiator without problems. I'll test other kernels between 4.9 and
> > > > 4.14.
> > >
> > > So I've tested 4.x.y where x is one of 10, 11, 12, 14, 15 and y is the
> > > latest patch (except for 4.15, which was 1 behind).
> > > Each of these kernels crashes within seconds of, or immediately upon,
> > > doing find -type f | xargs cat > /dev/null from the initiator.
> >
> > I tried 4.10.0. It doesn't completely lock up the system, but the device
> > that was used hangs. So from the initiator, it's /dev/sr1 and from the
> > target it's /dev/sr0. Attempting to read /dev/sr0 after the oops causes the
> > process to hang in D state.
>
> Hello Wakko,
>
> Thank you for having narrowed down this further. I think that you encountered
> a regression either in the block layer core or in the SCSI core. Unfortunately
> the number of changes between kernel versions v4.9 and v4.10 in these two
> subsystems is huge. I see two possible ways forward:
> - Either that you perform a bisect to identify the patch that introduced this
>   regression. However, I'm not sure whether you are familiar with the bisect
>   process.
> - Or that you identify the command that triggers this crash such that others
>   can reproduce this issue without needing access to your setup.
>
> How about reproducing this crash with the below patch applied on top of
> kernel v4.15.x? The additional output sent by this patch to the system log
> should allow us to reproduce this issue by submitting the same SCSI command
> with sg_raw.

Ok, so I tried this, but scsi_print_command doesn't print anything. I added
an if statement above this one that checks for !rq and does the same thing
that blk_rq_nr_phys_segments does, thinking it might have crashed during
WARN_ON_ONCE. It still didn't print anything. My printk shows this:

[ 36.263193] sr 3:0:0:0: cmd->request->nr_phys_segments is 0

I also had scsi_print_command in the same if block, which again didn't print
anything. Is there some debug option I need to turn on to make it print? I
tried looking through the code for this and following some of the function
calls, but didn't see any config options.
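
For illustration only, the kind of extra check described above can be
sketched in userspace C. This is not the code that was actually added to
scsi_init_io: the two structs are hypothetical stand-ins that model only the
fields the check touches, check_phys_segments is an invented name, the
message for the NULL case is made up, and snprintf stands in for printk. It
also looks only at the segment count, which may be simpler than what
blk_rq_nr_phys_segments does in a given kernel version.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for the kernel structures; only the
 * fields this check reads are modelled. */
struct request {
	unsigned short nr_phys_segments;
};

struct scsi_cmnd {
	struct request *request;
};

/*
 * Return 0 if the command has a request with at least one physical
 * segment, -1 otherwise. On failure a one-line reason is written to buf
 * (a stand-in for the printk output); on success buf is set to "".
 */
static int check_phys_segments(const struct scsi_cmnd *cmd,
			       char *buf, size_t len)
{
	const struct request *rq = cmd->request;

	if (!rq) {
		/* Assumed message; the post does not show output for this case. */
		snprintf(buf, len, "cmd->request is NULL");
		return -1;
	}
	if (rq->nr_phys_segments == 0) {
		/* Same wording as the message quoted in the post. */
		snprintf(buf, len, "cmd->request->nr_phys_segments is 0");
		return -1;
	}
	if (len > 0)
		buf[0] = '\0';
	return 0;
}
```

Testing for a NULL request first means the segment count is never read
through a NULL pointer, so a message is produced in either failure case.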
> Subject: [PATCH] Report commands with no physical segments in the system log
>
> ---
>  drivers/scsi/scsi_lib.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 6b6a6705f6e5..74a39db57d49 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -1093,8 +1093,10 @@ int scsi_init_io(struct scsi_cmnd *cmd)
>  	bool is_mq = (rq->mq_ctx != NULL);
>  	int error = BLKPREP_KILL;
>
> -	if (WARN_ON_ONCE(!blk_rq_nr_phys_segments(rq)))
> +	if (WARN_ON_ONCE(!blk_rq_nr_phys_segments(rq))) {
> +		scsi_print_command(cmd);
>  		goto err_exit;
> +	}
>
>  	error = scsi_init_sgtable(rq, &cmd->sdb);
>  	if (error)

--
Microsoft has beaten Volkswagen's world record. Volkswagen only created 22
million bugs.