Re: blktests: block/009 next-20210304 failure rate average of 1/448

From: Luis Chamberlain
Date: Thu Mar 18 2021 - 13:55:28 EST

Adding linux-fsdevel as folks working on fstests might be interested.

On Tue, Mar 16, 2021 at 05:46:45PM +0000, Luis Chamberlain wrote:
> My personal suspicion is not on the block layer but on scsi_debug
> because this can fail:
> modprobe scsi_debug; rmmod scsi_debug
> This second issue may be a secondary separate issue, but I figured
> I'd mention it. To fix this later issue I've looked at ways to
> make scsi_debug_init() wait until its scsi devices are probed,
> however its not clear how to do this correctly. If someone has
> an idea let me know. If that fixes this issue then we know it was
> that.

OK, so this other issue with scsi_debug does deserve its own tracking,
so I filed a bug for it. I also looked into it to see how to resolve
it.

Someone who works on scsi should review my work, as I haven't touched
scsi before except for the recent block layer work I did for the
blktrace races. However, my own analysis is that this should not be
fixed in scsi_debug but instead in the users of scsi_debug.

The rationale for that is here:

The skinny of it is that we have no control over when userspace may muck
with the newly exposed devices as they are being initialized, and
shoe-horning a solution into scsi_debug_init() will always be prone to
a race with userspace, one which may never let scsi_debug_init() complete.

So the best we can do is use something like lsof in the tools which
use scsi_debug *prior* to mucking with the devices and / or removing
the module.
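
A minimal sketch of that idea, for illustration only and not the actual
blktests / fstests patches. The helper name wait_until_unused and the LSOF
override variable are hypothetical. It relies on lsof exiting 0 when it
finds an open instance of the file it was asked about, and non-zero
otherwise:

```shell
#!/bin/sh
# Hypothetical helper: poll until no process holds any of the given
# device nodes open, or give up after a timeout (in seconds).
# Usage: wait_until_unused <timeout_s> <device>...
# Returns 0 once all devices look unused, 1 on timeout.
wait_until_unused() {
    timeout_s=$1
    shift
    elapsed=0
    while [ "$elapsed" -le "$timeout_s" ]; do
        busy=0
        for dev in "$@"; do
            # lsof exits 0 if at least one open instance was found.
            # LSOF may be overridden (assumption, only to ease testing).
            if ${LSOF:-lsof} "$dev" >/dev/null 2>&1; then
                busy=1
                break
            fi
        done
        if [ "$busy" -eq 0 ]; then
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 1
}
```

A test could then do something like
"wait_until_unused 10 /dev/sdX && rmmod scsi_debug", where /dev/sdX stands
in for whatever device scsi_debug exposed. This only narrows the window:
userspace can still open the device between the check and the rmmod.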

I'll follow up with respective blktests / fstests patches, which I
suspect may address a few false positives.