Re: [PATCH 3/5] blktrace: refcount the request_queue during ioctl
From: Bart Van Assche
Date: Wed Apr 15 2020 - 10:45:55 EST

On 2020-04-14 23:16, Luis Chamberlain wrote:
> On Tue, Apr 14, 2020 at 08:40:44AM -0700, Christoph Hellwig wrote:
>> Hmm, where exactly does the race come in so that it can only happen
>> after where you take the reference, but not before it? I'm probably
>> missing something, but that just means it needs to be explained a little
>> better :)
>
> From the trace on patch 2/5:
>
> BLKTRACE_SETUP(loop0) #2
> [ 13.933961] == blk_trace_ioctl(2, BLKTRACESETUP) start
> [ 13.936758] === do_blk_trace_setup(2) start
> [ 13.938944] === do_blk_trace_setup(2) creating directory
> [ 13.941029] === do_blk_trace_setup(2) using what debugfs_lookup() gave
>
> ---> From LOOP_CTL_DEL(loop0) #2
> [ 13.971046] === blk_trace_cleanup(7) end
> [ 13.973175] == __blk_trace_remove(7) end
> [ 13.975352] == blk_trace_shutdown(7) end
> [ 13.977415] = __blk_release_queue(7) calling blk_mq_debugfs_unregister()
> [ 13.980645] ==== blk_mq_debugfs_unregister(7) begin
> [ 13.980696] ==== blk_mq_debugfs_unregister(7) debugfs_remove_recursive(q->debugfs_dir)
> [ 13.983118] ==== blk_mq_debugfs_unregister(7) end q->debugfs_dir is NULL
> [ 13.986945] = __blk_release_queue(7) blk_mq_debugfs_unregister() end
> [ 13.993155] = __blk_release_queue(7) end
>
> ---> From BLKTRACE_SETUP(loop0) #2
> [ 13.995928] === do_blk_trace_setup(2) end with ret: 0
> [ 13.997623] == blk_trace_ioctl(2, BLKTRACESETUP) end
>
> The BLKTRACESETUP above operates on a request_queue that LOOP_CTL_DEL
> later races on, sweeping the debugfs dir out from underneath us.
> This commit alone doesn't fix the race, though, because we are still
> using debugfs_lookup() and still using asynchronous removal at this
> point.
>
> Refcounting will just ensure the request_queue isn't taken away from
> underneath our noses.

I think the above trace reveals a bug in the loop driver. The loop
driver shouldn't allow the associated request queue to disappear while
the loop device is open. One may want to have a look at sd_open() in the
sd driver. The scsi_disk_get() call in that function not only increases
the reference count of the SCSI disk but also of the underlying SCSI device.
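
To illustrate the pattern I have in mind, here is a minimal userspace
sketch. It is not kernel code and not the actual sd or loop
implementation: every model_* name below is hypothetical and only models
the idea that open() pins both the disk and the object underneath it, so
that removing the device while it is open cannot release the queue.

```c
/* Userspace model only. These are NOT the kernel's structures or APIs. */
#include <assert.h>
#include <stdbool.h>

struct model_queue {
	int refcount;
	bool released;	/* set once the last reference has been dropped */
};

struct model_disk {
	int refcount;
	struct model_queue *queue;
};

static void model_queue_get(struct model_queue *q)
{
	q->refcount++;
}

static void model_queue_put(struct model_queue *q)
{
	assert(q->refcount > 0);
	if (--q->refcount == 0)
		q->released = true;	/* stands in for tearing down the queue */
}

/* open(): take a reference on the disk and on the underlying queue. */
static void model_disk_open(struct model_disk *d)
{
	d->refcount++;
	model_queue_get(d->queue);
}

/* release(): drop both references taken in model_disk_open(). */
static void model_disk_release(struct model_disk *d)
{
	model_queue_put(d->queue);
	d->refcount--;
}

/* Device removal drops only the driver's own (initial) queue reference. */
static void model_device_del(struct model_disk *d)
{
	model_queue_put(d->queue);
}
```

With a queue that starts at refcount 1 (the driver's own reference),
calling model_disk_open() followed by model_device_del() leaves the queue
alive; it is only released by the final model_disk_release().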

Thanks,
Bart.