Re: [PATCH 3/5] blktrace: refcount the request_queue during ioctl

From: Luis Chamberlain
Date: Wed Apr 15 2020 - 02:17:11 EST


On Tue, Apr 14, 2020 at 08:40:44AM -0700, Christoph Hellwig wrote:
> On Tue, Apr 14, 2020 at 04:19:00AM +0000, Luis Chamberlain wrote:
> > Ensure that the request_queue is refcounted during its full
> > ioctl cycle. This avoids possible races against removal, given
> > blk_get_queue() also checks to ensure the queue is not dying.
> >
> > This small race is possible if you defer removal of the request_queue
> > and userspace fires off an ioctl for the device in the meantime.
>
> Hmm, where exactly does the race come in so that it can only happen
> after where you take the reference, but not before it? I'm probably
> missing something, but that just means it needs to be explained a little
> better :)

From the trace on patch 2/5:

BLKTRACE_SETUP(loop0) #2
[ 13.933961] == blk_trace_ioctl(2, BLKTRACESETUP) start
[ 13.936758] === do_blk_trace_setup(2) start
[ 13.938944] === do_blk_trace_setup(2) creating directory
[ 13.941029] === do_blk_trace_setup(2) using what debugfs_lookup() gave

---> From LOOP_CTL_DEL(loop0) #2
[ 13.971046] === blk_trace_cleanup(7) end
[ 13.973175] == __blk_trace_remove(7) end
[ 13.975352] == blk_trace_shutdown(7) end
[ 13.977415] = __blk_release_queue(7) calling blk_mq_debugfs_unregister()
[ 13.980645] ==== blk_mq_debugfs_unregister(7) begin
[ 13.980696] ==== blk_mq_debugfs_unregister(7) debugfs_remove_recursive(q->debugfs_dir)
[ 13.983118] ==== blk_mq_debugfs_unregister(7) end q->debugfs_dir is NULL
[ 13.986945] = __blk_release_queue(7) blk_mq_debugfs_unregister() end
[ 13.993155] = __blk_release_queue(7) end

---> From BLKTRACE_SETUP(loop0) #2
[ 13.995928] === do_blk_trace_setup(2) end with ret: 0
[ 13.997623] == blk_trace_ioctl(2, BLKTRACESETUP) end

The BLKTRACESETUP above is working on a request_queue which
LOOP_CTL_DEL then races against, sweeping the debugfs dir from
underneath us. This commit alone doesn't fix that race, though,
both because we still use debugfs_lookup() and because we're still
using asynchronous removal at this point.

Refcounting just ensures the request_queue itself is not taken from
underneath our noses.
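
To make the refcounting part concrete, here is a rough userspace
sketch of the pattern. It is not the patch itself: every model_* name
is made up for illustration, the errno is an arbitrary pick, and it is
single-threaded, so it only models the ordering. The real helpers are
blk_get_queue() and blk_put_queue().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct request_queue. */
struct model_queue {
	int refcount;	/* references currently held */
	bool dying;	/* set once removal has started */
	bool released;	/* set when the last reference is dropped */
};

/* Models blk_get_queue(): refuse a reference if the queue is dying. */
static bool model_get_queue(struct model_queue *q)
{
	if (q->dying)
		return false;
	q->refcount++;
	return true;
}

/* Models blk_put_queue(): the last put triggers the release. */
static void model_put_queue(struct model_queue *q)
{
	assert(q->refcount > 0);
	if (--q->refcount == 0)
		q->released = true;
}

/* Models the removal path: mark dying, drop the owner's reference. */
static void model_remove_queue(struct model_queue *q)
{
	q->dying = true;
	model_put_queue(q);
}

/* Models the ioctl: hold a reference for the whole call. */
static int model_trace_ioctl(struct model_queue *q, bool remove_in_middle)
{
	int ret = 0;

	if (!model_get_queue(q))
		return -ENODEV;	/* assumed errno, for illustration only */

	/* Simulate removal racing in while the ioctl is running. */
	if (remove_in_middle)
		model_remove_queue(q);

	/* Our reference must have kept the queue from being released. */
	if (q->released)
		ret = -EFAULT;

	model_put_queue(q);
	return ret;
}
```

With the reference held, a removal that lands mid-ioctl only drops the
count; the release happens on the ioctl's final put. This only keeps
the request_queue alive, it says nothing about the debugfs dir.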

Should I just add this to the commit log?

Luis