Re: [PATCH -next] block: fix blktrace debugfs entries leak

From: Yu Kuai
Date: Mon May 29 2023 - 22:08:23 EST


Hi, Christoph

在 2023/05/12 15:14, Yu Kuai 写道:
Hi,

在 2023/05/11 23:28, Christoph Hellwig 写道:
On Thu, May 11, 2023 at 02:56:33PM +0800, Yu Kuai wrote:
From: Yu Kuai <yukuai3@xxxxxxxxxx>

Commit 99d055b4fd4b ("block: remove per-disk debugfs files in
blk_unregister_queue") moves blk_trace_shutdown() from
blk_release_queue() to blk_unregister_queue(), this is safe if blktrace
is created through sysfs, however, there are some regression in corner
cases:

1) for scsi, passthrough io can still be issued after del_gendisk, and
    blktrace debugfs entries will be removed immediately after
    del_gendisk(), therefor passthrough io can't be tracked and blktrace
    will complain:

    failed read of /sys/kernel/debug/block/sdb/trace0: 5/Input/output error

But that is the right thing.  The only thing that has a name is the
gendisk and it is gone at this point.  Leaking the debugfs entries
that are named after, and ultimatively associated with the gendisk
(even if the code is still a bit confused about this) will create a lot
of trouble for us.

2) blktrace can still be enabled after del_gendisk() through ioctl if the
    disk is opened before del_gendisk(), and if blktrace is not shutdown
    through ioctl before closing the disk, debugfs entries will be
    leaked.

Yes.

It seems 1) is not important, while 2) needs to be fixed apparently.

Fix this problem by shutdown blktrace in blk_free_queue(),
disk_release() is not used because scsi sg support blktrace without
gendisk, and this is safe because queue is not freed yet, and
blk_trace_shutdown() is reentrant.

I think disk_release is the right place for "normal" blktrace.  The
odd cdev based blktrace for /dev/sg will need separate handling.
To be honest I'm not even sure how /dev/sg based passthrough is
even supposed to work in practice, but I'll need to spend some more
time to familarize myself with it.

I'm not sure how to specail hanlde /dev/sg* for now, however,
If we don't care about blktrace for passthrough io after del_gendisk(),
and /dev/sg* has separate handling, I think it's better just to check
QUEUE_FLAG_REGISTERED in blk_trace_setup(), and don't enable blktrace
in the first place.

Any suggestions about this problem? Should we use separate handling for
/dev/sd? Or just free blktrace in blk_free_queue().

Thanks,
Kuai

.