Re: [PATCH v3 3/3] block: avoid to call .bi_end_io() recursively

From: Mikulas Patocka
Date: Fri Apr 29 2016 - 05:59:18 EST




On Fri, 29 Apr 2016, Ming Lei wrote:

> > If some block device driver in a process context finds out that it needs
> > to terminate a bio, it calls bio_endio in a process context. Why would it
> > need to trigger an interrupt just to call bio_endio? (and how could it
> > trigger an interrupt if that driver perhaps doesn't use interrupts at
> > all?) I don't know what are you trying to suggest.
>
> That should be the failure path, and this patch just aims at the normal bio
> complete path which is run at 99.999% time. And irq handler often
> borrows current process's stack and cause huge stack uses.

For some block devices it is common to call bi_endio from process context,
for example dm-crypt or dm-verity do it for all read bios. loop driver
calls bi_endio from process context on every bio it processes.

> >> It isn't easy to avoid the recursive calling in process context except you
> >> can add something 'task_struct' or introduce 'alloca()' in kernel. Or do you
> >> have better ideas?
> >
> > preempt_disable around the bi_endio callback should be sufficient.
>
> How can a sleepable function be run in preempt disabled context?
>
> Looks you still don't believe the bi_endio scheduled to process context
> can sleep, do you?

I don't believe bi_endio function can sleep. bi_endio function doesn't
know if it will be called from process or interrupt context - so it can't
sleep. If bi_endio function slept, bug would happen if it were called from
interrupt context.

btrfs is a special case where btrfs calls bio_endio on bios that it
created on it own (note that this trick only works when the same driver
creates a bio and terminates the bio with bio_endio - if some other driver
terminated the bio, it wouldn't work because the other driver could
terminate the bio from interrupt context).

That could be simply fixed by calling bio->bi_end_io or
btrfs_endio_direct_read directly or making a new function btrfs_endio.

> >> Not mention wait_for_completion() in both __btrfs_correct_data_nocsum()
> >> and __btrfs_subio_endio_read(), which are called by btrfs_subio_endio_read()
> >> in btrfs_endio_direct_read().
> >
> > btrfs is calling bio_endio on bio that it created. So, bio_endio in btrfs
> > can be replaced with:
> > if (bio->bi_end_io == btrfs_endio_direct_read)
> > btrfs_endio_direct_read(bio);
> > else
> > bio_endio(bio);
> >
> > ... or just maybe just with this:
> > bio->bi_end_io(bio);
>
> It is really ugly and should be avoided most of times.
>
> >
> > This could be fixed easily.
>
> Suppose it might be done in this way, and there may be other .bi_end_io
> which can sleep too, you need to make sure there isn't any such usage
> first before running it with preempt disabled.
>
> Anyway looks you worry about the seldom-run failure path, which isn't
> addressed by this patch. And you can optimize that in future and that isn't
> contradictory with this patch.

Worrying about seldom-run failrure paths is nothing wrong. These paths are
not any less important than the common code paths.

Mikulas

> Thanks,
> Ming