Re: 4.4-final: 28 bioset threads on small notebook
From: Ming Lei
Date: Tue Feb 23 2016 - 21:48:17 EST
On Tue, Feb 23, 2016 at 10:54 PM, Mike Snitzer <snitzer@xxxxxxxxxx> wrote:
> On Mon, Feb 22 2016 at 9:55pm -0500,
> Ming Lei <ming.lei@xxxxxxxxxxxxx> wrote:
>
>> On Tue, Feb 23, 2016 at 6:58 AM, Kent Overstreet
>> <kent.overstreet@xxxxxxxxx> wrote:
>> > On Sun, Feb 21, 2016 at 05:40:59PM +0800, Ming Lei wrote:
>> >> On Sun, Feb 21, 2016 at 2:43 PM, Ming Lin-SSI <ming.l@xxxxxxxxxxxxxxx> wrote:
>> >> >>-----Original Message-----
>> >> >
>> >> > So it's almost already "per request_queue"
>> >>
>> >> Yes, that is because of the following line:
>> >>
>> >> q->bio_split = bioset_create(BIO_POOL_SIZE, 0);
>> >>
>> >> in blk_alloc_queue_node().
>> >>
>> >> Looks like this bio_set doesn't need to be per-request_queue, and
>> >> now it is only used for fast-cloning bio for splitting, and one global
>> >> split bio_set should be enough.
>> >
>> > It does have to be per request queue for stacking block devices (which includes
>> > loopback).
>>
>> In commit df2cb6daa4(block: Avoid deadlocks with bio allocation by
>> stacking drivers), deadlock in this situation has been avoided already.
>> Or are there other issues with global bio_set? I appreciate if you may
>> explain it a bit if there are.
>
> Even with commit df2cb6daa4 there is still risk of deadlocks (even
> without low memory condition), see:
> https://patchwork.kernel.org/patch/7398411/
That is definitely another problem which isn't related with low memory,
and I guess Kent means there might be deadlock risk in case of shared
bio_set.
>
> (you may recall you blocked this patch with concerns about performance,
> context switches, plug merging being compromised, etc.. to which I never
> circled back to verify your concerns)
I still remember that problem:
1) Process A
- two bio(a, b) are splitted in dm's make_request funtion
- bio(a) is submitted via generic_make_request(), so it is staged
in current->bio_list
- time t1
- before bio(b) is submitted, down_write(&s->lock) is run and
never return
2) Process B:
- just during time t1, wait completion of bio(a) by down_write(&s->lock)
Then Process A waits the lock which is acquired by B first, and the
two bio(a, b)
can't reach to driver/device at all.
Looks that current->bio_list is fragile to locks from make_request function,
and moving the lock into workqueue context should be helpful.
And I am happy to continue to discuss this issue further.
>
> But it illustrates the type of problems that can occur when your rescue
> infrastructure is shared across devices (in the context of df2cb6daa4,
> current->bio_list contains bios from multiple devices).
>
> If a single splitting bio_set were shared across devices there would be
> no guarantee of forward progress with complex stacked devices (one or
> more devices could exhaust the reserve and starve out other devices in
> the stack). So keeping the bio_set per request_queue isn't prone to
> failure like a shared bio_set might be.
Not consider the dm lock problem, from Kent's commit(df2cb6daa4) log and
the patch, looks forward progress can be guaranteed for stacked devices
with same bio_set, but better to get Kent's clarification.
If forward progress can be guaranteed, percpu mempool might avoid
easy exhausting, because it is reasonable to assume that one CPU can only
provide a certain amount of bandwidth wrt. block transfer.
Thanks
Ming