Re: [PATCH 00/13 v4] block: always associate blkg and refcount cleanup
From: Dennis Zhou
Date: Tue Nov 27 2018 - 17:15:56 EST
On Tue, Nov 27, 2018 at 04:10:01PM -0500, Josef Bacik wrote:
> On Mon, Nov 26, 2018 at 04:19:33PM -0500, Dennis Zhou wrote:
> > Hi everyone,
> >
> > This is respin of v3 [1] with fixes for the errors reported in [2] and
> > [3]. v3 was reverted in [4].
> >
> > The issue in [3] was that bio->bi_disk->queue and blkg->q were out
> > of sync. So when I changed blk_get_rl() to use blkg->q, the wrong queue
> > was returned and elevator from q->elevator->type threw a NPE. Note, with
> > v4.21, the old block stack was removed and so this patch was dropped. I
> > did backport this to v4.20 and verified this series does not encounter
> > the error.
> >
> > The biggest changes in v4 are when association occurs and clearly
> > defining the cases where association should happen.
> > 1. Association is now done when the device is set to keep blkg->q and
> > bio->bi_disk->queue in sync.
> > 2. When a bio is submitted directly to the device, it will not be
> > associated with a blkg. This is because a blkg represents the
> > relationship between a blkcg and a request_queue. Going directly to
> > the device means the request_queue may not exist meaning no blkg
> > will exist.
> >
> > The patch updating blk_get_rl() was dropped (v3 10/12). The patch to
> > always associate a blkg from v3 (v3 04/12) was fixed and split into
> > patches 0004 and 0005. 0011 is new removing bio_disassociate_task().
> >
> > Summarizing the ideas of this series:
> > 1. Gracefully handle blkg failure to create by walking up the blkg
> > tree rather than fall through to root.
> > 2. Associate a bio with a blkg in core logic rather than per
> > controller logic.
> > 3. Rather than have a css and blkg reference, hold just a blkg ref
> > as it also holds a css ref.
> > 4. Switch to percpu ref counting for blkg.
> >
>
> Hmm so reading through this series it's made me realize that iolatency is sort
> of broken. It relies on knowing if it needs to do something with the bio if
> there is a blkg associated with it. Before this patchset there wouldn't be a
I don't think there is anything wrong with blk-iolatency. blk-iolatency
piggybacks off of the rq_qos hooks which when throttle gets called means
submit has happened on the bio. As a byproduct, all bios that
blk-iolatency sees has a blkcg associated with it from
blkcg_bio_issue_check(). This lets blk-iolatency associate a blkg in the
throttle hook - blkcg_iolatency_throttle().
Order of functions called:
generic_make_request()
generic_make_request_checks() <- associates blkcg
make_request_fn() (eventually blk_mq_make_request)
rq_qos_throttle()
blkcg_iolatency_throttle() <- associate blkg based on blkcg
blk-throttle is another story, it kind of does association really high
up, but lets the block layer manage the request_queue and attribute it
all to the first blkg seen. I believe this is just the top level blkg
(same css, first request_queue). Disclaimer, I haven't read through much
of the blk-throttle code.
> blkg on the bio unless it was specifically associated. I'm going to need to
> figure out a different way to tag bio's to indicate that blk-iolatency should
> care about it. Probably add a bio flag or something. Thanks,
I think the interaction gets a little confusing with stacked block
devices, but regardless, shouldn't blk-iolatency care about every bio
that comes through anyway? I think this would just translate to
throttling at each request_queue and not just the entrance queue.
If a bio has a device with no request_queue, I don't think it will ever
reach blk-iolatency because the bio goes straight to device and a
requirement to get to call rq_qos_throttle is to have a request_queue.
Thanks,
Dennis