Re: [PATCH 7/8] wbt: add general throttling mechanism

From: Jens Axboe
Date: Tue May 03 2016 - 11:32:44 EST

On 05/03/2016 09:22 AM, Jan Kara wrote:
On Tue 03-05-16 08:23:27, Jens Axboe wrote:
On 05/03/2016 03:34 AM, Jan Kara wrote:
On Thu 28-04-16 12:53:50, Jens Axboe wrote:
2) As far as I can see in patch 8/8, you have plugged the throttling above
the IO scheduler. When there are e.g. multiple cgroups with different IO
limits operating, this throttling can lead to strange results (like a
cgroup with low limit using up all available background "slots" and thus
effectively stopping background writeback for other cgroups)? So won't
it make more sense to plug this below the IO scheduler? Now I understand
there may be other problems with this but I think we should put more
thought to that and provide some justification in changelogs.

One complexity is that we have to do this early for blk-mq, since once you
get a request, you're already sitting on the hw tag. CoDel should actually
work fine at each hop, so hopefully this will as well.

OK, I see. But then this suggests that any IO scheduling and / or
cgroup-related throttling should happen before we get a request for blk-mq
as well? And then we can still do writeback throttling below that layer?

Not necessarily. For IO scheduling, basically we care about two parts:

1) Are you allowed to allocate the resources to queue some IO
2) Are you allowed to dispatch

But then it seems suboptimal to waste a relatively scarce resource (which
HW tag is AFAIU) just because you happen to run from a cgroup that is
bandwidth limited and thus are not allowed to dispatch?

For some cases, you are absolutely right, and #1 is the main one. For your
case of QD=1, that's obviously the case. For SATA, it's more of a grey
zone, and for others (nvme, scsi, etc), it's not really a scarce resource,
so #2 is the bigger part of it.
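
To make the two gates above concrete, here is a minimal userspace sketch of
why holding a tag while being denied dispatch hurts at QD=1 but matters much
less on a deep queue. This is a toy model under stated assumptions, not
blk-mq code: every name in it (toy_queue, toy_group, toy_get_tag,
toy_dispatch, toy_other_group_can_queue) is hypothetical, and a successful
dispatch is assumed to complete immediately and free its tag.

```c
#include <stdbool.h>

/* Hypothetical toy model for illustration only; not kernel code. */
struct toy_queue {
	unsigned int depth;	/* number of hardware tags */
	unsigned int in_use;	/* tags currently held */
};

struct toy_group {
	unsigned int dispatch_budget;	/* requests the group may still dispatch */
	unsigned int held;		/* tags held while waiting to dispatch */
};

/* Gate 1: may we allocate the resource (a tag) needed to queue IO? */
static bool toy_get_tag(struct toy_queue *q, struct toy_group *g)
{
	if (q->in_use >= q->depth)
		return false;
	q->in_use++;
	g->held++;
	return true;
}

/*
 * Gate 2: may we dispatch? Assumption: a dispatched request completes
 * immediately, so its tag is released here.
 */
static bool toy_dispatch(struct toy_queue *q, struct toy_group *g)
{
	if (g->held == 0 || g->dispatch_budget == 0)
		return false;
	g->dispatch_budget--;
	g->held--;
	q->in_use--;
	return true;
}

/*
 * A bandwidth-limited group grabs a tag but is not allowed to dispatch.
 * Returns true if a second group can still get a tag afterwards.
 */
static bool toy_other_group_can_queue(unsigned int depth)
{
	struct toy_queue q = { .depth = depth, .in_use = 0 };
	struct toy_group throttled = { .dispatch_budget = 0, .held = 0 };
	struct toy_group other = { .dispatch_budget = 1, .held = 0 };

	if (!toy_get_tag(&q, &throttled))
		return false;
	(void)toy_dispatch(&q, &throttled);	/* fails: no dispatch budget */
	return toy_get_tag(&q, &other);
}
```

In this model toy_other_group_can_queue(1) returns false, since the
throttled group is sitting on the only tag, while a deeper queue such as
toy_other_group_can_queue(32) returns true because spare tags remain.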

Jens Axboe