On Thu 28-04-16 12:53:50, Jens Axboe wrote:
2) As far as I can see in patch 8/8, you have plugged the throttling in above
the IO scheduler. When there are, e.g., multiple cgroups with different IO
limits operating, this throttling can lead to strange results (like a
cgroup with a low limit using up all available background "slots" and thus
effectively stopping background writeback for other cgroups). So wouldn't
it make more sense to plug this in below the IO scheduler? Now, I understand
there may be other problems with this, but I think we should put more
thought into that and provide some justification in the changelogs.
One complexity is that we have to do this early for blk-mq, since once you
get a request, you're already sitting on the hw tag. CoDel should actually
work fine at each hop, so hopefully this will as well.
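To make the "each hop" point concrete, here is a toy model of a CoDel-style limiter. It is a sketch under stated assumptions, not the code in this patch series: the names (`CoDelThrottle`, `target_us`, `window_us`, `max_depth`) and the halve/double depth policy are hypothetical. It borrows CoDel's idea of judging congestion by the minimum latency in a window and shrinking the next window by the square root of a counter. Because each instance only watches its own queue's latency, one can sit at each hop without coordinating with the others.

```python
import math


class CoDelThrottle:
    """Toy CoDel-style limiter for one queue hop (hypothetical names/policy)."""

    def __init__(self, target_us, window_us, max_depth):
        self.target_us = target_us
        self.base_window_us = window_us
        self.max_depth = max_depth
        self.depth = max_depth   # allowed in-flight background writes
        self.step = 0            # how many times we have scaled down
        self.window_min = None   # minimum latency seen in the current window

    def window_us(self):
        # Borrowed from CoDel: shrink the next window by sqrt(step + 1).
        return self.base_window_us / math.sqrt(self.step + 1)

    def observe(self, latency_us):
        # Record one completion; only the minimum in the window matters.
        if self.window_min is None or latency_us < self.window_min:
            self.window_min = latency_us

    def end_window(self):
        # If even the fastest IO in the window missed the target, a standing
        # queue has built up: scale down. A single slow IO is not enough.
        if self.window_min is not None and self.window_min > self.target_us:
            self.step += 1
            self.depth = max(1, self.depth // 2)
        elif self.step > 0:
            # Good (or idle) window: undo one scale-down step.
            self.step -= 1
            self.depth = min(self.max_depth, self.depth * 2)
        self.window_min = None
        return self.depth
```

For example, with `target_us=2000` and `max_depth=16`, a window whose fastest completion took 3 ms halves the depth to 8 and shortens the next window by a factor of sqrt(2).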
OK, I see. But then doesn't this suggest that any IO scheduling and/or
cgroup-related throttling should also happen before we get a request for
blk-mq? And then we could still do writeback throttling below that layer?
But yes, fairness is something that we have to pay attention to. Right now
the wait queue has no priority associated with it; that should probably be
improved so that we can wake up waiters in a more appropriate order.
It needs testing, but hopefully it works out, since if you do run into
starvation, you'll go to the back of the queue for the next attempt.
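The wakeup behaviour described above can be modelled as a plain FIFO wait queue with no priorities. This is a hypothetical sketch (`ThrottleWaitQueue`, `submit`, `complete` and `retry` are illustrative names, not kernel APIs). It assumes that a new submitter may grab a freed slot before the woken waiter does, which is how starvation arises, and that a waiter who loses that race re-queues at the tail.

```python
from collections import deque


class ThrottleWaitQueue:
    """Toy FIFO throttle wait queue without priorities (hypothetical)."""

    def __init__(self, limit):
        self.limit = limit
        self.inflight = 0
        self.waiters = deque()

    def submit(self, name):
        # New submitters do not check for waiters, so they can steal a slot
        # that was freed for someone already queued.
        if self.inflight < self.limit:
            self.inflight += 1
            return True
        self.waiters.append(name)
        return False

    def complete(self):
        # An IO finished: free its slot and wake the oldest waiter (plain
        # FIFO, no priority). The woken waiter must still win the slot.
        self.inflight -= 1
        return self.waiters.popleft() if self.waiters else None

    def retry(self, name):
        # Called by a woken waiter.
        if self.inflight < self.limit:
            self.inflight += 1
            return True
        # Lost the race: go to the back of the queue for the next attempt.
        self.waiters.append(name)
        return False
```

Replacing the `deque` with a priority-ordered structure would be one way to get the more appropriate wakeup order mentioned above.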
Yeah, once I hunt down that regression with the old disk, I can have a look
at how writeback throttling interacts with the blkio controller.