Re: [PATCH 7/8] wbt: add general throttling mechanism

From: Jens Axboe
Date: Tue May 03 2016 - 14:14:41 EST

On 05/03/2016 10:59 AM, Jens Axboe wrote:
On 05/03/2016 09:48 AM, Jan Kara wrote:
On Tue 03-05-16 17:40:32, Jan Kara wrote:
On Tue 03-05-16 11:34:10, Jan Kara wrote:
Yeah, once I hunt down that regression with the old disk, I can have a
look into how writeback throttling plays together with blkio-controller.

So I've tried the following script (note that you need cgroup v2 for
writeback IO to be throttled):

mkdir /sys/fs/cgroup/group1
echo 1000 >/sys/fs/cgroup/group1/io.weight
dd if=/dev/zero of=/mnt/file1 bs=1M count=10000 &
DD1=$!    # PID of the first dd
echo $DD1 >/sys/fs/cgroup/group1/cgroup.procs

mkdir /sys/fs/cgroup/group2
echo 100 >/sys/fs/cgroup/group2/io.weight
#echo "259:65536 wbps=5000000" >/sys/fs/cgroup/group2/io.max
echo "259:65536 wbps=max" >/sys/fs/cgroup/group2/io.max
dd if=/dev/zero of=/mnt/file2 bs=1M count=10000 &
DD2=$!    # PID of the second dd
echo $DD2 >/sys/fs/cgroup/group2/cgroup.procs

while true; do
	sleep 1
	# SIGUSR1 makes dd print its current transfer statistics
	kill -USR1 $DD1
	kill -USR1 $DD2
	echo '======================================================='
done

and watched the progress of the dd processes in different cgroups. The
1/10 weight difference has no effect with your writeback patches - the
numbers after one minute:

3120+1 records in
3120+1 records out
3272392704 bytes (3.3 GB) copied, 63.7119 s, 51.4 MB/s
3217+1 records in
3217+1 records out
3374010368 bytes (3.4 GB) copied, 63.5819 s, 53.1 MB/s
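
A note on the setup for the script above: on cgroup v2, io.weight and
io.max only show up in a child group once the io controller is enabled
in the parent's cgroup.subtree_control. A minimal sketch of that step,
assuming the v2 hierarchy is mounted at /sys/fs/cgroup as in the script;
has_controller and enable_io are hypothetical helper names, not part of
the original test:

```shell
# has_controller NAME FILE: succeed if NAME is listed in a cgroup v2
# controllers file (space-separated names). Hypothetical helper.
has_controller() {
	grep -qw -- "$1" "$2"
}

# enable_io ROOT: enable the io controller for the children of ROOT so
# that io.weight and io.max appear in them. Hypothetical helper, needs
# root and an "io" entry in ROOT/cgroup.controllers.
enable_io() {
	has_controller io "$1/cgroup.controllers" || return 1
	echo "+io" > "$1/cgroup.subtree_control"
}
```

Something like "enable_io /sys/fs/cgroup" would then go before the two
mkdir calls.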

I should add that even without your patches the progress doesn't quite
correspond to the weight ratio:

Forgot to fill in corresponding data for unpatched kernel here:

5962+2 records in
5962+2 records out
6252281856 bytes (6.3 GB) copied, 64.1719 s, 97.4 MB/s
1502+0 records in
1502+0 records out
1574961152 bytes (1.6 GB) copied, 64.207 s, 24.5 MB/s
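
To put the two sets of numbers next to the configured weights, the
throughput ratios can be worked out with a small helper. This is only a
sketch; ratio is a hypothetical name, and the inputs are the MB/s
figures dd reported above (group1 first, group2 second):

```shell
# ratio A B: print A/B rounded to two decimals (hypothetical helper)
ratio() {
	awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

echo "io.weight group1/group2: $(ratio 1000 100)"   # 10.00
echo "patched kernel:          $(ratio 51.4 53.1)"  # 0.97
echo "unpatched kernel:        $(ratio 97.4 24.5)"  # 3.98
```

So the unpatched kernel gives roughly 4:1 against a configured 10:1,
and with the patches the two writers end up at about 1:1.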

Thanks for testing this, I'll see what we can do about that. It stands
to reason that we'll throttle a heavier writer more, statistically. But
I'm assuming this above test was run basically with just the writes
going, so no real competition? And hence we end up throttling them
equally much, destroying the weighting in the process. But for both
cases, we basically don't pay any attention to cgroup weights.

but there is still a noticeable difference between cgroups with
different weights.

OTOH blk-throttle combines well with your patches: limiting one cgroup
to 5 MB/s results in numbers like:

3883+2 records in
3883+2 records out
4072091648 bytes (4.1 GB) copied, 36.6713 s, 111 MB/s
413+0 records in
413+0 records out
433061888 bytes (433 MB) copied, 36.8939 s, 11.7 MB/s

which is fine and comparable with the unpatched kernel. The higher
throughput number is because we do buffered writes and dd reports what
it wrote into the page cache. And it is no wonder blk-throttle combines
fine - it throttles bios, which happens before we reach writeback
throttling.

OK, that's good, at least that part works fine. And yes, the throttle
path is hit before we end up in the make_request_fn, which is where wbt
drops in.

So I believe this demonstrates that your writeback throttling just
doesn't work well with a selective scheduling policy that happens below
it, because it can essentially lead to IO priority inversion issues...

Is this testing still done on the QD=1 ATA disk? Not too surprising that
this falls apart, since we have very little room to maneuver. I wonder
if a normal SATA with NCQ would behave better in this regard. I'll have
to test a bit and think about how we can best handle this case.

I think what we'll do for now is just disable wbt IFF we have a non-root cgroup attached to CFQ. Done here:

We don't have a strong need for wbt (supposedly) since CFQ should take care of most of it, if you have policies set for proportional sharing.

Longer term it's not a concern either, as we'll move away from that model anyway.

Jens Axboe