Re: blk-throttle.c : When limit is changed, must start a new slice

From: Vivek Goyal
Date: Mon Mar 07 2011 - 11:07:33 EST


On Mon, Mar 07, 2011 at 11:05:49PM +0800, lina wrote:
> >On Sat, Mar 05, 2011 at 12:30:55AM +0800, lina wrote:
> >> Hi Vivek,
> >> Take you a litter time to read this letter, I think there seens to
> >> be a small bug in blk-throttle.c.
> >> This might happen that initially cgroup limit was greater than the
> >> device's physic ability, but later limit was under physic ability. In this
> >> case, all of the bio in the device was throttled for serveral minutes.
> >>
> >> I did some analysis as the following:
> >> First, setting a very large bps on device lead all of the bio go
> >> through. The slice is begin when test begin, and only extend but
> >> can not start new one.
> >> Second, change the limit under physic ability to make one bio
> >> queued. Once one bio queued, blk_throtl_bio func will call
> >> tg_update_disptime to estimate the delay time for the throtl_work.
> >> As the slice is very old, there is a very large value in
> >> tg->bytes_disp[rw], and the tg->disptime is a long time after jiffies.
> >> During this time, all of the bio is queued. And the work_queue can
> >> not start, so tg->slice_start[rw] still can not be reset.
> >>
> >> Although after serveral minutes everything will be ok, but it still
> >> seens no-good for users.
> >>
> >> I think it should start a new slice when the limit is changed.
> >> Here is my patch, please conrrect it if there is something wrong
> >> follow.</:includetail>
> >
> >
> >CCing to lkml. Lets keep all the testing and bug reports regarding
> >blkio throttling on mailing list.
> >
> >thanks for the bug report lina. I think this is a bug. I am not too keen
> >on restarting slice all the time upon limit change as somebody can exploit
> >that to get higher BW by doing it frequently. Can you try attached patch
> >and see if it solves your problem.
> </:includetail></:includetail>
> Thank you for this patch, it can solve the problem. But there still has 5~10</:includetail>
> seconds 0 bps in the test. I think this is because we first tirm the end of </:includetail>
> slice, then new one, there is some latency. Do you have any idea to let </:includetail>the </:includetail>
> limit change work immediately or make less latency(maybe 1 or 2 seconds)?</:includetail>
> </:includetail>

Ok, if you are concerned about those few seconds, can you please try
following patch. I think starting a new slice is better when processing
limit change instead of in blk_throtl_bio().

Thanks
Vivek


---
block/blk-throttle.c | 21 +++++++++++++++++++++
1 file changed, 21 insertions(+)

Index: linux-2.6/block/blk-throttle.c
===================================================================
--- linux-2.6.orig/block/blk-throttle.c 2011-03-04 13:59:45.000000000 -0500
+++ linux-2.6/block/blk-throttle.c 2011-03-07 10:54:30.639467538 -0500
@@ -757,6 +757,14 @@ static void throtl_process_limit_change(
" riops=%u wiops=%u", tg->bps[READ],
tg->bps[WRITE], tg->iops[READ],
tg->iops[WRITE]);
+ /*
+ * Restart the slices for both READ and WRITES. It
+ * might happen that a group's limit are dropped
+ * suddenly and we don't want to account recently
+ * dispatched IO with new low rate
+ */
+ throtl_start_new_slice(td, tg, 0);
+ throtl_start_new_slice(td, tg, 1);
tg_update_disptime(td, tg);
tg->limits_changed = false;
}
@@ -1023,6 +1031,19 @@ int blk_throtl_bio(struct request_queue
/* Bio is with-in rate limit of group */
if (tg_may_dispatch(td, tg, bio, NULL)) {
throtl_charge_bio(tg, bio);
+
+ /*
+ * We need to trim slice even when bios are not being queued
+ * otherwise it might happen that a bio is not queued for
+ * a long time and slice keeps on extending and trim is not
+ * called for a long time. Now if limits are reduced suddenly
+ * we take into account all the IO dispatched so far at new
+ * low rate and * newly queued IO gets a really long dispatch
+ * time.
+ *
+ * So keep on trimming slice even if bio is not queued.
+ */
+ throtl_trim_slice(td, tg, rw);
goto out;
}

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/