Re: [PATCH v2] blk-throttle: simplify logic by token bucket algorithm

From: Vivek Goyal
Date: Tue Oct 15 2013 - 13:33:20 EST


On Mon, Oct 14, 2013 at 05:09:17PM +0800, Hong Zhiguo wrote:

[..]

Hi Hong,

Thanks for the token based throttling implementation. In general it looks
good and it simplies the logic. Couple of comments/concerns below.

> @@ -133,14 +135,13 @@ struct throtl_grp {
> /* IOPS limits */
> unsigned int iops[2];
>
> - /* Number of bytes disptached in current slice */
> - uint64_t bytes_disp[2];
> - /* Number of bio's dispatched in current slice */
> - unsigned int io_disp[2];
> + /* Token for throttling by bps */
> + uint64_t bytes_token[2];
> + /* Token for throttling by iops */
> + unsigned int io_token[2];
>
> - /* When did we start a new slice */
> - unsigned long slice_start[2];
> - unsigned long slice_end[2];
> + /* Time check-point */
> + unsigned long t_c[2];

Can we rename this variable to say last_dispatch[2].

[..]
>
> @@ -852,41 +708,26 @@ static bool tg_with_in_bps_limit(struct throtl_grp *tg, struct bio *bio,
> unsigned long *wait)
> {
> bool rw = bio_data_dir(bio);
> - u64 bytes_allowed, extra_bytes, tmp;
> - unsigned long jiffy_elapsed, jiffy_wait, jiffy_elapsed_rnd;
> -
> - jiffy_elapsed = jiffy_elapsed_rnd = jiffies - tg->slice_start[rw];
> -
> - /* Slice has just started. Consider one slice interval */
> - if (!jiffy_elapsed)
> - jiffy_elapsed_rnd = throtl_slice;
> -
> - jiffy_elapsed_rnd = roundup(jiffy_elapsed_rnd, throtl_slice);
> + u64 extra_bytes, token;
> + unsigned long jiffy_wait;
>
> - tmp = tg->bps[rw] * jiffy_elapsed_rnd;
> - do_div(tmp, HZ);
> - bytes_allowed = tmp;
> + token = (u64)tg->bps[rw] * (jiffies - tg->t_c[rw]);

So here we seem to be calculating how many tokens a group is eligible
for. This is done based on since last time check. And last time check
was updated in last dispatch from the group.

What happens if no IO happens in a group for some time, say for 5 minutes,
and then one big IO comes in. IIUC, we will allow a big burst of IO for
the first IO (tokens worth of full 5 minutes will be given to this group).
I think we should have a mechanism where we don't allow too big a burst
for the first IO. (Possibly limit the burst to THROTL_BURST_IO and
THROTL_BURST_BYTES until and unless a bigger IO is already queued and
we are waiting for it).

In previous time slice logic, I took care of it by extend time slice
by 100ms and if no IO happens in the group for 100ms, time slice will
expire and next IO will get at max of 100ms of worth of credit (equivalent
of burst limits).

Thanks
Vivek
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/